Diagnosing server or network availability

Network diagnosis with Pingplotter

Pingplotter is a Windows program that, like the traceroute command, shows the path between a client and a server, but presents it much more clearly. A freeware version can be downloaded from

http://www.pingplotter.com

Pingplotter perfect.png

This picture shows an almost perfect path from the client to the server. The second column lists the packet loss, i.e. the share of test packets that got lost. Since this example has no packet loss, the column is empty.

The IP and DNSName columns trace the path the test packets take. The path starts in the network of the Munich provider M-Net, is handed over to Noris between hops 6 and 7 (at the INXS peering point), and finally reaches the Hetzner network at hop 9 (at the NIX peering point).

The Avg and Cur columns show how long (in ms) a test packet takes to reach the hop and come back: Avg is the average time, Cur the value of the most recent test packet.

To the right of the columns, the timing of the test packets is shown graphically. Outliers can be spotted quickly from the course of the red line. However, the position of the line allows only limited conclusions: the black lines show the range between the fastest and the slowest reply for each hop. If one hop responds particularly slowly, the red line shifts to the left, because the graph needs more room on the right for that hop's values.
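The columns described above can be reproduced from raw samples. The following Python sketch is a minimal illustration, not Pingplotter's own code: it assumes each hop's round-trip times are available as a list in which `None` marks an unanswered test packet, and the function name `hop_stats` is hypothetical.

```python
from statistics import mean
from typing import Optional


def hop_stats(samples: list[Optional[float]]) -> dict:
    """Summarise one hop's test packets: round-trip times in ms, None = lost."""
    if not samples:
        raise ValueError("need at least one test packet")
    replies = [s for s in samples if s is not None]
    return {
        # Share of test packets without a reply (the packet loss column).
        "loss_pct": 100.0 * (len(samples) - len(replies)) / len(samples),
        # Avg: mean over the packets that were answered.
        "avg": mean(replies) if replies else None,
        # Cur: value of the most recent test packet (None if it was lost).
        "cur": samples[-1],
        # Fastest and slowest reply, i.e. the span of the black line.
        "min": min(replies) if replies else None,
        "max": max(replies) if replies else None,
    }
```

For example, `hop_stats([20.0, 22.0, None, 24.0])` gives a loss of 25.0 %, an average of 22.0 ms, a current value of 24.0 ms and a range from 20.0 to 24.0 ms.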

mtr

If you don't have Pingplotter, you can get the same basic functionality with the program mtr (WinMTR on Windows). How mtr works is described in a separate article.

Typical errors

It's "lagging"!

Ping times increase, terminal sessions become jerky and games become unplayable. The server is not always the culprit, and plain ping output is of little help in finding the real cause:

Pingplotter ping.png

Pingplotter offers much better information:

Pingplotter lag.png

One can immediately see that the error (in this simulation) lies in the M-Net network. Starting with hop 4 there are significant delays, hop 5 drops some of the packets (which may in turn be attributed to the fault at hop 4), and hop 6 adds excessive delays. Fault patterns like this can be caused by severe capacity shortages, such as those produced by DDoS attacks.
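Reading such a graph by eye boils down to finding the first hop from which latency rises and stays high. A minimal Python sketch of that heuristic follows; it assumes per-hop average round-trip times as input (`None` for hops that did not answer), and the function name and the 50 ms threshold are arbitrary choices for illustration.

```python
from typing import Optional


def first_persistent_jump(
    avg_rtts: list[Optional[float]], threshold_ms: float = 50.0
) -> Optional[int]:
    """Return the 1-based hop where latency rises by at least threshold_ms
    and stays raised for every later hop that answers, else None."""
    baseline: Optional[float] = None  # RTT of the last "normal" hop
    for i, rtt in enumerate(avg_rtts):
        if rtt is None:
            continue  # hop did not answer: no timing information
        if baseline is not None and rtt - baseline >= threshold_ms:
            later = [r for r in avg_rtts[i + 1:] if r is not None]
            if all(r - baseline >= threshold_ms for r in later):
                return i + 1
            continue  # isolated slow reply: keep the old baseline
        baseline = rtt
    return None
```

For hypothetical averages of `[2.0, 10.0, 11.0, 160.0, None, 350.0, 355.0]` ms it returns 4. A single slow hop whose successors respond normally again is not reported, since only a delay that carries through to the later hops affects the connection.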

Packet loss

Pingplotter loss.png

In this example the situation is different: while many packets are discarded from hop 10 onward, the ping times of 37 ms are in the normal range. This could be caused by a defective router or a faulty cable.

Everything is slow

With DSL connections in particular, the following error pattern is often seen:

Pingplotter trace.png

This traceroute does not really show the problem. Pingplotter is again more helpful:

Pingplotter lag local.png

A high load as early as the second hop points to an overloaded private internet connection. This can be caused, for example, by sending large amounts of data via email, which clogs the ADSL upstream. File-sharing tools that consume bandwidth in the background and have long since been forgotten can also be the culprit. A trace in the opposite direction gives a clearer picture:

Pingplotter lag local rev.png

Everything looks fine (green) up to the client's router, which answers too late. Possible causes are a disruption of the internet connection or an overloaded router.

Packet loss that isn't real

Pingplotter Routerloss.png

This is not actually a fault, since it is normal for routers not to answer test packets. Depending on its configuration, a router forwards packets normally but only replies to test packets when it has enough spare computing capacity. As long as the following hops don't also show packet loss, there is no cause for concern.
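This rule, and the persistent loss from hop 10 onward in the packet loss example, can be expressed as a check that walks back from the destination. A minimal Python sketch, assuming per-hop loss percentages as input; the function name and the 1 % threshold are hypothetical choices:

```python
from typing import Optional


def real_loss_starts_at(loss_pct: list[float], min_loss: float = 1.0) -> Optional[int]:
    """Return the 1-based hop from which every hop up to and including the
    destination shows at least min_loss percent packet loss, or None if the
    destination itself is unaffected (loss at earlier hops is then harmless)."""
    start = None
    # Walk backwards from the destination (last entry) while the loss persists.
    for i in range(len(loss_pct) - 1, -1, -1):
        if loss_pct[i] < min_loss:
            break
        start = i + 1
    return start
```

`real_loss_starts_at([0, 0, 40, 0, 0])` returns `None` (a router that merely ignores test packets), while a path with loss from hop 10 through to the destination returns 10.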

Asymmetrical routing

Fault diagnosis is relatively easy if the round trip between the client and the server follows the same path in both directions. Often, however, packets take a different route back, because the provider on the client side and the one at the server location each apply their own routing requirements.
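Why the return path matters follows from what a reply measures: the time to reach a hop plus the time the reply needs to come back along that hop's own return route. The following Python sketch uses made-up one-way delays (hop names and numbers are hypothetical) to show how a congested link that lies only on the return path makes the forward trace point at the wrong network.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    forward_ms: float  # one-way delay client -> hop along the forward path
    return_ms: float   # one-way delay hop -> client along the hop's return path

    @property
    def rtt_ms(self) -> float:
        # A ping or traceroute reply always measures both legs together.
        return self.forward_ms + self.return_ms


# Hypothetical path: replies from the "provider-B" hops travel back through a
# congested link that adds about 300 ms; the forward path never touches it.
path = [
    Hop("provider-A-1", 5, 5),
    Hop("provider-A-2", 10, 10),
    Hop("provider-B-1", 15, 15 + 300),
    Hop("provider-B-2", 20, 20 + 300),
]

for hop in path:
    print(f"{hop.name:14} {hop.rtt_ms:6.0f} ms")
```

The first two hops answer in 10 and 20 ms, the two provider-B hops in 330 and 340 ms, although the forward path to all four hops is fast.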

The following animation (Flash plug-in required) illustrates the problem:

Error example

In the following example, the Hetzner network seems to be causing the delays: starting with the first Hetzner router, ping times are significantly higher and packets are discarded:

Client ---> Server

1 67 ms 65 ms 66 ms 62.26.xx.xx
2 63 ms 63 ms 65 ms 62.26.251.97
3 80 ms 74 ms 76 ms so-6-0-0.core3.f.tiscali.de (62.27.95.2)
4 74 ms 75 ms 73 ms so-3-0-0.fra30.ip.tiscali.net (213.200.64.25)
5 73 ms 74 ms 75 ms ffm-s2-rou-1071.DE.eurorings.net (80.81.192.22)
6 75 ms 74 ms 75 ms ffm-s1-rou-1001.DE.eurorings.net (134.222.227.65)
7 78 ms 79 ms 78 ms nbg-s1-rou-1071.DE.eurorings.net (134.222.227.30)
8 84 ms 78 ms 79 ms gi-0-1-286-nbg5.noris.net (134.222.107.26)
9 392 ms 393 ms * ne.gi-2-1.RS8000.RZ2.hetzner.de (213.133.96.65)
10 393 ms * 392 ms et-2-16.RS3000.RZ2.hetzner.de (213.133.96.38)
11 393 ms 392 ms * (...)


Only a traceroute in the opposite direction reveals the real source of the error:

Server ---> Client

1 213.133.xx.xx (213.133.xx.xx) 0.233 ms 0.205 ms 0.281 ms
2 et-1-11.RS8000.RZ2.hetzner.de (213.133.96.37) 0.653 ms 0.660 ms 0.650 ms
3 nefkom-gw.hetzner.de (213.133.96.66) 1.119 ms 0.423 ms 0.415 ms
4 GW-SF-BBR-06-S2-3.nefkom.de (212.114.147.23) 0.635 ms 0.807 ms 0.457 ms
5 hsa2.mun1.pos6-0.eu.level3.net (212.162.44.25) 6.811 ms 6.347 ms 6.143 ms
6 ae0-19.mp1.Munich1.Level3.net (195.122.176.193) 315.587 ms 314.949 ms 315.164 ms
7 so-0-0-0.mp1.Frankfurt1.Level3.net (212.187.128.90) 301.324 ms 300.789 ms 300.742 ms
8 gige1-2.core1.Frankfurt1.Level3.net (195.122.136.101) 301.673 ms 300.853 ms 301.087 ms
9 de-cix.fra30.ip.tiscali.net (80.81.192.30) - 317.844 ms 317.634 ms
10 so-4-0-0.core3f.tiscali.de (213.200.64.26) 318.453 ms - 318.021 ms
11 so-1-0-0.core1.hh.tiscali.de (62.27.95.38) 307.780 ms 307.230 ms 307.252 ms
12 ge-2-0-0.7.core0.hh.tiscali.de (62.27.93.83) 307.431 ms 307.298 ms 307.084 ms
13 62.26.251.101 (62.26.251.101) - 307.753 ms 308.933 ms
14 (...) 390.856 ms 399.355 ms -



In this case the fault was between two Level3 routers in Munich (hops 5 and 6 of the reverse trace). The problem appeared to lie with Hetzner because the test packets to hops up to and including hop 8 (Noris) were routed back without delay. Test packets to the Hetzner network, however, took a different route back (via Nefkom and Level3) and so ran into the delay in Munich.
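The comparison of both directions can be automated for saved traces. The Python sketch below parses excerpts of the two traces above (Windows tracert style and Unix traceroute style), keeps the fastest reply per hop as the value least affected by momentary queuing, and reports the first hop that is at least 100 ms slower than its predecessor. The regular expression, the threshold and the function names are assumptions for illustration, not part of any tool.

```python
import re
from typing import Optional

# Excerpts of the two traces above.
FORWARD = """\
7 78 ms 79 ms 78 ms nbg-s1-rou-1071.DE.eurorings.net (134.222.227.30)
8 84 ms 78 ms 79 ms gi-0-1-286-nbg5.noris.net (134.222.107.26)
9 392 ms 393 ms * ne.gi-2-1.RS8000.RZ2.hetzner.de (213.133.96.65)
10 393 ms * 392 ms et-2-16.RS3000.RZ2.hetzner.de (213.133.96.38)
"""

REVERSE = """\
4 GW-SF-BBR-06-S2-3.nefkom.de (212.114.147.23) 0.635 ms 0.807 ms 0.457 ms
5 hsa2.mun1.pos6-0.eu.level3.net (212.162.44.25) 6.811 ms 6.347 ms 6.143 ms
6 ae0-19.mp1.Munich1.Level3.net (195.122.176.193) 315.587 ms 314.949 ms 315.164 ms
7 so-0-0-0.mp1.Frankfurt1.Level3.net (212.187.128.90) 301.324 ms 300.789 ms 300.742 ms
"""

# A number directly followed by " ms" is one round-trip time.
RTT = re.compile(r"\b(\d+(?:\.\d+)?) ms\b")


def parse(trace: str) -> list[tuple[int, float]]:
    """Return (hop number, fastest reply in ms) for every hop that answered."""
    hops = []
    for line in trace.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        rtts = [float(x) for x in RTT.findall(line)]
        if rtts:  # hops with only lost probes carry no timing information
            hops.append((int(fields[0]), min(rtts)))
    return hops


def first_jump(hops: list[tuple[int, float]], threshold_ms: float = 100.0) -> Optional[int]:
    """Hop number whose fastest reply is threshold_ms slower than the previous hop's."""
    for (_, prev), (hop, cur) in zip(hops, hops[1:]):
        if cur - prev >= threshold_ms:
            return hop
    return None


print("forward jump at hop", first_jump(parse(FORWARD)))  # → forward jump at hop 9
print("reverse jump at hop", first_jump(parse(REVERSE)))  # → reverse jump at hop 6
```

Seen from the client, the jump appears at hop 9 (the first Hetzner router); seen from the server, it appears at hop 6 (Level3, Munich), which is where the fault actually was.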
