15. 01. 2016 Luca Di Stefano NetEye, Real User Experience

Network Performance: Explicit Congestion Notification

When network devices ask for help.

We often hear about network congestion, but much less often about the mechanisms that try to manage it.

The primary mechanism for managing congestion is dropping packets. When a device is in trouble, it discards packets in a pseudo-random order to save time (and bandwidth), trusting that the application protocols can cope with the packet loss.

What are the effects on the applications that experience the packet loss?

There are two side effects that affect the performance of the transmission:

  1. The protocol or the application must first realize that a packet has been lost, and this takes some time (RTO, retransmission timeout). On Windows operating systems the RTO usually starts at 3 seconds.
  2. In the TCP protocol, a retransmission makes the sender cut its transmission rate: the congestion window is halved after a fast retransmit, or collapsed to a single segment after a timeout (followed by slow start), and then grows back in small steps during the transmission.

So a single packet loss can stall the transmission for a few seconds, for example while loading a small web page, and this of course has a negative effect on the experience of the end user.

Is there a mechanism for managing congestion without drastically affecting the throughput of the transmission?

In 2001, authors from TeraOptic Networks, ACIRI and EMC proposed Explicit Congestion Notification (ECN, RFC 3168), a mechanism by which network devices that realize they are in a situation that can lead to congestion indicate this state to the recipient of the packets.

[Figure: Explicit congestion]

The network devices ask for help by raising a flag in the packets they forward, which ultimately tells the sender to send a little slower. When the receiving TCP stack gets packets carrying this notification, it sends a request back to the sender to reduce the transmission rate.

If everything goes well, the rate of the connection drops and the network device can leave the critical state without harming the throughput and without any RTO.

On the other hand, there are two situations in which the mechanism does not work:

  1. The operating system of the client or of the server, or the network devices, do not support ECN.
  2. The packets that carry the request for help, or those that ask the sender to reduce the rate, are dropped by other devices.
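The flags involved in this exchange can be made concrete with a small sketch. It assumes the bit layout defined in RFC 3168: the ECN codepoint sits in the two least-significant bits of the IPv4 TOS (or IPv6 Traffic Class) byte, and TCP carries the echo in the ECE and CWR header flags. The function names are illustrative only and are not part of NetEye or Real User Experience.

```python
# ECN codepoints: two least-significant bits of the IPv4 TOS /
# IPv6 Traffic Class byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # the transport does not support ECN
    0b10: "ECT(0)",   # ECN-capable transport
    0b01: "ECT(1)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced, set by a congested device
}

# TCP header flag bits that carry the notification back to the sender.
TCP_FLAG_ECE = 0x40  # ECN-Echo: the receiver saw a CE mark
TCP_FLAG_CWR = 0x80  # Congestion Window Reduced: the sender slowed down


def ecn_codepoint(tos_byte):
    """Return the name of the ECN codepoint carried in a TOS byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def tcp_ecn_flags(flags_byte):
    """Return the ECN-related TCP flags set in a TCP flags byte."""
    names = []
    if flags_byte & TCP_FLAG_ECE:
        names.append("ECE")
    if flags_byte & TCP_FLAG_CWR:
        names.append("CWR")
    return names


if __name__ == "__main__":
    print(ecn_codepoint(0x03))   # a packet marked by a congested device
    print(tcp_ecn_flags(0x52))   # a SYN+ACK that also carries ECE
```

A packet whose codepoint is CE is the "request for help" from a device on the path; a segment with ECE set is the receiver relaying that request to the sender.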

Real User Experience can identify the packets that carry the ECN flag. For this it is sufficient that ECN is activated on the network devices. This can be done by configuring queuing mechanisms such as RED, WRED, GRED or HIGH, depending on the vendor.
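To give an idea of how such a queuing mechanism decides when to raise the flag, here is a deliberately simplified sketch of a RED-style decision. The thresholds, the `max_p` value and the function names are assumptions chosen for illustration; real implementations differ by vendor, and many still drop packets, even ECN-capable ones, once the queue is far above the upper threshold.

```python
import random


def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED curve: 0 below min_th, linear up to max_p, then 1."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def handle_packet(avg_queue, ecn_capable,
                  min_th=20, max_th=60, max_p=0.1, rng=random.random):
    """Decide what a congested device does with one packet.

    All default values are hypothetical. With ECN active, a packet that
    would have been dropped is marked with CE instead.
    """
    probability = red_mark_probability(avg_queue, min_th, max_th, max_p)
    if rng() >= probability:
        return "forward"
    return "mark CE" if ecn_capable else "drop"
```

The point of the sketch is the last line: for ECN-capable traffic the early "drop" of the queue manager becomes a mark, so the sender is warned without losing the packet.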

[Figures: congestion_aggregate, congestion_details]

The information about the status of congestion of a network device helps us to quickly localize the cause of the performance reduction.

As can be seen in the example above, we have requests between two IPs in the local network which "suffer" packet drops (see the field "Retransmission"). These drops are likely caused by a network device that has signalled its state of congestion (see the field "Congestion"). We do not know the identity of the device in question, but we can simply follow the route of the packets and check the statistics of the devices along the path to identify it.

Using the NeDi integration we can discover which switches or routers are connected to the client or the server and go further in the analysis of the network device:

[Figure: ecn-nedi-nav]

Hint: Perhaps you are asking yourself how many congestion events it takes before you should pay particular attention to them. A congestion event is not necessarily a cause for alarm. I personally suggest that you observe how the congestion events develop over time. If they become more frequent, you should investigate their cause and follow the route of the packets. In this way, you will be able to identify emerging bottlenecks before they have a negative impact on the application performance experienced by the users.
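One possible way to "observe the development" is to compare the average number of congestion events in the latest period with the average of the period before it. The sketch below assumes you have exported daily counts yourself; the function name, the 7-day window and the numbers are hypothetical and not something Real User Experience provides in this form.

```python
def congestion_trend(daily_counts, window=7):
    """Return the change in the mean daily congestion count.

    Compares the latest `window` days with the `window` days before them.
    A clearly positive value suggests congestion is becoming more frequent.
    """
    if len(daily_counts) < 2 * window:
        raise ValueError("need at least two full windows of daily counts")
    recent = daily_counts[-window:]
    previous = daily_counts[-2 * window:-window]
    return sum(recent) / window - sum(previous) / window


if __name__ == "__main__":
    # Hypothetical data: two events per day last week, five per day this week.
    counts = [2] * 7 + [5] * 7
    print(congestion_trend(counts))
```

A steady value near zero matches the advice above that isolated congestion events need no action, while a growing value is the signal to start following the packet route.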

For the recovery mechanism (the reduction of the transmission rate) to work, ECN must also be activated on the clients and servers. Generally, all recent operating systems support ECN, even if in some cases the mechanism is disabled by default:

Microsoft Windows

ECN is enabled by default from Windows Server 2012 onwards. In previous versions and in the client versions it is disabled by default.

The mechanism can be enabled with: netsh interface tcp set global ecncapability=enabled

Linux

ECN has been supported since kernel 2.4.20. It is configured through the sysctl interface by setting the parameter /proc/sys/net/ipv4/tcp_ecn to one of these values:

0 – disable ECN and neither initiate nor accept it

1 – enable ECN when requested by incoming connections, and also request ECN on outgoing connection attempts

2 – (default) enable ECN when requested by incoming connections, but do not request ECN on outgoing connections
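To check which of these modes a Linux host is actually using, the parameter can simply be read from the path given above. The following is a minimal sketch; the helper names are illustrative, and the descriptions just restate the three values listed here.

```python
from pathlib import Path

TCP_ECN_PATH = Path("/proc/sys/net/ipv4/tcp_ecn")

# Meaning of each value, as listed in the text above.
TCP_ECN_MODES = {
    0: "disabled: ECN is neither initiated nor accepted",
    1: "enabled: ECN is accepted on incoming and requested on outgoing connections",
    2: "accept only: ECN is accepted on incoming but not requested on outgoing connections",
}


def describe_tcp_ecn(raw_value):
    """Translate the raw content of tcp_ecn into a readable description."""
    value = int(raw_value.strip())
    return TCP_ECN_MODES.get(value, f"unknown mode {value}")


def read_tcp_ecn(path=TCP_ECN_PATH):
    """Read and describe the current setting (Linux only)."""
    return describe_tcp_ecn(path.read_text())


if __name__ == "__main__":
    if TCP_ECN_PATH.exists():
        print(read_tcp_ecn())
    else:
        print("tcp_ecn not found: this does not look like a Linux host")
```

Note that with the default value 2 a host answers ECN requests from others but never asks for ECN itself, so the mechanism only comes into play on connections that the other side opens with ECN.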

Mac OS X

Version 10.5 supports ECN, but it is disabled by default. It can be enabled by setting the sysctl variables net.inet.tcp.ecn_negotiate_in and net.inet.tcp.ecn_initiate_out.

From version 10.11 it is enabled by default.

iOS

From version 9, ECN is enabled by default.

 

Luca Di Stefano

Solution Architect at Würth Phoenix
Hi everyone, I’m Luca. I graduated in electrical engineering from the University of Bologna and have been employed by Würth Phoenix since its foundation. I have worked mainly as an enterprise architect and quality assurance engineer. Previously I was involved in systems measurement and embedded systems programming. I have gained experience with Unix (Solaris, HPUX), Windows, C, C++ and Java. I personally contribute to the Open Source community as a beta tester and developer. In my spare time I love piloting airplanes over the beautiful Alps. I practice many sports: tennis, broomball, skiing, alpine skiing, volleyball, soccer, mountain biking and middle-distance running. I am a champion in none of them, but the competition excites me! I love hiking, trekking and traveling.
