More capacity doesn’t solve congestion
Since the early days of computer networking, there has been a long-standing debate over whether we can grow our way out of network congestion problems by adding network capacity. It's a complex, field-specific topic, and even technically savvy people often confuse the underlying network engineering concepts. This article attempts to simplify this complicated issue and debunk many of the misconceptions about capacity versus congestion. It will illustrate why capacity is no substitute for intelligent network management, any more than the inverse is true.
The myth that more network capacity eliminates congestion
It is a myth that additional network capacity eliminates congestion. Additional capacity can reduce the frequency and duration of congested states, but only if the workload on the network remains constant. Even then it does not eliminate congestion, and that simply isn't good enough for real-time applications like Voice over Internet Protocol (VoIP) or online games. In reality, workloads don't remain constant: demand from network users has historically grown to match the rate of capacity increases.
Figure 1 – Congestion reduction through increased bandwidth (assuming same workload)
Figure 1 shows the congestion generated by transferring a 1.25 megabyte file. On a 2.5 Mbps network, the transfer takes about 4 seconds (1.25 megabytes is 10 megabits, and 10 ÷ 2.5 = 4) when it uses a bursty protocol such as HTTP (used by web browsers), Peer-to-Peer (P2P), or the ancient File Transfer Protocol (FTP), all of which are designed to burst at maximum throughput. That means the network is completely congested for 4 seconds, during which real-time applications like online gaming or Voice over IP (VoIP) will stutter and drop frames or audio.
But what happens if we quadruple network capacity to 10 Mbps? The same 1.25 megabyte transfer will fully congest the network for roughly 1 second, and real-time applications will still stutter and drop frames or audio during that second. If we boost the network to 40 Mbps, the same transfer will congest it for 250 milliseconds (one quarter of a second), which is still enough to cause dropped audio or performance problems in an online game. Reducing congestion isn't good enough; a proper network management system will eliminate the harmful effects of congestion entirely while ensuring good performance for all applications.
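The arithmetic behind Figure 1 fits in a few lines. This minimal sketch assumes the transfer saturates the link for its whole duration and that nothing else competes for it (the "same workload" case), and it uses 1 megabyte = 8 megabits; `congestion_seconds` is a hypothetical helper name.

```python
def congestion_seconds(file_megabytes: float, link_mbps: float) -> float:
    """Seconds a maximum-speed transfer of `file_megabytes` keeps a `link_mbps` link full.

    Assumes the transfer alone saturates the link (no competing traffic).
    """
    file_megabits = file_megabytes * 8  # 1 megabyte = 8 megabits
    return file_megabits / link_mbps


# Same 1.25 MB file on progressively faster links, as in Figure 1.
for rate in (2.5, 10, 40):
    print(f"{rate} Mbps: {congestion_seconds(1.25, rate)} s")
# 2.5 Mbps: 4.0 s
# 10 Mbps: 1.0 s
# 40 Mbps: 0.25 s
```

Each fourfold increase in capacity cuts the congested period by four, but it never reaches zero.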
Realistically, we're not just dealing with 1.25 megabyte files; real-world files can be thousands of times larger, which means even a 40 Mbps network can be congested for 250 seconds or more. The idea that we can eliminate congestion through increased network capacity is a fallacy even if file sizes stay the same as capacity increases. And in reality, file sizes don't stay the same: the faster the network, the larger the files people try to transfer, so congestion time isn't even reduced, let alone eliminated.
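The same arithmetic shows both points in this paragraph. In this sketch, a file 1,000 times larger than 1.25 MB (1.25 gigabytes) saturates a 40 Mbps link for 250 seconds. The second part assumes, purely for illustration, that file sizes grow by the same factor as capacity at each upgrade; `scaled_congestion` and the 4x growth factor are hypothetical, not measured data.

```python
def congestion_seconds(file_megabytes: float, link_mbps: float) -> float:
    """Seconds a maximum-speed transfer keeps the link fully busy (1 MB = 8 Mb)."""
    return file_megabytes * 8 / link_mbps


def scaled_congestion(file_mb: float, link_mbps: float,
                      growth: float, steps: int) -> list[float]:
    """Congestion time at each upgrade when file size and capacity grow together."""
    times = []
    for _ in range(steps):
        times.append(congestion_seconds(file_mb, link_mbps))
        file_mb *= growth    # workload grows...
        link_mbps *= growth  # ...at the same rate as capacity
    return times


print(congestion_seconds(1.25 * 1000, 40))  # 250.0
print(scaled_congestion(1.25, 2.5, 4, 3))   # [4.0, 4.0, 4.0]
```

When workload keeps pace with capacity, every upgrade leaves the congested period exactly where it was.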
There was a time when many of us thought 0.0096 Mbps was blazing fast, and we were willing to pay $700 for the 9600 baud modems that offered this kind of "speed". There will also be a time in our lifetimes when we think 10 gigabit networks are pathetically slow. It's no different from how computers are always too slow and storage devices are always too small, even though they've improved nearly a million-fold over the last 20 years.
The moral of the story is that, since the beginning of the Internet, applications have been designed to operate at maximum speed, which by definition causes congestion. Congestion will always be with us, and it will always need to be managed. Can we artificially limit applications to prevent them from using all of the available bandwidth? In theory, yes, but it would be difficult to implement, it would reduce application performance, and it would not be effective at eliminating congestion.
For one thing, how would an application know the maximum safe rate when conditions on the network keep changing with the number of active users and applications? And why would we want to artificially slow applications down just to satisfy some irrational obsession with a dumb network? In reality, we want applications to run as fast as possible and let the network manage congestion intelligently. The other problem with limiting application bandwidth is that it doesn't actually eliminate congestion, as the next section explains.
The myth of zero congestion during low network utilization
A commonly held misconception is that congestion only occurs when network utilization is close to 100%. Even some network engineers may be lulled into believing there isn't any congestion, because nearly all network monitoring tools are limited to low-resolution graphs that show average utilization over 1-second to 10-second intervals. These graphs are conceptually simple and report utilization as a precise-looking percentage like 26% or 70%, but they hide the congested state of the network. The reality isn't so simple, because packet-switching networks like the Internet or the Local Area Network (LAN) in an office or home have only two states at any instant: 0% (idle) and 100% (transmitting). The 1-second or 10-second graphs show the average of these two states over time, which is a convenient abstraction to visualize, but there are no actual in-between states at the packet level.
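A small simulation makes the two-state point concrete. The traffic pattern here is hypothetical: the link is sampled in 0.5 ms slots (a stand-in for packet-level resolution) and carries a 50 ms full-speed burst at the start of every 250 ms. Averaging the same second of 0/1 link states over different window sizes shows how the reported utilization depends on resolution; `link_state` and `utilization_by_window` are illustrative names.

```python
SLOT_MS = 0.5                           # hypothetical sampling slot
SLOTS_PER_SECOND = int(1000 / SLOT_MS)  # 2000 slots


def link_state(slot: int) -> int:
    """1 if the link is transmitting in this slot, else 0 (no in-between state)."""
    period = int(250 / SLOT_MS)  # a burst every 250 ms = 500 slots
    burst = int(50 / SLOT_MS)    # each burst lasts 50 ms = 100 slots
    return 1 if slot % period < burst else 0


states = [link_state(s) for s in range(SLOTS_PER_SECOND)]


def utilization_by_window(states: list[int], window_ms: float) -> list[float]:
    """Average the 0/1 link states over consecutive windows of `window_ms`."""
    n = int(window_ms / SLOT_MS)
    return [sum(states[i:i + n]) / n for i in range(0, len(states), n)]


print(utilization_by_window(states, 1000))          # [0.2]
print(max(utilization_by_window(states, 62.5)))     # 0.8
print(max(utilization_by_window(states, SLOT_MS)))  # 1.0
```

The one-second average reports a comfortable 20%, yet the 62.5 ms windows containing a burst sit at 80%, and at slot level the link is simply full.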
Figure 2 illustrates how congestion is hidden by the low-resolution graphs that most network monitoring tools display. At 1-second resolution, the top graph looks like a perfectly healthy, uncongested network, but congestion is revealed as the horizontal (time) axis becomes finer-grained.
Figure 2 – The deception of low-resolution network utilization graphs
We can usually get away with a single packet's worth of delay, even for the most delay-sensitive real-time applications, but problems begin at delays of 30 to 60 milliseconds and above. The 62.5 millisecond resolution graph (1/16th of a second) in Figure 2 is probably one of the most useful charts for spotting the problematic micro-congestion that causes "jitter" (variation in packet delay). Anything above the 50% level in the 62.5 millisecond resolution graph is entering the problematic stage.
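To put the 30 to 60 millisecond range in perspective, this sketch estimates the queueing delay a real-time packet sees when it arrives behind a burst of full-size packets in a single first-in, first-out queue. The 1500-byte packet size and the 10 Mbps link rate are assumptions chosen for illustration, not values from the measurements in Figure 3, and `queueing_delay_ms` is a hypothetical helper.

```python
PACKET_BITS = 1500 * 8  # assumed full-size packet: 1500 bytes


def queueing_delay_ms(packets_ahead: int, link_mbps: float) -> float:
    """FIFO wait for a packet arriving behind `packets_ahead` full-size packets."""
    bits_per_ms = link_mbps * 1000  # 1 Mbps = 1000 bits per millisecond
    return packets_ahead * PACKET_BITS / bits_per_ms


for ahead in (1, 25, 50):
    print(f"{ahead} packets ahead: {queueing_delay_ms(ahead, 10)} ms")
# 1 packets ahead: 1.2 ms
# 25 packets ahead: 30.0 ms
# 50 packets ahead: 60.0 ms
```

Under these assumptions, one packet of delay is harmless, while a burst of a few dozen packets queued ahead pushes a VoIP or game packet into the problematic range.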
Figure 3 shows how all of this relates to real-world measured micro-congestion, where just 20% network utilization can cause 50 to 60 milliseconds of jitter, enough to cause substantial problems for VoIP or any other real-time application. Note that the higher bars in Figure 3 correspond to wider bars in the packet-level resolution network utilization chart in Figure 2.
Figure 3 – uTorrent 2.0 download traffic causes massive jitter
Source: Analysis of BitTorrent uTP congestion avoidance
Solutions to congestion
There is a technical solution to this problem, and it lies in intelligent network management, where packets are forwarded in a way that benefits all applications. This is crucial if we want a more efficient network that is truly neutral and conducive to the real-time applications of the present and future.
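The article does not name a specific forwarding mechanism. One common building block is priority queuing, where packets from delay-sensitive applications are sent ahead of bulk traffic that is already waiting. The following is a minimal, assumed illustration of that idea using Python's standard `heapq` module; the `PriorityScheduler` class and the `REALTIME`/`BULK` classes are hypothetical.

```python
import heapq
from itertools import count

REALTIME, BULK = 0, 1  # hypothetical traffic classes; lower number is sent first


class PriorityScheduler:
    """Strict-priority queue that keeps FIFO order within each traffic class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = count()  # tie-breaker preserving arrival order

    def enqueue(self, packet: str, traffic_class: int) -> None:
        heapq.heappush(self._heap, (traffic_class, next(self._seq), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]


sched = PriorityScheduler()
for i in range(3):
    sched.enqueue(f"bulk-{i}", BULK)  # a burst from a file transfer
sched.enqueue("voip-0", REALTIME)     # arrives after the burst is queued
order = [sched.dequeue() for _ in range(4)]
print(order)  # ['voip-0', 'bulk-0', 'bulk-1', 'bulk-2']
```

The VoIP packet skips the queued burst, so the file transfer still runs at full speed without adding its backlog to the real-time packet's delay. Strict priority can starve bulk traffic if real-time traffic is heavy, so practical schedulers usually cap or weight the high-priority class (weighted fair queuing is one example).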