
Flawed data in Berkman broadband study

19 October 2009

You can find the follow up to this post here.

Much has been made of the Berkman broadband study (October 2009 draft) authored by Yochai Benkler of The Berkman Center for Internet & Society at Harvard University.  The study looks at broadband speeds and prices from around the world and attempts to draw some broad policy recommendations.  But after a look at the underlying data on which the paper bases its conclusions, it appears that the Berkman study is built on a shaky foundation.

There appear to be multiple problems in the underlying data used by the Berkman study, which primarily cites data from the OECD, Ofcom, Speedtest.net, and JiWire.  The study relies on reported advertised speeds and prices that are either not directly comparable or sometimes just plain wrong, and in other places it relies on metrics that are arbitrary and useless.

The problem with measuring hotspots

The first problem we see is the use of the questionable metric of public hotspots per 100,000 inhabitants shown in figure 3.15.  The first question is how a hotspot is even defined.  Does the metric define it as a single Wi-Fi radio base station (such data would be difficult to gather), or does it define it by the number of visible Service Set Identifiers (SSID)?  In other words, does a hotspot covering one square mile with 20 radio base stations connected with an extensive infrastructure count the same as a hotspot in a coffee shop hanging off of a single wireless router?  Does a single Wi-Fi base station advertising 10 different virtual SSIDs count 10 times more than a jumbo hotspot using 10 physical Wi-Fi base stations?  If so, this is a very shaky metric because one robust hotspot is worth a lot more than 10 weak ones.  A far more useful metric would be the number of deployed hotspot Wi-Fi base stations per 100,000 inhabitants.

The problem with advertised speeds

The big problem with any advertised speed ranking is that the numbers are often meaningless.  Verizon, for example, which offers the fastest fiber connection service in the US, advertises 50 Mbps and typically delivers 50 Mbps or sometimes even slightly more.  Broadband providers in South Korea, Japan, and France which advertise 100 Mbps often deliver less than 50 Mbps on any sustained file transfer, yet they are ranked much higher.  From figure A below, we can see that OECD data for average advertised speed in South Korea and Japan is inflated by 5 to 13 times.

Inflated OECD speed rankings

Figure 3.17 in the Berkman study compares the average “speed” offered in each country.  The problem with this is that it is strictly based on advertised bandwidth and not what is actually delivered.  We have seen how inflated the OECD data is when we compare the OECD numbers to test samples from Speedtest.net and real-world data from Akamai.

Note: Akamai measurements are based on downloads of files that people actually use, not just the small chunk of data that Speedtest.net uses to measure performance.  For this reason, Akamai’s data is more representative of the broadband performance that people observe in real life.

Figure A: OECD broadband statistics versus Akamai measurements

Nation          OECD 2008 avg. Mbps down   Akamai Q4 2008 avg. Mbps down   OECD inflation factor
Japan           93                         7.0                             13
S. Korea        81                         15                              5.4
United States   10                         3.9                             2.6
Sweden          13                         5.6                             2.3
Netherlands     19                         4.9                             3.9
France          52                         < 4.5                           > 11
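The inflation factors in Figure A are simply the OECD advertised average divided by the Akamai measured average.  A quick sketch of the arithmetic (France is omitted because its Akamai value is only an upper bound, making its factor a lower bound):

```python
# Inflation factor = OECD advertised average / Akamai measured average.
# Figures are taken from Figure A above.
speeds = {
    # nation: (OECD 2008 avg Mbps, Akamai Q4 2008 avg Mbps)
    "Japan":         (93, 7.0),
    "S. Korea":      (81, 15),
    "United States": (10, 3.9),
    "Sweden":        (13, 5.6),
    "Netherlands":   (19, 4.9),
}

for nation, (oecd, akamai) in speeds.items():
    print(f"{nation}: advertised speed inflated {oecd / akamai:.1f}x")
```

Running this reproduces the right-hand column of Figure A to within rounding.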

In Figure A above, we can see just how inflated the OECD data is compared to real-world data from Akamai.  The OECD statistics inflate Japan and France performance by 11 to 13 times or more their actual performance, while Sweden and the United States are inflated by only 2.3 and 2.6 times.  What this tells us is that the OECD “speed” rankings are completely unreliable and have little to do with reality.  The OECD ranks France at #3, yet France doesn’t even make the top 10 on Akamai’s 2008-Q4 list.

Latency measurements are fundamentally flawed

Another chart that raised red flags was the “average latency” data based on the Berkman Center’s analysis of Speedtest.net data.  As I pointed out last month, the use of latency as a basis of comparison between different nations is fundamentally flawed to begin with, because latency within a nation is largely a function of its physical size.  Moreover, the Berkman data appears to be just as arbitrary as the data released by the University of Oxford last month.  The Berkman study puts Japan in the 160 millisecond latency range while the Oxford study put Japan at 51 milliseconds, yet both studies purport to be based on Speedtest.net data.  How two independent studies looking over the same data can get such diverging results is anyone’s guess, but it’s moot since the methodology is useless to begin with.

Problem with cost comparisons

Moving on to the broadband cost comparisons, these statistics appear to be flawed in multiple ways.  First, the “average” cost numbers appear to be arbitrary and demonstrably inaccurate.  The second problem is that the data doesn’t account for usage caps, which indicate an explicit level of fractional ownership.  The third problem is that it does not account for the cost of “last-mile” infrastructure hidden in rent or homeowner association fees.

In figure 3.25, the OECD claims that the average cost for the “very high speed tier” in Japan is only $32 a month.  Based on this recent list of prices from NTT, the typical 100 Mbps connection for single unit homes ranges from 5986 yen to 6720 yen, which based on current exchange rates is $65.76 to $73.90.  Even if we used a 2008 exchange rate of 109 yen per dollar, when the dollar peaked, it would still cost $54.92 to $61.65, which is nothing like what the OECD claims.  If we looked at multi-unit apartments and condos, the price is still 3622 yen to 4095 yen.  Using the 2008 peak dollar exchange rate, this converts to $33.23 to $37.57, which is still higher than the OECD claimed average.  It appears that the OECD cost data which the Berkman study relies on is completely unreliable.
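The currency conversions above are simple division by the exchange rate.  A sketch, assuming roughly 91 yen per dollar for the current rate (the exact rate at the time of writing may differ slightly) and the 109 yen per dollar 2008 peak:

```python
# Convert NTT's listed yen prices to dollars at two exchange rates:
# ~91 yen/$ (approximate late-2009 rate) and 109 yen/$ (2008 dollar peak).
def yen_to_usd(yen, rate):
    return yen / rate

prices = [
    ("single-unit home",     5986, 6720),
    ("multi-unit apartment", 3622, 4095),
]
for label, low, high in prices:
    for rate in (91, 109):
        print(f"{label} at {rate} yen/$: "
              f"${yen_to_usd(low, rate):.2f} to ${yen_to_usd(high, rate):.2f}")
```

At 109 yen per dollar the single-unit range works out to $54.92 to $61.65, well above the OECD's claimed $32 average.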

The other problem with the Berkman study is that it does not factor usage caps into the cost of broadband.  Since most of the OECD nations have average usage caps that are 10 times smaller than the United States, the OECD cost comparisons are simply invalid.  It is a fact that broadband connections with no explicit usage caps (they all have implicit caps where overusage can result in account termination) or with generous usage caps can easily cost twice as much as services with small usage caps.  Furthermore, even ISPs in Japan which have no explicit downstream usage caps are effectively capping their downstream through heavy oversubscription and heavily capped upstreams with 2.78% duty cycles (30 GB per day upload on a 100 Mbps upstream).  Between the fact that advertised speeds in Japan are far more inflated than in the US and the fact that much of the cost data is wrong or misleading, the difference between the US and Japan is far smaller than the Berkman study claims.
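The 2.78% duty cycle figure falls out of a simple calculation: how much of a day could a 100 Mbps upstream run flat-out before exhausting a 30 GB daily upload cap?

```python
# Duty cycle of a 100 Mbps upstream under a 30 GB/day upload cap.
cap_bits = 30 * 10**9 * 8          # 30 GB/day cap, expressed in bits
link_bps = 100 * 10**6             # 100 Mbps upstream link rate

seconds_at_full_rate = cap_bits / link_bps    # 2400 seconds = 40 minutes
duty_cycle = seconds_at_full_rate / 86400     # fraction of a 24-hour day

print(f"{duty_cycle:.2%}")         # prints "2.78%"
```

In other words, the cap allows only 40 minutes of full-rate uploading per day; the link must sit idle (or throttled) the other 97.22% of the time.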

The last problem with the cost analysis is that it fails to recognize the difference between single unit homes and multi-unit apartments and condominiums.  It is possible to get very cheap 100 Mbps access for $30 in apartment complexes in American cities like San Francisco, but none of these prices should be compared to single unit homes.  That’s because a significant portion of the “last mile” infrastructure is borne by the rent paid by tenants.  If we put cheap broadband in the context of paying $2300/month for a small 900 square foot apartment, paying $30 for heavily shared 100 Mbps Ethernet service that effectively performs at 30 Mbps doesn’t seem like such a great deal any more.  By comparison, the suburban customer paying $2000/month for a 2500 square foot home and $70 a month for true 30 Mbps broadband access isn’t getting such a bad deal.


The underlying data cited by the Berkman study is simply too flawed to be of any use.  And because the study bases its conclusions on flawed data, the conclusions drawn in the Berkman broadband study are equally unreliable.

Brett Swanson also provides great analysis of the Berkman study.

You can find the follow up to this post here.