Longer URLs wasting 74 Mbit/s of bandwidth?
This should get the attention of all the web developers and administrators out there (ours too!). John Buswell writes “Overlooking the obvious, web developers and content management systems using long file and path names are wasting bandwidth. While these are handy to identify and manage objects, they increase the length of HTTP requests, force extra processing on Application Delivery Controllers and waste bandwidth. We look at simple and yet creative ways to reduce bandwidth costs for high traffic sites.”
In his article, he looks at techniques web developers can use to improve the performance of high-traffic sites, and at how long URLs waste bandwidth.
To understand this problem we need to understand the basics:
Basics: The HTTP GET Request
When a client visits a web site, whether by clicking a link or typing the URL directly into the browser, it generates what is called a GET request. The initial GET request pulls down the HTML file, which is then processed, and the resources it references (CSS, JavaScript, images) generate further HTTP GET requests. The longer the path to those resources, the larger each HTTP GET request will be. Using telnet we can simulate an HTTP GET request. Here is an example against o3magazine.com:
    telnet www.o3magazine.com 80
    Trying 38.106.106.237...
    Connected to www.o3magazine.com (38.106.106.237).
    Escape character is '^]'.
    GET /index.html HTTP/1.1
    host: www.o3magazine.com

    HTTP/1.1 200 OK
    Server: nginx/0.6.35
    Date: Fri, 27 Mar 2009 00:30:37 GMT
    Content-Type: text/html
    Content-Length: 15954
    Last-Modified: Fri, 27 Mar 2009 00:30:06 GMT
    Connection: keep-alive
    Accept-Ranges: bytes
    .... HTML document is returned ....
So the GET request consists of GET /URL HTTP/1.1. For o3magazine, the next GET request would be for /c/0.css. o3magazine has already optimized its filenames, so it produces a smaller GET request. Compare that to, say, techcrunch.com, where the next GET request would be for /wp-content/themes/techcrunchmu/style.1238108540.css. Assuming UTF-8 encoding and standard ASCII characters, each character is represented by one byte. The shorter o3magazine GET request uses four bytes for the GET and whitespace, eight bytes for /c/0.css, and nine bytes for the trailing HTTP/1.1, for a grand total of 21 bytes. The longer techcrunch request shares the same GET and trailer, that's thirteen bytes, plus 52 bytes for its path, for a grand total of 65 bytes, over three times the size of the shorter request.
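To make the arithmetic concrete, here is a quick Python sketch (mine, not the author's) that counts the bytes in each request line; the two paths are the ones quoted above, everything else is just illustration:

    # Count the bytes in an HTTP/1.1 request line, assuming ASCII/UTF-8
    # (one byte per character, as the article does).
    def request_line_bytes(path):
        return len("GET {} HTTP/1.1".format(path).encode("utf-8"))

    short_req = request_line_bytes("/c/0.css")
    long_req = request_line_bytes("/wp-content/themes/techcrunchmu/style.1238108540.css")

    print(short_req)             # 21 bytes
    print(long_req)              # 65 bytes
    print(long_req - short_req)  # 44 extra bytes per request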
You may be thinking it's only 65 bytes, and who cares in these days of 2TB hard disk drives and >20 Mbps internet connections? Read on to find out what happens when the author applies his logic to techcrunch.com and Facebook.
Factoring in the masses
Using data from compete.com, TechCrunch gets at least 7,650,594 visits a month, or roughly 246,793 per day. So in a single day, TechCrunch has wasted 2,531MB downstream and 603MB upstream. Over the space of a month, that is 78,461MB and 18,693MB of unnecessary data transfer, which works out to roughly 232 kbit/s of sustained bandwidth. No big deal, right? It's just 232 kbit/s.
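For the curious, here is a rough sketch (again mine, not from the article) converting those monthly totals into a sustained rate. The per-visit byte counts behind the 2,531MB/day figure are in the full article, so only the monthly totals quoted above are used, and MB is taken as 10^6 bytes:

    # Convert the quoted monthly waste for TechCrunch into a sustained bit rate.
    SECONDS_PER_MONTH = 31 * 24 * 3600   # the article's figures use a 31-day month
    downstream_mb = 78_461               # MB wasted downstream per month (quoted above)
    upstream_mb = 18_693                 # MB wasted upstream per month (quoted above)

    downstream_bps = downstream_mb * 1_000_000 * 8 / SECONDS_PER_MONTH
    upstream_bps = upstream_mb * 1_000_000 * 8 / SECONDS_PER_MONTH

    print(round(downstream_bps / 1000))  # ~234 kbit/s downstream, close to the article's ~232k/sec
    print(round(upstream_bps / 1000))    # ~56 kbit/s upstream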
Perhaps a valid point, until the concept is applied to the 100+ character URL-happy Facebook home.php page. There are roughly 150 source file references on that page, plus, rounding down, about 100 HREF links for argument's sake. Being generous and assuming 80 bytes of waste per URL, the 150 objects the browser actually requests add about 12,000 bytes of upstream waste in request lines, while all 250 long URLs embedded in the HTML add about 20,000 bytes of downstream waste. Using data again from compete.com, Facebook has 1,273,004,274 visits per month, roughly 41,064,654 visits per day. So in a single day, the folks over at Facebook have wasted roughly 783GB downstream and 469GB upstream. This works out to 74Mbit/sec downstream and 44Mbit/sec upstream of wasted bandwidth.
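Here is the same estimate as a sketch (using only the figures quoted above); computing the rates directly from bytes comes out a shade higher than the article's GB-rounded 74/44 Mbit/s:

    # Sketch of the Facebook estimate from the figures quoted above.
    # Assumption: upstream waste comes from the 150 requested objects,
    # downstream waste from all 250 long URLs embedded in the HTML.
    WASTE_PER_URL = 80            # bytes of avoidable URL length
    SRC_REFS = 150                # objects the browser actually fetches
    HREF_LINKS = 100              # additional links in the HTML

    upstream_per_visit = SRC_REFS * WASTE_PER_URL                   # 12,000 bytes
    downstream_per_visit = (SRC_REFS + HREF_LINKS) * WASTE_PER_URL  # 20,000 bytes

    visits_per_day = 1_273_004_274 // 31                            # ~41,064,654

    down_bits_per_day = visits_per_day * downstream_per_visit * 8
    up_bits_per_day = visits_per_day * upstream_per_visit * 8

    print(round(down_bits_per_day / 86_400 / 1_000_000))  # ~76 Mbit/s (article rounds to 74)
    print(round(up_bits_per_day / 86_400 / 1_000_000))    # ~46 Mbit/s (article rounds to 44)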
Read the full article here