Tuesday, February 25, 2014

Revisiting the "Cookieless Domain" Recommendation

For a long time one of the recommendations for a faster site has been to “serve your static content from a cookieless domain”.  This suggestion shows up in the Google best practices, the Yahoo! performance rules, and in Phil Dixon’s famous Shopzilla case study from Velocity 2009, where he states that implementing this one best practice resulted in a 0.5% improvement in top line revenue.  Case closed, right?  Well, due to some recent experimentation we have been doing at Etsy, we had reason to believe that people might be taking this too far.  Specifically, our testing indicated that serving CSS files from the same domain as the base page might be a performance win.  

Why is this a Good Idea?

CSS blocks page rendering, because if it didn’t then users would get the dreaded flash of unstyled content.  This is the reason why another performance recommendation is to put CSS in the head of the document, and to combine CSS files and minify them.  Getting CSS to start downloading as quickly as possible and downloading as little of it as possible is critical to a fast experience.  Here’s the rub: when you put CSS (a static resource) on a cookieless domain, you incur an additional DNS lookup and TCP connection before you start downloading it.  Even worse, if your site is served over HTTPS you spend another 1-2 round trips on TLS negotiation.  Since the CSS blocks the rest of the page resources, this is extremely expensive.  In this case the entire page is waiting on DNS, TCP, and TLS before anything else happens.  On high latency networks this can take hundreds of milliseconds, and will ruin any chance of breaking the 1000ms time to glass mobile barrier.  This is why, in his video on breaking that barrier, Ilya Grigorik suggests inlining the critical CSS that you need to render the page.
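To put rough numbers on that cost, here is a minimal Python sketch of the arithmetic.  It assumes the DNS lookup and the TCP handshake each cost one round trip and that TLS adds up to two more; the function name and the round-trip times are illustrative assumptions, not measurements from this test.

```python
def extra_connection_cost_ms(rtt_ms, dns_round_trips=1, tcp_round_trips=1, tls_round_trips=0):
    """Estimate the delay (in ms) before the first CSS request can be sent to a new domain."""
    return rtt_ms * (dns_round_trips + tcp_round_trips + tls_round_trips)


# Illustrative round-trip times only; they are not taken from the test data in this post.
for label, rtt_ms in [("low latency", 30), ("high latency", 200)]:
    http_cost = extra_connection_cost_ms(rtt_ms)
    https_cost = extra_connection_cost_ms(rtt_ms, tls_round_trips=2)
    print(f"{label}: +{http_cost} ms over HTTP, +{https_cost} ms over HTTPS")
```

With a 200ms round trip, this simple model puts the overhead at 400ms over HTTP and 800ms over HTTPS before a single byte of CSS is requested.  An uncached DNS lookup can easily cost more than one round trip, so treat these as optimistic figures.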

The Current State of Affairs

All of this raises the question: “What is the current state of things?  Do big sites already know this, and put their CSS on the root domain?”  I set out to answer this question, and to determine empirically which approach wins out - putting CSS on the root domain and incurring the overhead of some extra cookies, or putting it on a cookieless domain and suffering the extra DNS lookup and TCP connection.  I started by surveying the top 10 US Alexa sites (Google, Facebook, YouTube, Yahoo!, Wikipedia, Twitter, LinkedIn, Amazon, Wordpress, and eBay) to see where they stand.

Almost universally, the top sites (which are obeying this best practice and others) are putting their CSS on a cookieless domain.  Google is an outlier because the search page is simple and all CSS is inlined, most likely for the reasons that Ilya outlines in the video above.  Among the rest, Wordpress is the only site that isn’t serving static assets from a cookieless domain, and it’s not clear whether this is a deliberate choice to improve performance or just a way to reduce complexity.  Armed with this knowledge, I set up a test to see if these sites could be hurting their performance with this approach.

Experimentation

To make the experiment as realistic as possible, I selected five sites from the ones above to test, and made sure that they all had real content on their homepages as opposed to just a login splash page:
  1. Amazon
  2. YouTube
  3. Yahoo!
  4. eBay
  5. Wikipedia
I wanted to eliminate as many external factors as possible, like server side load time, different locations, and network variations, so my methodology was the following:
  1. Save the HTML from each of the sites above to a VM that I control. The server is the smallest DigitalOcean VM, running nginx on CentOS 6.4.
  2. Ensure that all static resources are still coming from the original domains.
  3. Run each site through 9 WebPagetest runs using Chrome (first view only) with a cable connection, and take the median run as the “before” time.
  4. For the “after” times, manually download all of the CSS referenced by each site to my server, and reference it relatively (e.g. /css/amazon1.css).  Ensure that the same number of CSS files are being downloaded, and that the sites still look identical to the “before” runs.
  5. Use nginx to set a 500 byte cookie, to simulate the downside of having static assets on a cookied domain.  
  6. Run 9 more WebPagetest runs for each site.
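
Picking the median of an odd number of runs (steps 3 and 6) can be sketched as follows.  This is only an illustration: the `load_time` key and the sample values are hypothetical, and it assumes the median is chosen by load time.

```python
def median_run(runs, metric="load_time"):
    """Return the run whose metric is the median across an odd number of runs."""
    ordered = sorted(runs, key=lambda run: run[metric])
    return ordered[len(ordered) // 2]


# Nine hypothetical runs (seconds); these are not the values measured in this post.
sample_runs = [
    {"run": i + 1, "load_time": t}
    for i, t in enumerate([1.3, 1.1, 1.5, 1.2, 1.4, 1.0, 1.6, 1.25, 1.35])
]
print(median_run(sample_runs))
```

Taking the median rather than the mean keeps a single slow outlier run from skewing the “before” or “after” number.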
This approach gives a pretty clear comparison between serving CSS from the root domain and from a cookieless domain.  I selected 500 bytes as the cookie size because it loosely matches the site that was setting the largest cookies from my test population (eBay).  The other four sites set significantly fewer cookie bytes.  Your results may vary if your site sets many kilobytes of cookies, but then again if that’s the case perhaps you should consider reducing the number of cookies you set. One approach that works well is to set a unique identifier in the cookie and store the data on the server, so you don’t need to ship those bytes back and forth on every request.
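
As a rough illustration of that last idea, the sketch below keeps only a short random identifier in the cookie and holds the actual data on the server.  The names `SESSION_STORE`, `create_session`, and `load_session` are hypothetical, and the in-memory dict stands in for whatever database or cache a real site would use.

```python
import secrets

# Hypothetical in-memory store; a real site would use a database or cache instead.
SESSION_STORE = {}


def create_session(data):
    """Store data server-side and return a small cookie string that references it."""
    session_id = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe encoded
    SESSION_STORE[session_id] = data
    return f"sid={session_id}"


def load_session(cookie):
    """Look up the server-side data for a 'sid=<id>' cookie, or return None."""
    name, _, value = cookie.partition("=")
    if name != "sid":
        return None
    return SESSION_STORE.get(value)


cookie = create_session({"user": "example", "cart": ["item-1", "item-2"], "theme": "dark"})
print(len(cookie), "bytes sent per request")
print(load_session(cookie))
```

However much data the session holds, the browser only ships a few dozen bytes back and forth on every request.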

Results

The raw data is at the bottom of the post, but the results are conclusive for this particular test - putting CSS on the root domain is a clear win for all sites that I tested.  Here is the average improvement for the metrics that I measured:

Metric          Percentage Improvement
Load Time       10.3%
Start Render    27.8%
Speed Index     20.8%

The tests showed a significant decrease in both Start Render Time and Speed Index after moving CSS to the root domain, and the latter metric is rapidly becoming the go-to option for synthetic tests.  This is a huge improvement for an extremely small change.  It’s worth pointing out that the CSS file(s) on the root domain should still be cached at your CDN.  Even if you aren’t caching HTML, you can still put your root domain through a CDN and configure it to only cache certain content types (like text/css).  There is another benefit from doing this: putting your root domain through a CDN allows you to terminate client TCP/TLS connections at the edge, dramatically reducing the latency that your users experience.  

Moving your CSS to the root domain has other benefits in the world of SPDY/HTTP 2.0.  Header compression means that the cost of extra cookies is much lower, and HTTP multiplexing allows you to reuse TCP connections for multiple resources on the same domain.  As long as you keep your cookies to a reasonable size (ideally well under a kilobyte), they won't cause an extra round trip, and you will be much better off having your critical resources on the same domain as the base page, with or without SPDY.  
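
One conservative way to sanity-check the cookie size is to ask whether the whole request still fits in a single TCP segment.  The sketch below assumes a typical 1460 byte maximum segment size and a hypothetical 400 bytes for the request line and other headers; both numbers are assumptions, and it ignores TLS record overhead.

```python
# Assumption: 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers.
TYPICAL_MSS_BYTES = 1460


def fits_in_one_segment(other_request_bytes, cookie_bytes, mss_bytes=TYPICAL_MSS_BYTES):
    """Return True if the request line, headers, and cookies fit in a single TCP segment."""
    return other_request_bytes + cookie_bytes <= mss_bytes


# Hypothetical 400 bytes of request line and non-cookie headers.
print(fits_in_one_segment(400, 500))   # the 500 byte cookie used in this test
print(fits_in_one_segment(400, 4096))  # several kilobytes of cookies
```

Under these assumptions the 500 byte cookie fits comfortably, while several kilobytes of cookies spill into additional segments.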

These results could be further validated by re-running this test with different network characteristics, browsers, and locations, but the numbers are so large that I would expect a similar directional change regardless of the specific test configuration.  The fact that I ran this on a cable connection is relevant as well - for people on mobile networks this should have an even bigger impact, since DNS lookups and TCP connections take much longer when latency is high.  I'm hoping to validate these assumptions with real user monitoring data soon.  

Conclusion

First of all, I want to be clear that this experiment only looked at the impact of moving CSS to the root domain.  It is almost certainly the case that keeping images on a cookieless domain is a performance win.  Images don’t block render, there are typically a lot of them on a page, and having them on another domain can make CDN configuration easier.  When it comes to JavaScript, assuming you are deferring JavaScript in some way, having it on a cookieless domain is probably a good thing as well.  If your JS is loaded via blocking script tags in the head, you might want to experiment with putting it on the root domain (or just move it out of the head).  

With that caveat out of the way, I believe that this experiment conclusively shows that every site currently loading its CSS from a cookieless domain should test loading it from the root domain, to see if it improves performance and business metrics.  Synthetic tests are useful, but it would be great to see some real world data on what kind of impact this has on actual users. There could be implementation details that made this test look more effective than it will be on actual sites, and Etsy will hopefully be publishing data on how this impacts production traffic in the near future.

To me this shows that we have to think logically about the rules we are following.  In this particular case we are dealing with two valid best practices: reduce the number of bytes on the wire, and reduce the number of DNS lookups on your page.  In cases like this where there are tradeoffs to be made, it’s worth testing which approach works best for your content.

Raw Data

Amazon          Before    After     Difference    % Difference
Load Time       1.282     0.956     -0.326        25.43%
First Byte      0.11      0.115     0.005         -4.55%
Start Render    0.693     0.593     -0.1          14.43%
Speed Index     922       706       -216          23.43%

YouTube         Before    After     Difference    % Difference
Load Time       1.103     1.384     0.281         -25.48%
First Byte      0.112     0.109     -0.003        2.68%
Start Render    1.086     0.695     -0.391        36.00%
Speed Index     7176      6406      -770          10.73%

Yahoo!          Before    After     Difference    % Difference
Load Time       2.828     2.33      -0.498        17.61%
First Byte      0.115     0.117     0.002         -1.74%
Start Render    1.391     0.797     -0.594        42.70%
Speed Index     1633      1308      -325          19.90%

eBay            Before    After     Difference    % Difference
Load Time       9.22      8.528     -0.692        7.51%
First Byte      0.119     0.118     -0.001        0.84%
Start Render    1.19      0.889     -0.301        25.29%
Speed Index     4935      3453      -1482         30.03%

Wikipedia       Before    After     Difference    % Difference
Load Time       0.894     0.659     -0.235        26.29%
First Byte      0.122     0.11      -0.012        9.84%
Start Render    0.993     0.787     -0.206        20.75%
Speed Index     1000      800       -200          20.00%
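
For anyone who wants to check the math, the averages in the Results section can be reproduced from the Before and After values above.  The sketch below is one way to do it in Python; the `RAW` dictionary simply restates the raw data, and the function names are my own.

```python
# (before, after) pairs copied from the raw data tables above.
RAW = {
    "Amazon":    {"Load Time": (1.282, 0.956), "Start Render": (0.693, 0.593), "Speed Index": (922, 706)},
    "YouTube":   {"Load Time": (1.103, 1.384), "Start Render": (1.086, 0.695), "Speed Index": (7176, 6406)},
    "Yahoo!":    {"Load Time": (2.828, 2.33),  "Start Render": (1.391, 0.797), "Speed Index": (1633, 1308)},
    "eBay":      {"Load Time": (9.22, 8.528),  "Start Render": (1.19, 0.889),  "Speed Index": (4935, 3453)},
    "Wikipedia": {"Load Time": (0.894, 0.659), "Start Render": (0.993, 0.787), "Speed Index": (1000, 800)},
}


def percent_improvement(before, after):
    """Positive when the 'after' value is lower (faster) than the 'before' value."""
    return (before - after) / before * 100


def average_improvement(metric):
    """Average the per-site percentage improvement for one metric."""
    values = [percent_improvement(*site[metric]) for site in RAW.values()]
    return sum(values) / len(values)


for metric in ("Load Time", "Start Render", "Speed Index"):
    print(f"{metric}: {average_improvement(metric):.1f}%")
# prints 10.3%, 27.8% and 20.8% respectively
```

Note that the Load Time average includes YouTube, whose load time got worse, which is why that figure is so much lower than the other two.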