Tuesday, February 25, 2014

Revisiting the "Cookieless Domain" Recommendation

For a long time one of the recommendations for a faster site has been to “serve your static content from a cookieless domain”.  This suggestion shows up in the Google best practices, the Yahoo! performance rules, and in Phil Dixon’s famous Shopzilla case study from Velocity 2009, where he states that implementing this one best practice resulted in a 0.5% improvement in top line revenue.  Case closed, right?  Well, due to some recent experimentation we have been doing at Etsy, we had reason to believe that people might be taking this too far.  Specifically, our testing indicated that serving CSS files from the same domain as the base page might be a performance win.  

Why is this a Good Idea?

CSS blocks page rendering, because if it didn't, users would get the dreaded flash of unstyled content.  This is why another performance recommendation is to put CSS in the head of the document, and to combine and minify CSS files.  Getting CSS to start downloading as quickly as possible, and downloading as little of it as possible, is critical to a fast experience.  Here's the rub: when you put CSS (a static resource) on a cookieless domain, you incur an additional DNS lookup and TCP connection before you start downloading it.  Even worse, if your site is served over HTTPS you spend another 1-2 round trips on TLS negotiation.  Since CSS blocks the rest of the page resources, this is extremely expensive: the entire page is waiting on DNS, TCP, and TLS before anything else happens.  On high latency networks this can take hundreds of milliseconds, and will ruin any chance of breaking the 1000ms time to glass mobile barrier.  This is why Ilya Grigorik, in his talk on breaking that barrier, suggests inlining the critical CSS that you need to render the page.
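
To put rough numbers on this, here is a back-of-the-envelope sketch of the extra delay before the first CSS byte arrives when the stylesheet lives on a domain the browser has not connected to yet.  The RTT and DNS figures are my own assumptions, purely to illustrate the arithmetic - substitute your own measurements.

    # Rough estimate of the penalty for CSS on a separate, cold domain.
    # All numbers are illustrative assumptions, not measurements from this post.
    def css_fetch_overhead_ms(rtt_ms, dns_ms, https=True):
        """Extra delay before the first CSS byte when the browser has no
        existing connection to the stylesheet's domain."""
        tcp_handshake = rtt_ms                      # 1 RTT for the TCP handshake
        tls_handshake = 2 * rtt_ms if https else 0  # assume 2 RTTs (no session resumption)
        return dns_ms + tcp_handshake + tls_handshake

    # Assumed figures: ~30ms RTT / ~20ms DNS on cable, ~90ms RTT / ~100ms DNS on 3G.
    for label, rtt, dns in [("cable", 30, 20), ("3G", 90, 100)]:
        print("%s: ~%dms before the CSS even starts downloading"
              % (label, css_fetch_overhead_ms(rtt, dns)))

On the root domain that setup cost has already been paid by the base page request, so the CSS request rides on the existing warm connection.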

The Current State of Affairs

All of this raises the question: “What is the current state of things?  Do big sites already know this, and put their CSS on the root domain?”  I set out to answer this question, and empirically determine which approach wins out - putting CSS on the root domain and incurring the overhead of some extra cookies, or putting it on a cookieless domain and suffering the extra DNS lookup and TCP connection.  I started by surveying the top 10 US Alexa sites (Google, Facebook, YouTube, Yahoo!, Wikipedia, Twitter, LinkedIn, Amazon, Wordpress, and eBay) to see where they stand.

Almost universally, the top sites (which are following this and other best practices) are putting their CSS on a cookieless domain.  Google is an outlier because its search page is simple and all of its CSS is inlined, most likely for the reasons that Ilya outlines in the talk mentioned above.  Among the rest, Wordpress is the only site that isn’t serving static assets from a cookieless domain, and it’s not clear whether this is a deliberate choice to improve performance or simply a way to reduce complexity.  Armed with this knowledge, I set up a test to see if these sites could be hurting their performance with this approach.

Experimentation

To make the experiment as realistic as possible, I selected five sites from the ones above to test, and made sure that they all had real content on their homepages as opposed to just a login splash page:
  1. Amazon
  2. YouTube
  3. Yahoo!
  4. eBay
  5. Wikipedia
I wanted to eliminate as many external factors as possible, like server side load time, different locations, and network variations, so my methodology was the following:
  1. Save the HTML from each of the sites above to a VM that I control. The server is the smallest DigitalOcean VM, running nginx on CentOS 6.4.
  2. Ensure that all static resources are still coming from the original domains.
  3. Run each site through 9 WebPagetest runs using Chrome (first view only) with a cable connection, and take the median run as the “before” time.
  4. For the “after” times, manually download all of the CSS referenced by each site to my server, and reference it relatively (e.g. /css/amazon1.css).  Ensure that the same number of CSS files are being downloaded, and that the sites still look identical to the “before” runs.
  5. Use nginx to set a 500-byte cookie, to simulate the downside of having static assets on a cookied domain.
  6. Run 9 more WebPagetest runs for each site.
This approach gives a pretty clear comparison between serving CSS from the root domain and from a cookieless domain.  I selected 500 bytes as the cookie size because it loosely matches the site that was setting the largest cookies from my test population (eBay).  The other four sites set significantly fewer cookie bytes.  Your results may vary if your site sets many kilobytes of cookies, but then again if that’s the case perhaps you should consider reducing the number of cookies you set. One approach that works well is to set a unique identifier in the cookie and store the data on the server, so you don’t need to ship those bytes back and forth on every request.
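
If you do go the identifier route, here is a minimal sketch of the pattern.  It is purely illustrative - Flask and an in-memory dict are my assumptions, not anything from this experiment, and a real site would want a signed cookie and a shared store like memcached or Redis.

    # Sketch: keep only an opaque session id in the cookie and store the bulky
    # per-user data on the server.  Illustrative only -- not the test setup above.
    import uuid
    from flask import Flask, request, make_response

    app = Flask(__name__)
    session_store = {}  # session_id -> per-user data (in-memory for the sketch)

    @app.route("/")
    def index():
        sid = request.cookies.get("sid")
        if sid not in session_store:
            sid = uuid.uuid4().hex            # ~32 bytes on the wire, not kilobytes
            session_store[sid] = {"prefs": {}}
        resp = make_response("<html>...</html>")
        # Only the identifier travels with every request to this domain.
        resp.set_cookie("sid", sid, httponly=True)
        return resp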

Results

The raw data is at the bottom of the post, but the results are conclusive for this particular test - putting CSS on the root domain is a clear win for all sites that I tested.  Here is the average improvement for the metrics that I measured:

Metric            Percentage Improvement
Load Time         10.3%
Start Render      27.8%
Speed Index       20.8%

The tests showed a significant decrease in both Start Render Time and Speed Index after moving CSS to the root domain, and the latter metric is rapidly becoming the go-to option for synthetic tests.  This is a huge improvement for an extremely small change.  It’s worth pointing out that the CSS file(s) on the root domain should still be cached at your CDN.  Even if you aren’t caching HTML, you can still put your root domain through a CDN and configure it to only cache certain content types (like text/css).  There is another benefit from doing this: putting your root domain through a CDN allows you to terminate client TCP/TLS connections at the edge, dramatically reducing the latency that your users experience.  
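
How you restrict the CDN to certain content types varies by vendor, but a common way to express it is through the origin's Cache-Control headers.  A hedged sketch (Flask again; the header values are assumptions, not the configuration used in these tests):

    # Sketch: long-lived caching for CSS and no shared caching for HTML, so a
    # CDN in front of the root domain caches stylesheets but passes pages through.
    from flask import Flask, Response

    app = Flask(__name__)

    @app.after_request
    def set_cache_headers(resp: Response) -> Response:
        if resp.mimetype == "text/css":
            # Pair a long max-age with versioned filenames (e.g. /css/site.abc123.css)
            # so a deploy can bust the cache.
            resp.headers["Cache-Control"] = "public, max-age=31536000"
        elif resp.mimetype == "text/html":
            resp.headers["Cache-Control"] = "private, no-cache"
        return resp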

Moving your CSS to the root domain has other benefits in the world of SPDY/HTTP 2.0.  Header compression means that the cost of extra cookies is much lower, and HTTP multiplexing allows you to reuse TCP connections for multiple resources on the same domain.  As long as you keep your cookies to a reasonable size (ideally well under a kilobyte), they won't cause an extra round trip, and you will be much better off having your critical resources on the same domain as the base page, with or without SPDY.  

These results could be further validated by re-running this test with different network characteristics, browsers, and locations, but the numbers are so large that I would expect a similar directional change regardless of the specific test configuration.  The fact that I ran this on a cable connection is relevant as well - for people on mobile networks this should have an even bigger impact, since DNS lookups and TCP connections take much longer when latency is high.  I'm hoping to validate these assumptions with real user monitoring data soon.  

Conclusion

First of all, I want to be clear that this experiment only looked at the impact of moving CSS to the root domain.  It is almost certainly the case that keeping images on a cookieless domain is a performance win.  Images don’t block render, there are typically a lot of them on a page, and having them on another domain can make CDN configuration easier.  When it comes to JavaScript, assuming you are deferring JavaScript in some way, having it on a cookieless domain is probably a good thing as well.  If your JS is loaded via blocking script tags in the head, you might want to experiment with putting it on the root domain (or just move it out of the head).  

With that caveat out of the way, I believe that this experiment conclusively shows that every site currently loading its CSS from a cookieless domain should test loading it from the root domain, to see if it improves performance and business metrics.  Synthetic tests are useful, but it would be great to see some real world data on what kind of impact this has on actual users. There could be implementation details that made this test look more effective than it will be on actual sites, and Etsy will hopefully be publishing data on how this impacts production traffic in the near future.

To me this shows that we have to think logically about the rules we are following.  In this particular case we are dealing with two valid best practices: reduce the number of bytes on the wire, and reduce the number of DNS lookups on your page.  In cases like this where there are tradeoffs to be made, it’s worth testing which approach works best for your content.

Raw Data

Amazon            Before    After     Difference   % Difference
Load Time (s)     1.282     0.956     -0.326       25.43%
First Byte (s)    0.11      0.115     0.005        -4.55%
Start Render (s)  0.693     0.593     -0.1         14.43%
Speed Index       922       706       -216         23.43%

YouTube           Before    After     Difference   % Difference
Load Time (s)     1.103     1.384     0.281        -25.48%
First Byte (s)    0.112     0.109     -0.003       2.68%
Start Render (s)  1.086     0.695     -0.391       36.00%
Speed Index       7176      6406      -770         10.73%

Yahoo!            Before    After     Difference   % Difference
Load Time (s)     2.828     2.33      -0.498       17.61%
First Byte (s)    0.115     0.117     0.002        -1.74%
Start Render (s)  1.391     0.797     -0.594       42.70%
Speed Index       1633      1308      -325         19.90%

eBay              Before    After     Difference   % Difference
Load Time (s)     9.22      8.528     -0.692       7.51%
First Byte (s)    0.119     0.118     -0.001       0.84%
Start Render (s)  1.19      0.889     -0.301       25.29%
Speed Index       4935      3453      -1482        30.03%

Wikipedia         Before    After     Difference   % Difference
Load Time (s)     0.894     0.659     -0.235       26.29%
First Byte (s)    0.122     0.11      -0.012       9.84%
Start Render (s)  0.993     0.787     -0.206       20.75%
Speed Index       1000      800       -200         20.00%

Tuesday, December 10, 2013

Reducing Domain Sharding

This post originally appeared on the Perf Planet Performance Calendar on December 7th, 2013.

Domain sharding has long been considered a best practice for pages with lots of images.  The number of domains that you should shard across depends on how many HTTP requests the page makes, how many connections the client makes to each domain, and the available bandwidth.  Since it can be challenging to change this dynamically (and can cause browser caching issues), people typically settle on a fixed number of shards - usually two.  

An article published earlier this year by Chromium contributor William Chan outlined the risks of sharding across too many domains, and Etsy was called out as an example of a site that was doing this wrong.  To quote the article: “Etsy’s sharding causes so much congestion related spurious retransmissions that it _dramatically_ impacts page load time.”  At Etsy we're pretty open with our performance work, and we’re always happy to serve as an example.  That said, getting publicly shamed in this manner definitely motivated us to bump the priority of reinvestigating our sharding strategy.  

Making The Change

The code changes to support fewer domains were fairly simple, since we have abstracted away the process that adds a hostname to an image path in our codebase.  Additionally, we had the foresight to exclude the hostname from the cache key at our CDNs, so there was no risk of a massive cache purge as we switched which domain our images were served on.  We were aware that this would expire the cache in browsers, since they do include hostname in their cache key, but this was not a blocker for us because of the improved end result.  To ensure that we ended up with the right final number, we created variants for two, three, and four domains.  We were able to rule out the option to remove domain sharding entirely through synthetic tests.  We activated the experiment in June using our A/B framework, and ran it for about a month.
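
For anyone who has not built that abstraction yet, a deterministic shard picker can be tiny.  Here is a hedged sketch with hypothetical hostnames (this is not Etsy's code): hashing the path pins each image to one hostname, so the shard count becomes a single configuration value.

    # Sketch of a deterministic shard picker -- hypothetical hostnames, not Etsy's code.
    # Hashing the path keeps a given image on the same hostname between page views.
    import zlib

    NUM_SHARDS = 2  # the value to tune; the experiment below compared two, three, and four
    SHARD_HOSTS = ["img%d.example-cdn.com" % i for i in range(NUM_SHARDS)]

    def image_url(path):
        shard = zlib.crc32(path.encode("utf-8")) % NUM_SHARDS
        return "https://%s%s" % (SHARD_HOSTS[shard], path)

    print(image_url("/images/listing/12345.jpg"))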

Results

After looking at all of the data, the variant that sharded across two domains was the clear winner.  Given how easy this change was to make, the results were impressive:

  • 50-80ms faster page load times for image heavy pages (e.g. search), 30-50ms faster overall.
  • Up to 500ms faster load times on mobile.
  • 0.27% increase in pages per visit.

As it turns out, William’s article was spot on - we were sharding across too many domains, and network congestion was hurting page load times.  The new CloudShark graph supported this conclusion as well, showing a peak throughput improvement of 33% and radically reduced spurious retransmissions:

[CloudShark throughput graph: Before - Four Shards]

[CloudShark throughput graph: After - Two Shards]

Lessons Learned

This story had a happy ending, even though in the beginning it was a little embarrassing.  We had a few takeaways from the experience:

  • The recommendation to shard across two domains still holds.
  • Make sure that your CDN is configured to leave the hostname out of the cache key.  This was key to making this change painless.
  • Abstract away the code that adds a hostname to an image URI, so the shard count is easy to change.
  • Measure everything, and question assumptions about existing decisions.
  • Tie performance improvements to business metrics - we were able to tell a great story about the win we had with this change, and feel confident that we made the right call.
  • Segment your data across desktop and mobile, and ideally international if you can.  The dramatic impact on mobile was a huge improvement which would have been lost in an aggregate number.  

Until SPDY/HTTP 2.0 comes along, domain sharding can still be a win for your site, so long as you test and optimize the number of domains to shard across for your site.

Sunday, November 24, 2013

Pushupdate 2013

This is a follow-up to my "1,000,000 Push-ups" post.  Check that out for the reasoning behind this challenge.

November 19th was my birthday, and it also marked the end of the second year of my push-up challenge.  For the last year my goal was to average 100 push-ups per day, and I finished five push-ups ahead of that goal, with a running total of 54,805 push-ups.  This isn't a magical coincidence, there's a strong rubber banding effect with my color coded spreadsheet.  When I get behind, I do more push-ups to catch up.  When I get ahead, I know that I can take a day off if I get tired or lazy.  This kept me hovering right around my target throughout the year.  

If you look at the "Year 2" tab in the doc, you will see that there was a lot more yellow than in year one.  This is largely because I got sick at the end of April, and had two consecutive days with zero push-ups.  It took me quite a while to catch back up, because I didn't have any time pressure to do so.  Why is all of this relevant?  Because it has taught me a few things about my motivation and how I should structure the challenge going forward:

  1. I really only care about finishing the year on target; being slightly behind in the middle of the year doesn't bother me or motivate me all that much.
  2. I am unlikely to do more push-ups than the sheet requires, so thinking that I will get ahead naturally is silly.
  3. Everything is a lot easier if I spread the push-ups out during the day, instead of jamming them all in right before I go to sleep.

Because of these lessons, and specifically #2, I've decided to bump my goal for year 3 to 125 push-ups per day.  This will allow me to finish the year at over 100,000 push-ups, which is 10% of my goal.  If I maintain the 125 push-up average going forward, I will finish the challenge a little over two years early.  I also might continue to raise the daily total in future years, so I can shave a little more time off the end, and potentially finish in 20 years instead of 25.

The beginning of this year was a bit of a struggle, but towards the end I got much better about doing my push-ups during the day, which had a dramatic positive impact on my attitude while doing them.  Since I work from home this is fairly easy to do, and I'm going to continue to push for that going forward (pun definitely intended).  

Questions?  Comments?  Ideas for future updates?  Let me know, below.  

Wednesday, November 13, 2013

A New Source For WPO Resources

** Update 12/03/13 ** - This article was deleted by Wikipedia moderators.  If you want the full background on why, check out the deletion talk page.  I have just moved the article to the Web Platform Wiki, which is a more appropriate place for it.

After my last post with its abundance of footnotes, some people asked for a place on the web where they can see an aggregated list of performance resources.  This has been tried in the past, and I believe that it has failed because it has always relied on a single person or entity to update the list.  Ideally we would like to have something that fulfills the following goals:

  • Hosted by an impartial third party
  • Anyone can update - no single point of failure
  • An existing site that people trust, so we don't have to reinvent the wheel
  • A site with a bright future - we don't want this to be obsolete in a few months

This is starting to sound vaguely familiar...

That's right, the new home (hopefully) for Web Performance Optimization resources is Wikipedia.  This kind of article has some precedent, and is modeled after this List of JavaScript Libraries.

I put together the initial resources on this page this morning, so it is far from exhaustive.  This is the beauty of Wikipedia - anyone can improve this list, and there will be no single "maintainer".  Go edit it now!  I've linked to this new list from the main WPO article, and my hope is that people will continue to keep it up to date.  I think this is something that our industry needs: a place to point newcomers where they can find links to all of the great content that our community produces.  

That's it - have at it.

Sunday, November 3, 2013

We Have a Long Way To Go

When you work on web performance every day, it is easy to assume that the best practices are widely known, widely understood, and widely followed.  Unfortunately, as I have learned through a couple of different experiences recently, this is not the case.  It is frankly astounding how many websites are still failing to implement the best practices that Steve Souders outlined in his book "High Performance Web Sites" over six years ago.  Six years is an eternity when it comes to web development, but I still regularly meet professional software engineers who are surprised when they hear about the rules that the WPO community evangelizes on a daily basis.

First, The Data

Let's try to scope the problem.  Data from the HTTP Archive shows that page weight and request counts continue to climb year over year.

Via Radware we know that pages are getting bigger and more complex, and that adoption of best practices is inconsistent and fairly weak.  At this point people are quick to point to the Google study showing that the web is in fact getting faster, but this is primarily due to browser and network improvements, not due to website improvements.  To quote the study, "it is still impressive given that the size of the web pages have increased by over 56% during this period".  How much faster would the web be if we had browser, network, and content improvements?

I mentioned that I have had a few experiences recently that led me to write this post.  First, I was recently speaking on a panel at a "Web Performance Day" for portfolio companies of a prominent VC firm in San Francisco.  Steve Souders was giving the opening talk, and he opened with a scary comment.  He mentioned that as he was putting his talk together, he surfed around some of the portfolio company websites so he could target his material to the audience members.  He found that in the vast majority of cases, even the most basic optimizations were not being done.  He was forced to alter his talk and give a "Performance 101" talk during which he dredged up slides from decks that he presented over 5 years ago.  Yikes.

The second experience was during a recent episode of JavaScript Jabber.  Alex MacCaw was talking about how he optimized monocle.io, and he mentioned Google's PageSpeed Insights tool.  The tool was new to pretty much everyone on the show, and there was a lot of "wow, this is so cool!".  Remember, these are professional front-end web developers who record a podcast about JavaScript and front-end best practices in their free time.  If they haven't heard of PageSpeed, and are getting surprised by some of the checks that it exposes, then we have a TON of work to do.

What Should We Expect?

Perhaps I am an idealist, but I think we should be able to get to 95%+ compliance with gzip.  I don't expect every site to sprite all of their images (a best practice that SPDY/HTTP 2.0 will negate anyway), but I do expect every single site on the internet to figure out how to compress text resources.  It's probably the easiest, most impactful, most well supported, and safest optimization that you can do.

This is just an example, but I think it is indicative of the problem, because we're only at 77% compliance with it.  There are enormous benefits with literally zero downsides, and yet nearly a quarter of the sites on the internet still don't do it (and don't talk to me about CPU usage on shared hosting - if a host doesn't let you turn on gzip you should find another host).
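
If you want to spot-check a site, one rough way to do it (a sketch using only the Python standard library; the URL is a placeholder) is to request a compressed response and look at the Content-Encoding header:

    # Rough spot check: does this URL come back gzip/deflate compressed?
    # A sketch for quick checks, not a rigorous audit tool.
    import urllib.request

    def is_compressed(url):
        req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
        with urllib.request.urlopen(req) as resp:
            encoding = resp.headers.get("Content-Encoding", "")
        return "gzip" in encoding or "deflate" in encoding

    print(is_compressed("https://www.example.com/"))  # placeholder URL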

The more concerning thing for me is that people expect new standards and browser improvements to come in and save us, like the aforementioned HTTP 2.0.  While I'm extremely excited about HTTP 2.0, I worry that we will run into the same problem that we have today with gzip: it will take years to get up to a reasonable adoption point, even after the browser support is there.

When it comes to "what we should expect", I think understanding the core technologies behind the web and how to leverage them to build fast applications is a requirement for every single engineer.  After all, speed is more than a feature, it's the most important feature.  That's from the guy holding the purse strings, so it's worth taking to heart.  If someone can't easily explain at least 3-5 performance best practices, to me that's equivalent to not being able to write a line of JavaScript.  In other words, inexcusable if you work on the web as a developer.

Are There Enough Resources For Developers?

Yes.  Perhaps we could do a better job about getting them in front of people early and often, but the[1] number[2] of[3] tools[4] and[5] resources[6] is[7] staggering[8] (I[9] could[10] do[11] this[12] all[13] day[14]).  I'm obviously extremely biased, since I work on the performance team for one of the top technology companies on the web, and one that happens to care a lot about performance, but these resources are not hard to find.  In addition, with the recent healthcare.gov fiasco, web performance is much more in the public eye, which should provide even more motivation for software engineers to get on the bandwagon.

How Do We Fix This?

If you have made it this far, hopefully you believe that there is a problem and that it should be fixed.  The easy answer is to do more of what we are already doing - speaking, blogging, writing books, hosting meetups, and publishing case studies.  This is all great, and we should do more of it, but it's clearly not working as well as we might like.  I'd like to propose a couple of concrete things we could do differently that might help get the word out about performance:

  • Every tech conference should have at least one talk that focuses on performance.
  • Every company with an engineering blog should publish a post about performance (and ideally how it affects their bottom line).
  • Every college with a CS department should offer a course that focuses on web performance (or at least part of a course).  Kudos to Stanford for doing this.
  • We should write more books with the "High Performance" prefix.  Steve Souders has mentioned in the past that he would like to see this, and a few have come out recently (like Ilya's excellent book), but we should do even more.  I'd love to see a "High Performance Ruby on Rails" book, as one example.
If you have the ability to influence any of the points above, please do so.  With our mobile future rapidly approaching, performance is more important than ever.  We will be stuck with 3G and 4G for decades, so networks aren't going to save us.  There's a limit to what browsers can do given certain latency and bandwidth constraints, and when you look at how quickly page complexity is rising, it's not hard to imagine the web getting slower and slower over time.  That's not a future that I want, so let's work together to build the tools, resources, and infrastructure that's necessary to make the web faster.

[1] Google Best Practices
[2] WebPagetest Forums
[3] Yahoo Performance Rules
[4] Web Performance Today Best Practices
[5] Perf Planet Advent Calendar
[6] High Performance Browser Networking
[7] High Performance JavaScript
[8] Velocity Conference
[9] HTTP Archive
[10] The Top 22 Web Performance Posts of 2013 (New Relic)
[11] Web Performance Today Podcasts
[12] Even Faster Websites
[13] YSlow
[14] Compuware Ajax Edition