Thursday, July 19, 2012

Measuring CDN Performance With Real Users

This is cross posted on the Wayfair Engineering Blog

A couple of weeks ago I ran a test with WebPagetest that was designed to quantify how much a CDN improves performance for users who are far from your origin.  Unfortunately, the test indicated that there was no material performance benefit to having a CDN in place.  This conclusion sparked a lively discussion in the comments and on Google+, and the overwhelming suggestion was that Real User Monitoring data was necessary to draw a firm conclusion about the impact of CDNs on performance.  To gather this data I turned to the Insight product and its "tagging" feature.

Before I get into the nitty-gritty details I'll give you the punch line: the test with real users confirmed the results from the synthetic one, showing no major performance improvement due to the use of a CDN.

Implementation Details: 

Prior to this test we served our static content (CSS, JS, and images) from three domains:

common.csnimages.com
common1.csnimages.com
common2.csnimages.com


The first domain is where our CSS/JS comes from, and the other two are domains that we shard images across to boost parallel downloading.  To effectively test performance without a CDN we had to set up some new subdomains.  Luckily we already had two other subdomains set up from previous testing.  We configured these subdomains to always hit origin:

common3.csnimages.com
common4.csnimages.com


To ensure that this test was comparing apples to apples, I switched all requests hitting common.csnimages.com to common1.csnimages.com, so when a customer is hitting our CDN they only use the common1 and common2 domains.
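A minimal sketch of how that host selection could look.  The host lists come from the setup described above; the hashing scheme, the function names, and the choice to pin CSS/JS to the first host in each group are assumptions for illustration, not the production code.

```javascript
// Hosts per test group, taken from the post. The rest of this block is hypothetical.
const HOSTS = {
  cdn: ['common1.csnimages.com', 'common2.csnimages.com'],
  no_cdn: ['common3.csnimages.com', 'common4.csnimages.com'],
};

// Simple deterministic string hash (an assumption), so a given path
// always maps to the same shard and stays cacheable.
function hashPath(path) {
  let h = 0;
  for (let i = 0; i < path.length; i++) {
    h = (h * 31 + path.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Pick the hostname for an asset, given the user's group ('cdn' or 'no_cdn').
function assetHost(group, path) {
  const hosts = HOSTS[group];
  if (!hosts) throw new Error('unknown group: ' + group);
  // CSS/JS are pinned to the first host of the group, mirroring the
  // common -> common1 switch. Doing the same for origin is an assumption.
  if (/\.(css|js)$/.test(path)) return hosts[0];
  // Everything else (images) is sharded across both hosts.
  return hosts[hashPath(path) % hosts.length];
}

// Build a protocol-relative URL for the asset.
function assetUrl(group, path) {
  return '//' + assetHost(group, path) + path;
}
```

With this shape both groups make requests to exactly two hostnames, so the only difference between them is whether those hostnames resolve to the CDN or to origin.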

Once these were set up I wrote some code to use our "feature knobs" to put someone either fully on the CDN or fully off the CDN, with the ability to adjust what percentage of overall users were using the new subdomains.  I also made the feature cookie based so once you got assigned to a group you stayed there for the remainder of your session (well, really for the life of the session cookie that I set).  Finally, I tagged each page with either "cdn" or "no_cdn" in Insight's page tracking tags (each page is also tagged with a page type):

TBRUM.q.push(['tags','Home, cdn']);
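The assignment logic can be sketched roughly as follows.  The cookie name, the function names, and the injectable random source are all hypothetical, since the real "feature knob" code is not shown here; only the "cdn"/"no_cdn" labels and the tag format come from the snippet above.

```javascript
// Hypothetical cookie name; the real one is not given in the post.
const COOKIE_NAME = 'cdn_test_group';

// Parse a Cookie header string into a plain object.
function parseCookies(header) {
  const out = {};
  (header || '').split(';').forEach(function (pair) {
    const idx = pair.indexOf('=');
    if (idx > 0) out[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  });
  return out;
}

// Decide which group a request belongs to.
//   cookieHeader  - raw Cookie header from the request (may be empty)
//   noCdnPercent  - the knob: share of new users sent to origin, 0 to 100
//   rand          - optional random source returning [0, 1), handy for testing
function assignGroup(cookieHeader, noCdnPercent, rand) {
  const existing = parseCookies(cookieHeader)[COOKIE_NAME];
  if (existing === 'cdn' || existing === 'no_cdn') {
    // Already assigned: stay in the same group, no new cookie needed.
    return { group: existing, setCookie: null };
  }
  const roll = (rand || Math.random)() * 100;
  const group = roll < noCdnPercent ? 'no_cdn' : 'cdn';
  // No Expires or Max-Age attribute, so this is a session cookie.
  return { group: group, setCookie: COOKIE_NAME + '=' + group + '; Path=/' };
}

// Build the argument for the RUM tag call, e.g. ['tags', 'Home, cdn'].
function rumTags(pageType, group) {
  return ['tags', pageType + ', ' + group];
}
```

On the page, the result would be passed to the same call shown above, for example `TBRUM.q.push(rumTags('Home', group))`.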

After some testing to make sure that everything worked, I cranked the "no_cdn" users up to 25% and let the test run for a few days.

Results:

As I mentioned above, we didn't see any appreciable improvement in performance from the use of a CDN.  More specifically, the median load time, the average load time, and the 90th and 99th percentiles were slightly worse with the CDN (close enough to call them the same), while the 95th percentile was marginally faster.  Here are the full results:

Performance for users hitting our CDN

Performance for users hitting our origin


Notice that the difference in pageviews matches the percentage distribution quite closely: with a 75/25 split we would expect to see three times as many pageviews in the "CDN" group, which we do.
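That sanity check is easy to automate.  Here is a small sketch; the function names and the tolerance parameter are made up for illustration, and the pageview counts would come from your own RUM reports.

```javascript
// Expected ratio of CDN pageviews to origin pageviews for a given knob setting.
// For a 75/25 split (noCdnPercent = 25) this is 75 / 25 = 3.
function expectedRatio(noCdnPercent) {
  if (noCdnPercent <= 0 || noCdnPercent >= 100) {
    throw new RangeError('noCdnPercent must be between 0 and 100, exclusive');
  }
  return (100 - noCdnPercent) / noCdnPercent;
}

// True if the observed pageview ratio is within a relative tolerance
// (e.g. 0.1 for 10%) of the ratio the knob setting predicts.
function splitLooksRight(cdnViews, noCdnViews, noCdnPercent, tolerance) {
  const observed = cdnViews / noCdnViews;
  const expected = expectedRatio(noCdnPercent);
  return Math.abs(observed - expected) / expected <= tolerance;
}
```

If the observed ratio drifts far from the expected one, the assignment or tagging code is probably misbehaving, and the timing numbers should not be trusted until that is fixed.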

What Does This Mean?

This is definitely a surprise, but before we get too down on CDNs, let's talk about the other value they provide:
  1. Origin bandwidth offload
  2. Ability to tolerate spikes in traffic
Both of these points are extremely important, and at Wayfair they justify the expense of a CDN by themselves.  That being said, we were also expecting to see some performance benefit from our CDN, and it is disappointing that we aren't getting one.

It is also important to note that these results are from our CDN and our application, and thus should not be extrapolated to the industry as a whole.  You should run your own tests if you think that your CDN could be outperforming ours.

Conclusion:

I'm happy that we ran this test, because it gives us a better frame of reference when we approach content delivery vendors going forward.  It also forces us to focus on the real benefits we are getting for our dollar, and speed is not one of them at the moment.

In my mind the major takeaways from this experience are the following:
  1. If you are really curious about how something is affecting your real users, you have to instrument your site, do the work to create a split test, and actually measure what's happening.
  2. Depending on the distribution of your users, a CDN may not provide a huge performance benefit.  Make sure that you know what you are paying for and what you are getting with a CDN contract.
  3. If you have the right tools and a decent amount of traffic these kinds of tests can be run very quickly.
If you have any questions about methodology or results let me know in the comments!