Friday, July 6, 2012

Measuring CDN Performance With WebPagetest

When I was at Velocity I heard about a quick and useful trick you can do with WebPagetest to measure the effectiveness of your CDN.  The steps are pretty simple:
  1. Test a URL on your site from a location that is far from your origin.
  2. Using the scripting tab in WebPagetest, point your CDN domains at your origin IP, and run another test.
  3. Compare the results, and see how much your CDN is helping you!
Let's break this down for a Wayfair URL.

Step 1:  Test a URL on Your Site Normally

Since we only have our static content on our CDN, I chose a URL that has a lot of images on it, what we call a "superbrowse" page - http://www.wayfair.com/Outdoor-Wall-Lights-C416512.html.  Since our origin is in the Boston area, I picked the LA node in WebPagetest to give our CDN the best chance of success.  To try to smooth out the results, I configured the test to run 10 times.  It also helps to log in and label your test so you can easily find it later.


While this was running I moved on to step 2...

Step 2:  Use the WebPagetest Scripting Engine to Point to Your Origin

I set this test up almost the exact same way, except in the script tab I entered the following:

 setDns common.csnimages.com 209.202.142.30  
 setDns common1.csnimages.com 209.202.142.30  
 setDns common2.csnimages.com 209.202.142.30  
 navigate http://www.wayfair.com/Outdoor-Wall-Lights-C416512.html  

This points all of our image domains (we are using domain sharding for performance) at our origin IP address and bypasses our CDN entirely.
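If you have more than a handful of sharded domains, it can be easier to generate the script than to type it. Here is a minimal sketch in Python, not part of WebPagetest itself: the function name `build_origin_script` and the `sep` parameter are made up for illustration. It assumes the fields should be separated by tabs, which is how the WebPagetest scripting documentation describes the format; pass `sep=" "` if you want output that looks exactly like the listing above.

```python
def build_origin_script(cdn_hosts, origin_ip, page_url, sep="\t"):
    """Build a WebPagetest script that resolves each CDN host to the origin IP.

    The tab separator is an assumption based on the WebPagetest scripting docs.
    """
    # One setDns override per sharded CDN hostname.
    lines = [sep.join(["setDns", host, origin_ip]) for host in cdn_hosts]
    # Load the page once the DNS overrides are in place.
    lines.append(sep.join(["navigate", page_url]))
    return "\n".join(lines)


script = build_origin_script(
    ["common.csnimages.com", "common1.csnimages.com", "common2.csnimages.com"],
    "209.202.142.30",
    "http://www.wayfair.com/Outdoor-Wall-Lights-C416512.html",
)
print(script)
```

Paste the printed text into the script tab as-is.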

At this point I just had to wait for the tests to complete and compare the results.

Step 3:  Comparing and Interpreting Results

Here are the test results over 10 runs:

Test Type      Mean     Median   Standard Deviation
With CDN       6.7055   5.744    1.74693
Without CDN    6.758    6.221    1.61
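For anyone who wants to repeat the comparison on their own runs, here is a rough sketch using Python's standard `statistics` module. The per-run load times are not listed in this post, so the check at the bottom uses the summary values from the table. The helper names `summarize` and `within_one_stdev` are made up for illustration, and the use of the sample standard deviation (`statistics.stdev`) is an assumption.

```python
import statistics


def summarize(load_times):
    """Return mean, median and sample standard deviation for a list of run times."""
    return {
        "mean": statistics.mean(load_times),
        "median": statistics.median(load_times),
        "stdev": statistics.stdev(load_times),  # assumption: sample, not population
    }


def within_one_stdev(a, b, stdev):
    """True if the gap between two values is no larger than one standard deviation."""
    return abs(a - b) <= stdev


# Summary values copied from the table above.
with_cdn = {"mean": 6.7055, "median": 5.744, "stdev": 1.74693}
without_cdn = {"mean": 6.758, "median": 6.221, "stdev": 1.61}

median_gap = without_cdn["median"] - with_cdn["median"]
# Use the smaller of the two deviations to make the check stricter.
smaller_stdev = min(with_cdn["stdev"], without_cdn["stdev"])

print(round(median_gap, 3))
print(within_one_stdev(with_cdn["median"], without_cdn["median"], smaller_stdev))
```

With your own data you would call `summarize` on each list of run times and feed the results into the same check.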

Based on these results, it appears that our CDN isn't actually providing much benefit! There is almost no difference in the mean for the two tests, and the difference in the median is well within one standard deviation.  Are we sure this worked?  Let's check the details:

With the CDN in place we get external IP addresses for every image (even though the location is an ambiguous "United States"):



With the scripted test we are hitting the origin in Boston, MA for every image:


So the test is working as expected.  Let's look a little closer: the column on the left is "Time to First Byte" (TTFB) and the next column over is "Content Download".  With a CDN you expect TTFB to be comparable to or better than your origin's, since CDNs should have highly optimized servers and the first packet shouldn't have too far to go.  The content download should be MUCH faster, since the edge nodes are closer to the end user than your origin is.  As we scan down these screenshots we can see that for the CDN the content download times vary widely, from 2ms to 1500ms, and that's for images that are less than 10KB!  This is not great performance.  I'm also surprised that the TTFBs are so high; I would expect the first byte of an image to be served off an edge node in well under 100ms.  The origin requests are slower on the whole, but more consistent.  These two factors combine to make the difference in the two tests negligible.
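Scanning screenshots by eye gets tedious, so here is a small sketch of how the same check could be automated once you have per-request timings in some structured form. Everything here is hypothetical: the `Request` record, its field names, and the sample rows are invented for illustration and are not taken from the actual test. The 10KB size cutoff and 100ms TTFB limit come from the discussion above, while the 100ms download limit is just an assumed threshold.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    size_bytes: int
    ttfb_ms: int
    download_ms: int


def flag_slow_small_requests(requests, max_bytes=10 * 1024,
                             ttfb_limit_ms=100, download_limit_ms=100):
    """Return small objects whose TTFB or content download exceeds a limit."""
    return [
        r for r in requests
        if r.size_bytes < max_bytes
        and (r.ttfb_ms > ttfb_limit_ms or r.download_ms > download_limit_ms)
    ]


# Made-up sample rows, NOT data from the tests in this post.
sample = [
    Request("http://example.com/a.jpg", 4_000, 40, 2),
    Request("http://example.com/b.jpg", 8_000, 250, 1500),
    Request("http://example.com/c.jpg", 50_000, 300, 900),
]

for r in flag_slow_small_requests(sample):
    print(r.url, r.ttfb_ms, r.download_ms)
```

Only the second row is flagged: the first is small and fast, and the third is slow but too large to count as a small image.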

Based on this brief analysis of one URL, it looks like our CDN isn't providing a whole lot of benefit.  Granted, these images are on the smaller side, and CDNs should perform much better for heavier content, but the reality is that most of our static content is product images that are less than 10KB, so this is our primary use case.  I'm going to take this data back to our CDN and have a conversation, but in the meantime I hope you use this technique to see how your CDN is impacting your site's performance.

13 comments:

  1. Thanks. Trying out a similar test against our CDN.

  2. Jonathan, this is awesome that you did this. It would be great if you extended the test to more locations, or better yet, did an A/B test with real user measurement. You could use tagging with Torbit Insight to measure the real world improvement gains you are getting (or not) with your CDN provider.

    Replies
    1. That's funny, I actually already have a to-do item that involves using Insight tagging to test this exact thing, in addition to testing the optimal number of domains to shard on. I guess we're on the same page there!

      I would definitely like to test additional URLs/locations in the meantime, so I may have a follow up post soon.

  3. Warning - Long comment ahead...

    I love WebPageTest, but frankly I think it's problematic to measure CDNs with it... I tried doing that in the past (as part of Blaze) and IMO it's not the right tool for this job.

    This is true for a few reasons, I'll mention the ones I think are most important.

    The main one is that WPT agents almost always run from a hosted server. These servers are on "the backbone of the web", and have low-latency routes to central peering spots. This isn't what a "real user" sees, which is why we use dummynet to throttle the network. So now you're assuming the same latency between client and server with and without your CDN, which means you're nullifying probably the core CDN value prop - being close to users...
    If you believe your CDN isn't actually closer to your users, that's a different story. I would suggest comparing your connect times in RUM for one page served via a CDN and one page that's not, as an approximation of comparing latency.

    Second, CDNs maintain a mapping from client to the closest edge. Those maps are not well maintained for accessing the CDN edge from a hosted server, since that's not what CDNs do. As a result, it's reasonably likely you'll get mapped to a far away server, just because the mapping algorithms didn't focus on your use-case. You can mitigate this problem by trying to explicitly find a CDN node in the region if you can, and set its IP explicitly.

    Third, many CDNs leverage the name-server location as part of their mapping algorithms. Many of WPT.org's agents are using either a data-center Name-Server or the Google name-server, which again isn't representative of most users (who use their ISP's DNS). So again, this can result in bad mapping.

    Lastly, CDNs are designed to provide the best overall results, and not to accelerate an individual test case. There's logic involved in deciding which content is "hot" and prioritize it, there's caching logic and priorities, logic around keeping connections to origin open or not, etc. This means a synthetic test has a high chance of being treated in a less-than-optimal way by the CDN, compared to real traffic.

    Testing is important, and all of these are caveats to keep in mind. As I mentioned, I eventually gave up on using WPT to measure CDN performance as the results I got didn't at all reflect my RUM results. Right now I would suggest RUM as the best way to measure CDN performance (by turning it on/off for a few days, using DNS A/B testing or redirecting to a different URL with and w/out the CDN). If you can't do that, I would suggest a testing solution that has "Last Mile" coverage - they're not as good as WPT in many ways, and many CDNs try to "game" them, but their distribution is still the best way to synthetically test a CDN today.

    Replies
    1. Hi Guypo,

      Thanks for the in-depth reply, these are all valid points. I'd like to think that even with a sub-optimal mapping there is a latency benefit from hitting an edge node instead of the origin, but I agree that it's not going to mirror a traditional use case (and I did only test one URL from one location after all, not exactly a representative sample).

      I'm convinced that to really test out CDN performance you have to look at RUM data, so I'm going to try to set something up this week to run a split test for our real users. I'll post the results here when I have them - I definitely appreciate the need for more rigorous/realistic testing.

    2. This is an awesome discussion, thanks for running the test and posting the details. Just a couple of counter-points:

      The traffic shaping on the WebPagetest agents is for the last-mile ISP latency and not for the full client-server latency (which is why we have agents hosted around the world). The result is that the impact of CDN routing should be similar to an end-user though it's not as good as if you can get RUM data from a huge pool of users. You are still testing just one edge node though so it may not be representative.

      As far as the geo-location goes, most CDN's use the ISP's DNS server but not all of them - some use TCP anycast to guarantee the nearest edge regardless of DNS. Even when relying on DNS-server-based GEO-location some do better than others. In this case that appears to be moot though because the CDN correctly located you to LA-based edge nodes. Interestingly, in the median run (http://www.webpagetest.org/result/120707_RS_2d21b828bbbd1d69903d6d6510d03b37/2/details/ ) the RTT to 63.234.226.26 looked really good but it was taking 300ms to connect to 63.234.226.40 and 63.234.226.50 (those 3 IP's are what were resolved for the different shard CNAME's, all in LA). A traceroute to 63.234.226.40 from the LA agents shows it responding quickly now so there could have been a transient issue with the edge nodes.

      I actually think the most interesting point for discussion is the logic around "hot" content and how different CDN's handle it. The Wayfair shared content (css and some of the js) MAY get hit frequently enough to stay in a memory cache but product images are a long tail that are less likely to ever be hot, particularly on a given edge node (unless they are on the landing page). What happens when a cooler piece of content gets requested could be a very interesting difference between CDN's. If the CDN needs to hit magnetic media directly on the edge server it could be an extremely painful operation depending on how busy the disks are (single digit ms up to dozens of ms). CDN's that have moved to SSD's can deal with that kind of a miss much better. If it has to hit another server upstream then the times basically cascade. If it has to go all the way back to the origin then you have actually added a significant amount of time by using a CDN. Looking at your access logs will tell you how often full misses are happening but edge misses will be hidden inside of the CDN network.

      Can't wait to see what the results are for the RUM data. Are the origin servers tuned to serve the static content really quickly or are they basically assuming a CDN will take care of it?

    3. Jonathan, great to hear you'll be looking at RUM data too, it'll be really interesting to try it out.

      Pat, good points about WPT. Aaron makes a good point below too about the Wayfair traffic likely being "hot", I overlooked the fact the simulation was turning the CDN "off", not on.

      To clarify one item, I do understand WPT throttling is meant to simulate ISP throttling, but the expectation is that the origin server would then add a bigger additional latency than the CDN Edge. It's theoretical for this case, but I've seen it happen often.

      Keep us posted on the RUM results!

    4. RUM results are in - check out the new post on this blog: http://www.jonathanklein.net/2012/07/measuring-cdn-performance-with-real-users.html

  4. You just gave me an idea for an awesome test to run. We should talk soon. :)

  5. Hi Jonathan,

    I have a smile on my face reading this, as it was my presentation at Velocity (http://www.slideshare.net/turbobytes/getting-a-grip-on-cdn-performance-why-and-how) that mentioned this WebPagetest script. Happy to see you actually used it.

    Guypo's comment has some important points (yes, RUM data is best!), but there are a few things I want to add.

    > You should run more than 10 tests.
    10 is just too low. Do 50 or 100, that will give a better view.

    > Some WPT test machines/locations are in a data center, but not all. Email Pat Meenan to find out which ones are not and use those.
    Yes, throttling is important to simulate a real user connection.

    > "There's logic involved in deciding which content is "hot" and prioritize it, there's caching logic and priorities, logic around keeping connections to origin open or not, etc. This means a synthetic test has a high chance of being treated in a less-than-optimal way by the CDN, compared to real traffic. "

    Wayfair is a pretty high-traffic site, so the site-wide assets like CSS and JS should be in the caches of your CDN at all POPs in the US.
    Whatever POP you hit with the WPT tests, it should be a HIT.
    Maybe a HIT from a far-away POP, but a HIT nonetheless.
    If the WPT machine is in LA and your origin is in New York, the CDN should be at least slightly faster, as it should have a faster network path.
    For many CDNs you can see in the WPT results what POP you were hitting, in a custom response header the CDN sends.

    Also: you can analyze your origin access logs or CDN logs to get an understanding of the cache MISS rate.

    Looking forward to seeing the RUM results!

  6. Hi Jonathan,

    First off: vendor alert. I'm promoting my product with this comment, but only because it is relevant and hopefully useful. My apologies if it is inappropriate.

    Great post, it is a lot of fun to see interesting ways to evaluate CDN performance. Unfortunately I missed Aaron's talk at Velocity, it looks like it was quite useful.

    Another option for doing comparative measurement of CDNs (and your origin) is Cedexis Radar. Radar takes about 1 billion measurements a day from web users around the world in order to compare CDN and cloud performance from roughly 32,000 ASs. Many of these measurements are of the major CDN and cloud vendors, but you can also measure your origin and the performance that your web visitors are getting from your own CDN accounts.

    People often use the data during vendor selection to find the one or several CDNs that best serve their visitors, and to inform real-time traffic management decisions to optimize the end user experience.

    You can see some of the results of all these measurements at
    www.cedexis.com/country-reports/
    and read more about how the measurements are taken at
    http://www.cedexis.com/products/methodology.html

    In some ways this methodology addresses Guypo's points about using WPT for this purpose. By taking enough measurements across all of your visitors' browsers you can form a great comparative view into how your origin and CDNs are comparing with each other.

    Hope it helps,
    Greg
    Director of Products
    Cedexis, Inc.

  7. Hi Jonathan,
    Great post and great initiative.
    I always like looking into real world examples.
    I got to it only now through your wayfair post.
    Besides some of the items that you, guypo and pat mentioned on the time to connect and other - there are other clear cases where a CDN doesn't help because of the way you use it.
    for instance - the font files http://common.csnimages.com/st4/common/fonts/journal.eot are served with a no-store caching instruction. this means that basically the CDN server needs to go to your server on every such request.
    spriting or generating images and combining other resources on the fly, may result in highly non-cacheable content, as for every page you may generate a different resource. when done right you can improve performance, but if not properly tuned it can result in a cache miss on almost every resource request, and the server generating the resource (which is time consuming) over and over again.
    I am not sure this is the case in your example, as I don't know your implementation, but such things may definitely reduce cache-hit-ratios and result in more cache misses.

    These are all points to consider when designing your application, and potential outcomes of a test that shows that a CDN doesn't improve performance where you expect it to perform better.

    I am interested to know if you looked into that, and what the results will look like after tuning some of the parameters.

    - Ido

    Replies
    1. I agree in the case of the font files, but we don't have any static content that is dynamically generated per request so our hit rate for our core CSS/JS files should be very close to 100%.

      At a high level I agree that there are things we could tune to improve cacheability and thus boost the benefit we are getting from our CDN, but I was more interested in the current benefit we are seeing without putting additional time into optimization. Now that we have these results it might be worth taking some time to try and improve the situation.
