Tuesday, September 20, 2011

September 2011 Site Performance Report

This is a cross post from the Wayfair Engineering Blog.

I recently saw this post on Etsy’s blog and found it inspiring – you don’t typically see retailers (or any companies for that matter) sharing performance data.  Even more impressive is the fact that they posted this on their primary blog as opposed to their engineering blog.  To me this says that Etsy really cares about performance, and wants all of their customers to be aware that it is a priority.  After reading that post I immediately wanted to share some Wayfair data – and this is my attempt to do that (maybe I can get it cross-posted to our main blog :-) ).

Here at Wayfair we care a lot about performance, and the business case for having a faster site is well documented.  To keep tabs on our site performance we have synthetic monitoring tools like Catchpoint, real user monitoring tools like Coradiant (now a part of BMC Software), and our own home-brew monitoring instrumented in our ASP and PHP code.  We have also had great success using StatsD in conjunction with Graphite.  Though we use the Ruby client, StatsD was started as an open source Node.js project from Etsy (I promise we aren’t stalking you guys).  The numbers that I am sharing are from Coradiant, and measure “host time”, which is defined as the time between the last packet of the request and the first packet of the response as measured by a network tap.  Without further ado, here are the numbers for our highest traffic pages.  These numbers are all in milliseconds, and were measured between 9AM and 5PM Eastern Time on a Wednesday, so they should be indicative of real world performance.

Page                             Average   95th Percentile
Homepage                         245ms     540ms
Product Page                     373ms     713ms
Product Browsing Page (example)  304ms     680ms
Search Results Page              448ms     1000ms

A graph of our product page load time from Coradiant

These numbers are pretty good, but we definitely have room for improvement.  Across the board our averages are quite low, but as you move up into the 95th percentile performance degrades more than I would like, especially on our search results page.  The content here is generated by Solr, an open source search engine project, and it returns the results very quickly.  Most of the time is NOT spent in Solr, so for possible improvements we need to look at the parsing code (in PHP) as well as the time it takes to get the Solr data from the search server to the webfarm over the network.
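To make the gap between an average and a 95th percentile concrete, here is a minimal Python sketch that computes both from a list of host times.  The sample values are made up for illustration (they are not Wayfair measurements), and the nearest-rank percentile method is an assumption; monitoring tools such as Coradiant may compute percentiles differently.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that pct=0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical host times in milliseconds: nine fast requests and one slow one.
host_times_ms = [210, 220, 230, 240, 250, 260, 270, 280, 300, 1900]

print("average:        ", mean(host_times_ms), "ms")
print("median:         ", percentile(host_times_ms, 50), "ms")
print("95th percentile:", percentile(host_times_ms, 95), "ms")
```

A single slow request is enough to pull the average well above the median, and it shows up in full at the 95th percentile, which is why it is worth tracking both numbers.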

Server-side generation time is one of the most important things to keep fast, since while the customer is waiting for the server they are just staring at a blank screen and watching the browser’s busy indicator spin.  The metric that a lot of performance tools watch is “time to first byte” (TTFB), and it sets a floor on the overall speed of the page: nothing can render until that first byte arrives.  On the plus side, TTFB is something that is almost 100% under your control.  We continually look for ways to improve these times and give our customers the best experience possible.  Future posts will go into specific techniques that we use to keep our servers blazing fast.
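For readers who want to start tracking server-side generation time themselves, here is a rough Python sketch of timing a block of code and reporting it to StatsD as a timer.  It is not our production instrumentation (that lives in ASP and PHP, and we use the Ruby StatsD client).  The metric name, the helper names, and the daemon address are all assumptions for illustration; the only fixed part is the StatsD timer format, `<name>:<value>|ms`, sent as a UDP datagram.

```python
import socket
import time
from contextlib import contextmanager

# Assumption: a StatsD daemon listening locally on its default UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_timer(name, elapsed_ms):
    """Build a StatsD timer datagram of the form b'<name>:<ms>|ms'."""
    return f"{name}:{int(round(elapsed_ms))}|ms".encode("ascii")


@contextmanager
def timed(name, sock=None, addr=STATSD_ADDR):
    """Time the enclosed block and send the result to StatsD over UDP.

    Hypothetical helper: pass in a socket-like object to reuse it,
    otherwise a temporary UDP socket is created and closed.
    """
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        try:
            # Fire and forget: metrics must never break the page.
            sock.sendto(format_timer(name, elapsed_ms), addr)
        except OSError:
            pass
        if own_socket:
            sock.close()


if __name__ == "__main__":
    # "web.product_page.generation" is a made-up metric name, and the
    # sleep stands in for real page generation work.
    with timed("web.product_page.generation"):
        time.sleep(0.01)
```

UDP is a good fit here because the send does not wait for a reply, so the instrumentation adds very little to the time it is measuring.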

I plan on talking about front-end performance on another day, but for now I encourage you to share data from your company!