Tuesday, December 10, 2013

Reducing Domain Sharding

This post originally appeared on the Perf Planet Performance Calendar on December 7th, 2013.

Domain sharding has long been considered a best practice for pages with lots of images.  The number of domains that you should shard across depends on how many HTTP requests the page makes, how many connections the client makes to each domain, and the available bandwidth.  Since it can be challenging to change this dynamically (and can cause browser caching issues), people typically settle on a fixed number of shards - usually two.  
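To make the trade-off concrete, here is a rough sketch of how the shard count multiplies the connections a browser can open at once. The per-host limit of six is an assumption (a common browser default around this time, not a number from this post), and the function name is made up for illustration.

```python
def max_parallel_connections(shards: int, connections_per_host: int = 6) -> int:
    """Upper bound on simultaneous connections a browser may open across shards.

    connections_per_host=6 is an assumed typical browser limit, not a measured value.
    """
    if shards < 1 or connections_per_host < 1:
        raise ValueError("shards and connections_per_host must be positive")
    return shards * connections_per_host


for shards in (1, 2, 3, 4):
    print(shards, "shard(s) ->", max_parallel_connections(shards), "connections")
```

More connections only help while there is spare bandwidth.  Past that point they compete with each other, which is where the congestion discussed below comes from.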

An article published earlier this year by Chromium contributor William Chan outlined the risks of sharding across too many domains, and Etsy was called out as an example of a site that was doing this wrong.  To quote the article: “Etsy’s sharding causes so much congestion related spurious retransmissions that it _dramatically_ impacts page load time.”  At Etsy we're pretty open with our performance work, and we’re always happy to serve as an example.  That said, getting publicly shamed in this manner definitely motivated us to bump the priority of reinvestigating our sharding strategy.  

Making The Change

The code changes to support fewer domains were fairly simple, since we have abstracted away the process that adds a hostname to an image path in our codebase.  Additionally, we had the foresight to exclude the hostname from the cache key at our CDNs, so there was no risk of a massive cache purge as we switched which domain our images were served on.  We were aware that this would expire the cache in browsers, since they do include hostname in their cache key, but this was not a blocker for us because of the improved end result.  To ensure that we ended up with the right final number, we created variants for two, three, and four domains.  We were able to rule out the option to remove domain sharding entirely through synthetic tests.  We activated the experiment in June using our A/B framework, and ran it for about a month.
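As an illustration of what "abstracting away the hostname" can look like, here is a minimal sketch in Python.  It is not Etsy's actual code: the hostnames, the function name, and the choice of a CRC32 hash are all assumptions made for the example.

```python
import zlib

# Hypothetical shard hostnames; the real ones are not given in this post.
IMAGE_HOSTS = ("img0.example.com", "img1.example.com")


def image_url(path: str, hosts: tuple = IMAGE_HOSTS) -> str:
    """Build a full image URL, choosing a shard deterministically from the path."""
    if not path.startswith("/"):
        path = "/" + path
    # crc32 is stable across processes, unlike Python's built-in hash() for str,
    # so a given image always lands on the same hostname.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"https://{hosts[index]}{path}"


print(image_url("/il/123/456.jpg"))
```

With every image URL built through one helper like this, changing the number of shards means editing a single list.  Note that changing the list remaps some paths to a different hostname, which is exactly the browser cache expiry described above.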


After looking at all of the data, the variant that sharded across two domains was the clear winner.  Given how easy this change was to make, the results were impressive:

  • 50-80ms faster page load times for image heavy pages (e.g. search), 30-50ms faster overall.
  • Up to 500ms faster load times on mobile.
  • 0.27% increase in pages per visit.

As it turns out, William’s article was spot on - we were sharding across too many domains, and network congestion was hurting page load times.  The new CloudShark graph supported this conclusion as well, showing a peak throughput improvement of 33% and radically reduced spurious retransmissions:

Before - Four Shards

After - Two Shards

Lessons Learned

This story had a happy ending, even though in the beginning it was a little embarrassing.  We had a few takeaways from the experience:

  • The recommendation to shard across two domains still holds.
  • Make sure that your CDN is configured to leave hostname out of the cache key.  This was key to making this change painless.
  • Abstract away the code that adds a hostname to an image URI in your code.
  • Measure everything, and question assumptions about existing decisions.
  • Tie performance improvements to business metrics - we were able to tell a great story about the win we had with this change, and feel confident that we made the right call.
  • Segment your data across desktop and mobile, and ideally international if you can.  The dramatic impact on mobile was a huge improvement which would have been lost in an aggregate number.  
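The cache key point is easier to see with a small sketch.  Real CDN configuration is vendor-specific, so this is only an illustration of the idea, with made-up function names and hostnames: the CDN-style key ignores the hostname, while a simplified browser-style key includes it.

```python
from urllib.parse import urlsplit


def cdn_cache_key(url: str) -> str:
    """Cache key that ignores the hostname: path plus query string only."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def browser_cache_key(url: str) -> str:
    """Simplified stand-in for a browser cache key, which does include the host."""
    parts = urlsplit(url)
    return parts.netloc + cdn_cache_key(url)


old = "https://img3.example.com/il/123.jpg?v=2"  # hypothetical URL before the change
new = "https://img1.example.com/il/123.jpg?v=2"  # same image on a different shard

print(cdn_cache_key(old) == cdn_cache_key(new))
print(browser_cache_key(old) == browser_cache_key(new))
```

The first comparison is true and the second is false: moving an image between shards keeps the CDN's cached copy valid but looks like a brand new resource to the browser.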

Until SPDY/HTTP 2.0 comes along, domain sharding can still be a win for your site, so long as you test and optimize the number of domains to shard across for your site.

Sunday, November 24, 2013

Pushupdate 2013

This is a follow-up to my "1,000,000 Push-ups" post.  Check that out for the reasoning behind this challenge.

November 19th was my birthday, and it also marked the end of the second year of my push-up challenge.  For the last year my goal was to average 100 push-ups per day, and I finished five push-ups ahead of that goal, with a running total of 54,805 push-ups.  This isn't a magical coincidence; there's a strong rubber-banding effect with my color-coded spreadsheet.  When I get behind, I do more push-ups to catch up.  When I get ahead, I know that I can take a day off if I get tired or lazy.  This kept me hovering right around my target throughout the year.  

If you look at the "Year 2" tab in the doc, you will see that there was a lot more yellow than in year one.  This is largely because I got sick at the end of April, and had two consecutive days with zero push-ups.  It took me quite a while to catch back up, because I didn't have any time pressure to do so.  Why is all of this relevant?  Because it has taught me a few things about my motivation and how I should structure the challenge going forward:

  1. I really only care about finishing the year on target; being slightly behind in the middle of the year doesn't bother me or motivate me all that much.
  2. I am unlikely to do more push-ups than the sheet requires, so thinking that I will get ahead naturally is silly.
  3. Everything is a lot easier if I spread the push-ups out during the day, instead of jamming them all in right before I go to sleep.

Because of these learnings, and specifically #2, I've decided to bump my goal for year 3 to 125 push-ups per day.  This will allow me to finish the year at over 100,000 push-ups, which is 10% of my goal.  If I maintain the 125 push-up average going forward, I will finish the challenge a little over two years early.  I also might continue to raise the daily total in future years, so I can shave a little more time off the end, and potentially finish in 20 years instead of 25.  
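For anyone who wants to check the math, here is a quick sketch of the projection.  It assumes 365-day years (leap days ignored) and a steady 125 per day from here on, which is a simplification.

```python
GOAL = 1_000_000            # total push-ups in the challenge
DONE_AFTER_YEAR_2 = 54_805  # running total from this post
DAILY_TARGET = 125          # new goal for year 3
DAYS_PER_YEAR = 365         # assumption: leap days ignored

per_year = DAILY_TARGET * DAYS_PER_YEAR
end_of_year_3 = DONE_AFTER_YEAR_2 + per_year
total_years = 2 + (GOAL - DONE_AFTER_YEAR_2) / per_year

print(end_of_year_3)          # running total at the end of year 3
print(round(total_years, 1))  # total length of the challenge in years
```

That works out to 100,430 push-ups at the end of year 3 and roughly 22.7 years in total, which is where "a little over two years early" comes from when compared with the 25-year plan.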

The beginning of this year was a bit of a struggle, but towards the end I got much better about doing my push-ups during the day, which had a dramatic positive impact on my attitude while doing them.  Since I work from home this is fairly easy to do, and I'm going to continue to push for that going forward (pun definitely intended).  

Questions?  Comments?  Ideas for future updates?  Let me know, below.  

Wednesday, November 13, 2013

A New Source For WPO Resources

** Update 12/03/13 ** - This article was deleted by Wikipedia moderators.  If you want the full background on why, check out the deletion talk page.  I have just moved the article to the Web Platform Wiki, which is a more appropriate place for it.

After my last post with its abundance of footnotes, some people asked for a place on the web where they can see an aggregated list of performance resources.  This has been tried in the past, and I believe that it has failed because it has always relied on a single person or entity to update the list.  Ideally we would like to have something that fulfills the following goals:

  • Hosted by an impartial third party
  • Anyone can update - no single point of failure
  • An existing site that people trust, so we don't have to reinvent the wheel
  • A site with a bright future - we don't want this to be obsolete in a few months

This is starting to sound vaguely familiar...

That's right, the new home (hopefully) for Web Performance Optimization resources is Wikipedia.  This kind of article has some precedent, and is modeled after this List of JavaScript Libraries.  

I put together the initial resources on this page this morning, so it is far from exhaustive.  This is the beauty of Wikipedia - anyone can improve this list, and there will be no single "maintainer".  Go edit it now!  I've linked to this new list from the main WPO article, and my hope is that people will continue to keep it up to date.  I think this is something that our industry needs: a place to point newcomers where they can find links to all of the great content that our community produces.  

That's it - have at it.

Sunday, November 3, 2013

We Have a Long Way To Go

When you work on web performance every day, it is easy to assume that the best practices are widely known, widely understood, and widely followed.  Unfortunately, as I have learned through a couple of different experiences recently, this is not the case.  It is frankly astounding how many websites are still failing to implement the best practices that Steve Souders outlined in his book "High Performance Websites" over six years ago.  Six years is an eternity when it comes to web development, but I still regularly meet professional software engineers who are surprised when they hear about the rules that the WPO community evangelizes on a daily basis.

First, The Data

Let's try to scope the problem.  Using data from the HTTP Archive, we know that:

Via Radware we know that pages are getting bigger and more complex, and that adoption of best practices is inconsistent and fairly weak.  At this point people are quick to point to the Google study showing that the web is in fact getting faster, but this is primarily due to browser and network improvements, not due to website improvements.  To quote the study, "it is still impressive given that the size of the web pages have increased by over 56% during this period".  How much faster would the web be if we had browser, network, and content improvements?

I mentioned that I have had a few experiences recently that led me to write this post.  First, I was recently speaking on a panel at a "Web Performance Day" for portfolio companies of a prominent VC firm in San Francisco.  Steve Souders was giving the opening talk, and he opened with a scary comment.  He mentioned that as he was putting his talk together, he surfed around some of the portfolio company websites so he could target his material to the audience members.  He found that in the vast majority of cases, even the most basic optimizations were not being done.  He was forced to alter his talk and give a "Performance 101" talk during which he dredged up slides from decks that he presented over 5 years ago.  Yikes.

The second experience was during a recent episode of JavaScript Jabber.  Alex MacCaw was talking about how he optimized monocle.io, and he mentioned Google's PageSpeed Insights tool.  The tool was new to pretty much everyone on the show, and there was a lot of "wow, this is so cool!".  Remember, these are professional front-end web developers who are recording a podcast about JavaScript and front-end best practices in their free time.  If they haven't heard of PageSpeed, and are getting surprised by some of the checks that it exposes, then we have a TON of work to do.

What Should We Expect?

Perhaps I am an idealist, but I think we should be able to get to 95%+ compliance with gzip.  I don't expect every site to sprite all of their images (a best practice that SPDY/HTTP 2.0 will negate anyway), but I do expect every single site on the internet to figure out how to compress text resources.  It's probably the easiest, most impactful, most well supported, and safest optimization that you can do.

This is just an example, but I think it is indicative of the problem, because we're only at 77% compliance with it.  There are enormous benefits with literally zero downsides, and we still can't get a quarter of the sites on the internet to do it (and don't talk to me about CPU usage on shared hosting - if a host doesn't let you turn on gzip you should find another host).
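To give a feel for how much there is to gain, here is a small sketch using Python's standard gzip module.  The markup is made up and far more repetitive than a real page, so the ratio it prints is not representative; actual savings depend on your content.

```python
import gzip

# Made-up, repetitive markup standing in for a text response (HTML, CSS, or JS).
body = ('<li class="listing"><a href="/item">Item</a></li>\n' * 200).encode("utf-8")

compressed = gzip.compress(body)

print(f"original:   {len(body)} bytes")
print(f"compressed: {len(compressed)} bytes")

# The round trip is lossless, so the browser gets exactly the same bytes back.
assert gzip.decompress(compressed) == body
```

In practice you would not do this in application code.  Compression is normally switched on in the web server or CDN configuration, which is part of why it is such an easy win.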

The more concerning thing for me is that people expect new standards and browser improvements to come in and save us, like the aforementioned HTTP 2.0.  While I'm extremely excited about HTTP 2.0, I worry that we will run into the same problem that we have today with gzip: it will take years to get up to a reasonable adoption point, even after the browser support is there.

When it comes to "what we should expect", I think understanding the core technologies behind the web and how to leverage them to build fast applications is a requirement for every single engineer.  After all, speed is more than a feature, it's the most important feature.  That's from the guy holding the purse strings, so it's worth taking to heart.  If someone can't easily explain at least 3-5 performance best practices, to me that's equivalent to not being able to write a line of JavaScript.  In other words, inexcusable if you work on the web as a developer.

Are There Enough Resources For Developers?

Yes.  Perhaps we could do a better job about getting them in front of people early and often, but the[1] number[2] of[3] tools[4] and[5] resources[6] is[7] staggering[8] (I[9] could[10] do[11] this[12] all[13] day[14]).  I'm obviously extremely biased, since I work on the performance team for one of the top technology companies on the web, and one that happens to care a lot about performance, but these resources are not hard to find.  In addition, with the recent healthcare.gov fiasco, web performance is much more in the public eye, which should provide even more motivation for software engineers to get on the bandwagon.

How Do We Fix This?

If you have made it this far, hopefully you believe that there is a problem and that it should be fixed.  The easy answer is to do more of what we are already doing - speaking, blogging, writing books, hosting meetups, and publishing case studies.  This is all great, and we should do more of it, but it's clearly not working as well as we might like.  I'd like to propose a couple of concrete things we could do differently that might help get the word out about performance:

  • Every tech conference should have at least one talk that focuses on performance.
  • Every company with an engineering blog should publish a post about performance (and ideally how it affects their bottom line).
  • Every college with a CS department should offer a course that focuses on web performance (or at least part of a course).  Kudos to Stanford for doing this.
  • We should write more books with the "High Performance" prefix.  Steve Souders has mentioned in the past that he would like to see this, and a few have come out recently (like Ilya's excellent book), but we should do even more.  I'd love to see a "High Performance Ruby on Rails" book, as one example.

If you have the ability to influence any of the points above, please do so.  With our mobile future rapidly approaching, performance is more important than ever.  We will be stuck with 3G and 4G for decades, so networks aren't going to save us.  There's a limit to what browsers can do given certain latency and bandwidth constraints, and when you look at how quickly page complexity is rising, it's not hard to imagine the web getting slower and slower over time.  That's not a future that I want, so let's work together to build the tools, resources, and infrastructure that's necessary to make the web faster.

[1] Google Best Practices
[2] WebPagetest Forums
[3] Yahoo Performance Rules
[4] Web Performance Today Best Practices
[5] Perf Planet Advent Calendar
[6] High Performance Browser Networking
[7] High Performance JavaScript
[8] Velocity Conference
[9] HTTP Archive
[10] The Top 22 Web Performance Posts of 2013 (New Relic)
[11] Web Performance Today Podcasts
[12] Even Faster Websites
[13] YSlow
[14] Compuware Ajax Edition

Sunday, October 6, 2013

30 Day Challenge: Vegetarianism

For context about 30 day challenges, watch this TED Talk by Matt Cutts.

My 30 day challenge in September was to be a vegetarian - and a true vegetarian, so no meat whatsoever, including seafood.  I've thought about doing this challenge for a while, and this video pushed me over the edge.  My goals for this challenge were simple:

  • See what it's like to live as a vegetarian, and what options are available
  • Be more conscious about meat consumption in general
  • See if this is something that I could sustain long-term

I did pretty well throughout the month, and only slipped once.  During the first week of the challenge I was at my great-aunt's 100th birthday party, and it was at a restaurant with a fixed menu.  The only options were chicken, salmon, or steak.  The kitchen said that they could make me a plate of steamed vegetables, but I decided that I would make an exception and order the steak.  I don't feel like this means that I "failed" the challenge; it was a one-time conscious choice and I was still able to accomplish my goals during the rest of the month.

Obviously the hardest part of this challenge was eating out.  I did a fair amount of traveling in September, which meant finding restaurants with vegetarian options everywhere I went.  This was usually doable, but in many cases I only had one choice of entree.  Luckily I do like to eat tofu, veggie burgers, and almost all vegetables, so I almost always had decent options available.

Once the challenge was over I did go back to eating meat right away, and I had no adverse effects from the month-long hiatus.  I don't think that my eating habits are going to change dramatically after having done this challenge, but I do plan on being more conscious of the quality and source of the meat that I consume.  If I don't know its origin I think I will be more likely to skew towards vegetarian options.  At some point I might try to follow Mark Zuckerberg's lead and eat only meat that I've killed - but I think I'll save that for a point where I have the means to do so.

If you have done any 30-day challenges recently, or have ideas for future ones, let me know in the comments!

Sunday, September 8, 2013

What I Learned From 30 Days Without TV

My 30 day challenge for August was to avoid watching any television or movies, and I finished about a week ago.  This challenge was tough, primarily since my girlfriend Stephanie (who I live with) was not under the same restriction.  As a result I ended up catching a few minutes of TV and movies here and there, but I did manage to avoid watching a complete show or movie for the entire month.  Also as a clarification, I exempted things like TED talks, screencasts, and other educational videos from this challenge.  The focus was really to avoid "junk food" entertainment that doesn't provide any lasting value.


Two things surprised me about this challenge.  The first was how hard it was to go completely cold turkey.  In addition to the temptation at home, many of the bars and restaurants I went to in August (obviously) had TVs.  I also spent some time in airports, where TVs are never far away.  If you truly wanted to avoid even looking at a TV for a month it would be nearly impossible.  This is something we take for granted, but I was amazed at how hard it was to go even a single day without seeing something being broadcast.  On the flip side, the other surprise was how easy it was to avoid spending any significant amount of time doing it.  There were certainly a few instances where I wanted to sit down and watch a movie, but I was able to resist and (typically) spent my time on more productive work.

So What Did I Do?

I managed to get through a backlog of magazines that had been sitting on my dresser for months, and I finished three books that I was partway through at the start of the month.  I also made some good progress on a couple of side projects, caught up on RSS feeds, and felt a little more in control of my life overall.  As Clayton Christensen says, "it's easier to hold to your principles 100 percent of the time than it is to hold to them 98 percent of the time", and that was certainly true for me with this challenge.  If I had simply tried to watch fewer movies, or spend a little less time in front of the TV, it would not have been nearly as effective.  By having a firm rule I was able to avoid situations where it would have been easy to make an exception, like a rainy night at home with Steph, or an evening watching the game with friends.  I just foisted all of the blame for my strange behavior on "the challenge" and moved on to more high value activities.  This had the side benefit of making my interactions with people more meaningful - in situations where we might have had the TV on in the background we instead focused our full attention on each other.

Going Forward

After spending a month without TV or movies, I really got out of the habit of turning towards them as a source of relaxation.  I was forced to find alternate ways to unwind, and that has carried over into the first week or so of September.  Aside from a Labor Day binge on season 2 of The Wire, I've pretty much continued the habits that I developed in August - reading more, coding more, and having more 1-on-1 interactions with friends.  It feels great, and I plan on keeping this up as much as I can.  My loose goal for the rest of the year is to avoid watching any TV or movies alone.  I totally acknowledge that they can be social events, but when I am by myself there are always better things that I can be doing.  To be clear, I don't just mean better from a productivity point of view, I mean activities that I personally get more enjoyment out of.  For me it's all about finding the "Maximum Fun Quotient", and deliberately spending my time doing things that make me happy and enrich the lives of my friends and family.  This challenge helped me realize that movies and TV are not a critical part of that goal.

Friday, August 23, 2013

Northeast PHP Recap

Last weekend was the 2nd annual Northeast PHP conference in Boston.  I gave two talks, "Practical Responsive Web Design" (slides) and "Scaling PHP to 40 Million Uniques" (slides).  I had a lot of fun at the conference, and met some great people.  I didn't get to attend as many talks as I would have liked, since I spent some of the sessions preparing for my own, but there were a couple that I really enjoyed:

Saturday, July 27, 2013

30 Day Challenge: No TV or Movies

For context about 30 day challenges, watch this TED Talk by Matt Cutts.

I'm trying an experiment for the month of August - not watching any television or movies.  I've never been someone who watches a ton of TV, but when I get into a show I can power through a ten episode season in a week.  Throughout July, I spent more time in front of the TV than usual, and some of my other activities have fallen by the wayside as a result.  I got to a point where I was in the middle of four books simultaneously, and working on three concurrent programming related side projects.  When you add that to everything else I have going on (Ultimate frisbee, Taekwondo, preparing talks for conferences, seeing friends, walking my dog, push-ups, work) it became unsustainable.  I would get a little overwhelmed by my schedule, and default to mindless activities to relax.

Monday, June 3, 2013

Cognitive Biases in Software Engineering

Human logic, unlike that of the machines which we program and use every day, isn't perfect.  We make mistakes, we establish bad mental habits, and we have many cognitive biases that negatively impact our ability to be successful engineers.  I want to go over five of the most common biases that I see on a regular basis as a software engineer.

Sunday, May 12, 2013

Sleep Testing

One of the most important things you can do for your health, happiness, and productivity is to get enough sleep.  You have probably heard this so many times that it feels like white noise at this point, but what if you could actually do it?  What if you could wake up refreshed and energetic every day, and not struggle to get out of bed?  I believe this is achievable for everyone - as long as you are willing to take the time to methodically test your sleep schedule, and make a few minor changes to your bedtime routine.

Thursday, April 4, 2013

14 Day Challenge - Information Diet

In the spirit of the 30-day challenges popularized by Matt Cutts (among others), I have spent the last 14 days on an "information diet".  This is not a new idea by any means, and in fact there was an entire book written about the concept.  I actually heard about it initially from Tim Ferriss, and this post on the 4-Hour Life Blog.  Anyway, enough with the background - what the heck is this about?

Sunday, March 31, 2013

Coursera Review: Intro to Computer Networks

I recently finished my first Coursera class, Introduction to Computer Networks.  This was a 10 week long class, and I spent approximately 3-4 hours per week on it.  If you are thinking of taking a Coursera class, and this one in particular, then read on...

Wednesday, February 20, 2013

A Comprehensive Guide to WebP

WebP (pronounced "weppy") is an ambitious and promising new image format that was created by a team at Google back in September of 2010, but browser adoption has been slow.  At the time of this writing the only major browsers that support WebP are Chrome and Opera, even though support in native applications is strong (albeit with plugins in many cases).  In this article I want to present an unbiased, holistic view of the WebP landscape as it stands today.  My end goal is to further the conversation about alternate image formats and image compression on the web, because I believe it is one of the biggest opportunities for making web browsing faster across all devices and networks.

Tuesday, January 22, 2013

How Much CSS Should You Have?

I've noticed recently that it's becoming more and more common to see websites with WAY too much CSS.  I realized that I've never seen specific guidelines around how much CSS a website should have, and no clear narrative around the impact of total CSS transfer size.  Most people just say "as little as possible" or "just write what you need and then optimize".  On the whole I think these statements are a bit too vague to be useful.  In addition, there are a lot of articles about "optimizing" CSS (a.k.a. writing more efficient selectors), which has quickly diminishing returns.  In this article I will talk specifically about how big your CSS files should be, and cover some other possible pitfalls when authoring CSS.