Monday, November 19, 2012

Push-up Challenge Update

This is a follow-up to my "1,000,000 Push-ups" post.  Check out that post for the reasoning behind this challenge.

Update on the Numbers

Today is my birthday, which marks one year since I started my 1,000,000 push-up challenge.  My goal for the first year was to average 50 push-ups a day, and I ended up averaging a little over 51, finishing the year 455 push-ups ahead of my goal.  You can see all of the numbers on the Google Doc.  That's great, but unfortunately I was taking it easy during the first year, and if I want to hit my goal of 1,000,000 push-ups by the age of 50 I have to average 112 push-ups a day for the next 24 years.  As a result I'm setting my daily goal for this year at 100 push-ups - still below where I need to be, but I'm planning on continuing to raise that number in future years, which will allow me to catch up.
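The arithmetic behind that 112 is easy to sanity-check.  A quick sketch (assuming a flat 365-day first year for the daily goal, and 365.25-day years going forward):

```python
# Sanity check: push-ups done in year one versus the daily average
# needed over the remaining 24 years to reach 1,000,000.
GOAL = 1_000_000
done_year_one = 50 * 365 + 455          # goal pace plus the 455 surplus
remaining_days = 24 * 365.25            # 24 years, counting leap days
needed_per_day = (GOAL - done_year_one) / remaining_days
print(round(needed_per_day))  # → 112
```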

Thursday, September 27, 2012

Learning Strategies for Engineers - Part 2

In part 1 of this series I covered some strategies for building your technical skills as an engineer.  Part 2 is going to be more about soft skills and improving personal relationships.  Let's get into it.


Use GTD or Some Other System:

Getting Things Done is a personal management system that allows you to keep track of all your commitments and "close the loop" on tasks you personally have to do as well as tasks that you have delegated to others.  The primary goal is to boost productivity, but just as important is relieving stress by making you secure in the knowledge that all of your responsibilities are under control.

GTD centers around this flow chart for processing incoming items:
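In code form, the flow chart boils down to a handful of decisions (a rough sketch based on the book's standard flow; the field names here are mine, not GTD terminology):

```python
# A minimal sketch of the GTD processing flow: actionable? ->
# two-minute rule -> delegate or defer.
def process_item(item):
    """Return the bucket an incoming item lands in."""
    if not item.get("actionable"):
        return "trash/someday/reference"   # non-actionable items
    if item.get("minutes", 0) < 2:
        return "do it now"                 # the two-minute rule
    if item.get("delegate"):
        return "waiting for"               # delegated; track until done
    if item.get("deadline"):
        return "calendar"                  # must happen at a specific time
    return "next actions"                  # deferred to your action lists

print(process_item({"actionable": True, "minutes": 30, "deadline": True}))  # → calendar
```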

Thursday, September 20, 2012

Learning Strategies for Engineers - Part 1

In my last weeks at Wayfair I gave a series of talks titled "Learning Strategies for Engineers" that were designed to kick off a peer-to-peer training program within the engineering group.  They were targeted at both new and experienced engineers, with the goal of giving people another perspective on how to stay current in our field.  People seemed to enjoy these sessions, so I am reproducing the key points here.  Part 1 will cover technical skills and possible ways to grow them, while part 2 will focus on "soft skills" and interpersonal relationships.  I'm sure that many of these points will seem obvious, but hopefully some of it is useful.


Read:

I subscribe to around 100 blogs in Google Reader, most of which are technical.  You can download the list with the non-technical blogs removed and import it directly into the RSS reader of your choice here.  I try to read everything that comes in, although I just scan some of the feeds (Slashdot in particular) for interesting items and then mark the rest as read.  My goal is to keep the unread count at zero, and though that is rarely the case I have found that I can stabilize it at under 100 unread items with a modest time investment (maybe an hour a day).  One important thing to note here is that I have made a rule with myself that when I add a feed I have to drop one - I don't want to fall into the trap where I spend all of my time reading technical literature and not enough time actually doing things.

Monday, August 20, 2012

Why I Left Wayfair

As many of you know by now, I recently accepted an offer to work on the web performance team at Etsy, and my last day at Wayfair was August 17th.  I'm taking this week off, and then heading down to Brooklyn to start the new job on Monday the 27th.  I'll be in Brooklyn until the end of September, and then working remotely from Boston for the most part after that.

I was going to write a long post about my decision to leave and the factors that went into it, but then I saw this Zen Pencils comic illustrating a Mark Twain quote, which I think sums it up well (minus the shouting boss part, that never happened, and my boss was awesome :-)):



In short, I see moving to Etsy as a great opportunity to work with amazing engineers on a product that is changing the world for the better.

I loved my time at Wayfair, and I will have friends from there for the rest of my life.  I learned a huge amount in the last four years, and I think it was the perfect place to start my career.  At the end of the day it just felt like the right time to sail away from the safe harbor and explore something new.

Wednesday, August 15, 2012

Northeast PHP Recap

Last weekend I spoke at Northeast PHP, a conference in Boston that was held for the first time this year.  It was a really fun time, and both days were packed with talks.  I appreciate all of the feedback I got on my presentation, and I had a lot of fun meeting everyone at the conference.

Michael Bourque and his team put on an awesome event, using their Boston PHP Meetup organizer skills to make sure that everything went off without a hitch.  The Microsoft NERD Center is always a great venue, and the fact that Microsoft provided it for free helped keep the cost of the conference extremely low (only $99!).

Wayfair was also a gold sponsor, which meant that we provided the t-shirts and paid for the Saturday night event (apps + beer at The Meadhall).  We ended up sending around 15 engineers to the conference, and everyone got a lot out of it.  Most of the speakers have posted their slides, so check out some of the talks that we really enjoyed:


Thanks to Michael, Matt, Heather, and everyone else for organizing the conference.  Hopefully I can participate again next year!

Thursday, July 19, 2012

Measuring CDN Performance With Real Users

This is cross posted on the Wayfair Engineering Blog

A couple of weeks ago I ran a test with WebPagetest that was designed to quantify how much a CDN improves performance for users that are far from your origin.  Unfortunately, the test indicated that there was no material performance benefit to having a CDN in place.  This conclusion sparked a lively discussion in the comments and on Google+, with the overwhelming suggestion being that Real User Monitoring data was necessary to draw a firm conclusion about the impact of CDNs on performance.  To gather this data I turned to the Insight product and its "tagging" feature.

Before I get into the nitty-gritty details I'll give you the punch line: the test with real users confirmed the results from the synthetic one, showing no major performance improvement due to the use of a CDN.

Implementation Details: 

Prior to this test we served our static content (CSS, JS, and images) from three domains:

common.csnimages.com
common1.csnimages.com
common2.csnimages.com


The first domain is where our CSS/JS comes from, and the other two are domains that we shard images across to boost parallel downloading.  To effectively test performance without a CDN we had to set up some new subdomains.  Luckily we already had two other subdomains set up from previous testing.  We configured these subdomains to always hit origin:

common3.csnimages.com
common4.csnimages.com


To ensure that this test was comparing apples to apples, I switched all requests hitting common.csnimages.com to common1.csnimages.com, so customers hitting our CDN use only the common1 and common2 domains.
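For context, the sharding itself just needs to map each image to a stable hostname, so a given image always comes from the same domain and browser caches stay warm.  A rough sketch of the idea (illustrative only, not our production code):

```python
import zlib

# Pick a shard deterministically from the image path so the same image
# always resolves to the same hostname.
SHARDS = ["common1.csnimages.com", "common2.csnimages.com"]

def shard_host(image_path):
    # crc32 is stable across runs, unlike Python's built-in hash()
    return SHARDS[zlib.crc32(image_path.encode()) % len(SHARDS)]

host = shard_host("/img/outdoor-wall-light-123.jpg")  # always the same shard
```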

Once these were set up I wrote some code to use our "feature knobs" to put someone either fully on the CDN or fully off the CDN, with the ability to adjust what percentage of overall users were using the new subdomains.  I also made the feature cookie based so once you got assigned to a group you stayed there for the remainder of your session (well, really for the life of the session cookie that I set).  Finally, I tagged each page with either "cdn" or "no_cdn" in Insight's page tracking tags (each page is also tagged with a page type):

TBRUM.q.push(['tags','Home, cdn']);

After some testing to make sure that everything worked, I cranked the "no_cdn" users up to 25% and let the test run for a few days.
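The assignment logic can be sketched like this (a simplified stand-in for our cookie/feature-knob code; the names are hypothetical):

```python
import hashlib

NO_CDN_PERCENT = 25  # the knob I cranked up for the test

def assign_group(session_id):
    # Hash the session id into a stable bucket in [0, 100); the same
    # session always lands in the same group, mimicking a sticky cookie.
    bucket = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % 100
    return "no_cdn" if bucket < NO_CDN_PERCENT else "cdn"

group = assign_group("some-session-id")  # stable for the session's lifetime
```

In production the group came from a cookie rather than a hash, but the effect is the same: a user stays in one bucket for the life of their session.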

Results:

As I mentioned above, we didn't see any appreciable improvement in performance from the use of a CDN.  More specifically, the median load time, average load time, and 90th/99th percentiles were all slightly worse with a CDN (essentially the same), while the 95th percentile was marginally faster.  Here are the full results:

Performance for users hitting our CDN

Performance for users hitting our origin


Notice that the difference in pageviews matches the percentage distribution quite closely: with a 75/25 split we would expect to see three times as many people in the "CDN" group, which we do.

What Does This Mean?

This is definitely a surprise, but before we get too down on CDNs, let's talk about the other value they provide:
  1. Origin bandwidth offload
  2. Ability to tolerate spikes in traffic
Both of these points are extremely important, and at Wayfair they justify the expense of a CDN by themselves.  That being said, we were also expecting to see some performance benefit from our CDN, and it is disappointing that we aren't getting one.

It is also important to note that these results are from our CDN and our application, and thus should not be extrapolated to the industry as a whole.  You should run your own tests if you think that your CDN could be outperforming ours.

Conclusion:

I'm happy that we ran this test, because it gives us a better frame of reference to approach content delivery vendors with going forward.  It also forces us to focus on the real benefits we are getting for our dollar – speed not being one of them at the moment.

In my mind the major takeaways from this experience are the following:
  1. If you are really curious about how something is affecting your real users, you have to instrument your site, do the work to create a split test, and actually measure what's happening.
  2. Depending on the distribution of your users, a CDN may not provide a huge performance benefit.  Make sure that you know what you are paying for and what you are getting with a CDN contract.
  3. If you have the right tools and a decent amount of traffic these kinds of tests can be run very quickly.
If you have any questions about methodology or results let me know in the comments!

Friday, July 6, 2012

Measuring CDN Performance With WebPagetest

When I was at Velocity I heard about a quick and useful trick you can do with WebPagetest to measure the effectiveness of your CDN.  The steps are pretty simple:
  1. Test a URL on your site from a location that is far from your origin.
  2. Using the scripting tab in WebPagetest, point your CDN domains at your origin IP, and run another test.
  3. Compare the results, and see how much your CDN is helping you!
Let's break this down for a Wayfair URL.

Step 1:  Test a URL on Your Site Normally

Since we only have our static content on our CDN, I chose a URL that has a lot of images on it, what we call a "superbrowse" page - http://www.wayfair.com/Outdoor-Wall-Lights-C416512.html.  Since our origin is in the Boston area, I picked the LA node in WebPagetest to give our CDN the best chance of success.  To try and smooth out the results, I configured the test to run 10 times.  It also helps to log in and label your test so you can easily find it later.


While this was running I moved on to step 2...

Step 2:  Use the WebPagetest Scripting Engine to Point to Your Origin

I set this test up almost the exact same way, except in the script tab I entered the following:

 setDns common.csnimages.com 209.202.142.30  
 setDns common1.csnimages.com 209.202.142.30  
 setDns common2.csnimages.com 209.202.142.30  
 navigate http://www.wayfair.com/Outdoor-Wall-Lights-C416512.html  

This points all of our image domains (we are using domain sharding for performance) at our origin IP address and bypasses our CDN entirely.

At this point I just had to wait for the tests to complete and compare the results.

Step 3:  Comparing and Interpreting Results

Here are the test results over 10 runs:

Test Type      Mean     Median   Standard Deviation
With CDN       6.7055   5.744    1.74693
Without CDN    6.758    6.221    1.61
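To make the "within one standard deviation" comparison concrete, here is the same summary computed in code, using hypothetical run times that roughly reproduce the table above (the real per-run numbers are in the linked test results):

```python
import statistics

def summarize(times):
    """Return (mean, median, sample stdev) for load times in seconds."""
    return statistics.mean(times), statistics.median(times), statistics.stdev(times)

# Hypothetical run times standing in for the 10 WebPagetest runs.
with_cdn    = [5.2, 5.5, 5.7, 5.7, 5.8, 6.1, 6.4, 7.2, 8.9, 10.5]
without_cdn = [5.6, 5.9, 6.1, 6.2, 6.3, 6.5, 6.7, 7.3, 8.2, 8.8]

mean_a, med_a, sd_a = summarize(with_cdn)
mean_b, med_b, sd_b = summarize(without_cdn)

# The medians differ by far less than one standard deviation, so the
# gap between the two tests is indistinguishable from run-to-run noise.
print(abs(med_a - med_b) < sd_a)  # → True
```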

Based on these results, it appears that our CDN isn't actually providing much benefit! There is almost no difference in the mean for the two tests, and the difference in the median is well within one standard deviation.  Are we sure this worked?  Let's check the details:

With the CDN in place we get external IP addresses for every image (even though the location is an ambiguous "United States"):



With the scripted test we are hitting the origin in Boston, MA for every image:


So the test is working as expected.  Let's look a little closer... the column on the left is "Time to First Byte" (TTFB) and the next column over is "Content Download".  With a CDN you expect TTFB to be comparable to or better than your origin's, since CDNs should have highly optimized servers and the first packet shouldn't have too far to go.  The content download should be MUCH faster, since the edge nodes are closer to the end user than your origin is.  As we scan down these screenshots we can see that for the CDN the content download times vary widely, from 2ms to 1500ms, and for images that are less than 10KB!  This is not great performance.  I'm also surprised that the TTFBs are so high; I would expect the first byte of an image to be served off an edge node in well under 100ms.  The origin requests are slower on the whole, but more consistent.  These two factors combine to make the difference between the two tests negligible.

Based on this brief analysis of one URL, it looks like our CDN isn't providing a whole lot of benefit.  Granted, these images are on the smaller side, and CDNs should perform much better for heavier content, but the reality is that most of our static content is product images that are less than 10KB, so this is our primary use case.  I'm going to take this data back to our CDN and have a conversation, but in the meantime I hope you use this technique to see how your CDN is impacting your site's performance.

Sunday, April 22, 2012

1,000,000 Push-Ups

I've always enjoyed doing push-ups.  Okay, that's a lie - I used to hate doing push-ups and I was pretty bad at them.  At some point in high school I had done enough that all of the little stabilizer muscles that are required for good form began to get built up, and they stopped making me miserable.  Once I got into martial arts they became a regular exercise for me, and I eventually came to like them.

One summer in college I was off campus doing physics research, and I had planned on testing for my Black Belt once I got back from my program.  Black Belt tests are notorious for being extremely grueling; in fact, I remember my instructor saying "first we physically exhaust you, then we try to mentally break you, and then we see if you actually know the techniques you are supposed to know".  I wanted to be prepared for my test, so I set a goal for myself to do 20,000 push-ups in the 9 weeks I had to train.  I didn't have much time to get to that goal, so I had to start doing a lot of push-ups every day and increase the number rapidly.  I began with 200 per day of various types, and by the 9th week I was doing 400+ push-ups every night, cramming them into 1-2 hours before bed.  There were some rough nights - getting back to my room at 1:00 AM with 400 push-ups still to do wasn't exactly enjoyable - but in the end I hit my goal.  My Black Belt test went smoothly, and the 350 or so push-ups that I did during the 4 hour test felt like a break from the rest of the exercises I had to perform.

More recently I got the idea in my head that it would be cool to do 1,000,000 push-ups in my lifetime.  Push-ups are one of the best exercises you can do, especially as you get older, so I figured it made sense to do them consistently.  Since I didn't have a good way to track the ones I had done so far in my life (aside from my spreadsheet from the summer of 2006), I decided that I would have to start with a clean slate to make it a truly fair challenge.  After doing some quick math I realized that this worked out to a little more than 100 push-ups a day for 25 years.  Luckily my 25th birthday was approaching, so I decided to start on that date, with the goal of finishing by my 50th birthday.

My 25th birthday has come and gone (November 19th, 2011) and I am now more than 5 months into my challenge.  For the first year I decided to shoot for an average of 50 push-ups a day, with the plan to increase it in subsequent years.  I wanted to start with a lower target so I could get in the groove and have a better chance of success.  As of this writing I am 235 push-ups ahead of my target, with 7985 under my belt.  I'll hit 10,000 in early June - 1/100th of my goal.

This is the first time I've ever created a serious goal with this kind of time horizon, and I have to admit that it is a little scary.  What if I get injured?  What if I get sick of this halfway through and bail?  I have to constantly try to stay ahead of my target so I can tolerate missing a day because of some unforeseen complication.  I have also tried to structure the routine so it is always doable - compressing the day's push-ups into 2-3 sets before bed, and trying to finish in under 10 minutes.  Doing them at night removes a lot of excuses that could come up if I did them when I woke up, e.g. "well, I have to wake up early to get on a flight today", or "crap, I slept later than I had planned".

I track all of my push-ups in a Google Doc that tells me how I am doing relative to my goal (when it says "perfect push-ups" it means I used these).  Perhaps sharing it publicly will keep me even more on target, since no one likes to declare a goal and then not hit it.  This has worked in the past for people who post every meal they eat on Twitter, or simply post their weight every day.  I also track all of my push-ups on Fitocracy, for the achievements/quests/levels of course.

At 7985 push-ups I have quite a long way to go, so wish me luck.  If you have similar goals, or if you have done anything that took you 25 years to finish, I would love to hear about it in the comments.