Amazon, welcome to the world of the mortals

Did the e-commerce God of Uptime Stability meet its match with its Prime Day promotion this week? It depends on your point of view.

For years, Amazon has been the e-commerce God of Uptime Stability. Other e-tailers, including some much larger ones, have taken serious hits on Black Friday and Cyber Monday or have been taken offline by DDoS attacks, but Amazon has never suffered any meaningful outages.

But did that impressive record get overturned on Wednesday (July 15)? It depends on your point of view. Wednesday was the day the company launched an incredibly hyped promotion called Prime Day, complete with what Amazon promised would be virtual aisles full of special discounts, bigger than Black Friday. The hype worked, and consumers stampeded to Amazon to see whether the bargains were truly compelling.

One point of view was that of unhappy customers. CNN found the following Twitter tags: #unhappyPrimeDay, #AmazonFail, #gobacktosleep, and #PrimeDayFail. But to be fair, let's look at whether that likely means much.

Let's start with a bit of e-commerce uptime reality. It's very difficult for e-tail executives to believe that their site was down — or even very slow — when they are seeing huge or even record-breaking sales. They take those numbers as evidence that the site is working perfectly, blind to the fact that many shoppers found the site extremely slow or couldn't get through at all. Record-breaking sales look great unless you know that, for example, for every 1,000 customers who got through, 50,000 were locked out.

Another shortcoming of site analysis: Even if you can see that thousands of people were locked out, they all look the same. A customer who was about to spend $10,000 on your site looks exactly like a tire-kicker who would have bought nothing. If you can only see the sales that happened and not the ones that were lost, then you are deprived of a highly motivational metric: Your site's slowness cost you x amount of dollars in sales.
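
As a rough illustration of what that missing metric might look like, here is a back-of-the-envelope sketch in Python. The conversion rate and average order value are hypothetical placeholders, not Amazon figures, and the 50,000 locked-out visitors is simply the illustrative number used earlier. The function name is my own invention.

```python
def estimate_lost_sales(locked_out_visitors, conversion_rate, average_order_value):
    """Rough estimate of revenue lost to visitors who never got through.

    All three inputs are assumptions the site operator would have to supply;
    a real site cannot observe what locked-out shoppers would have spent.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return locked_out_visitors * conversion_rate * average_order_value


# Hypothetical inputs: 50,000 locked-out visitors, 3% of whom would have
# bought, spending $80 each on average.
lost = estimate_lost_sales(50_000, 0.03, 80.0)
print(f"Estimated lost sales: ${lost:,.2f}")
```

The estimate is only as good as its guesses, which is exactly the problem: it treats the $10,000 shopper and the tire-kicker as the same average visitor.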

What Amazon could see, though, had to be very cheering. The company said worldwide growth increased 266% over the same day last year and 18% over Black Friday 2014. It reported that shoppers "ordered 34.4 million items across Prime-eligible countries, breaking all Black Friday records with 398 items ordered per second."
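
A quick arithmetic check, sketched in Python, shows that the two reported figures are consistent with each other if the per-second number is read as an average over a 24-hour day. That reading is my assumption; Amazon's statement does not say how the rate was calculated.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

items_ordered = 34_400_000  # figure reported by Amazon
average_rate = items_ordered / SECONDS_PER_DAY

print(f"{average_rate:.0f} items per second on average")
```

If the figure is indeed a daily average, the rate during the busiest minutes would presumably have been higher.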

I don't care how many massive server farms Amazon is using; it's amazing that any site can handle those numbers and stay up.

Staying up and doing record business doesn't mean that Amazon didn't experience some difficulties. But here's a harsh reality: No matter how many servers (within reason) and what level of other resources an e-tailer might put in place before a big sales event, a large enough influx of site visitors can overwhelm it, especially if the timing of the sale encourages that avalanche of shoppers to mostly visit at the same instant. The best an e-tailer can do is make the best guess possible as to the number of expected simultaneous visitors, increase it by a reasonable percentage and have lots of IT staff standing by to try to add more resources if necessary.
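
That guess-plus-padding approach can be sketched in a few lines of Python. Every number here is hypothetical, including the idea that a server handles a fixed number of simultaneous visitors, and nothing in it reflects how Amazon actually plans capacity.

```python
def servers_needed(expected_peak_visitors, margin_percent, visitors_per_server):
    """Pad the peak-visitor guess by a percentage, then round up to whole servers.

    Integer arithmetic is used so the round-up is exact.
    """
    if visitors_per_server <= 0:
        raise ValueError("visitors_per_server must be positive")
    padded_times_100 = expected_peak_visitors * (100 + margin_percent)
    capacity_times_100 = visitors_per_server * 100
    # Ceiling division without floating point.
    return -(-padded_times_100 // capacity_times_100)


# Hypothetical: 200,000 simultaneous visitors expected, padded by 30%,
# with each server assumed to handle 500 visitors at once.
print(servers_needed(200_000, 30, 500))
```

The weak point is the first argument. If the real peak turns out to be several times the guess, no reasonable percentage of padding covers it, which is why the standby IT staff matter.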

My inclination is to say that Amazon did the best it could. But what about all of those unhappy customers venting their spleens on social media? Pointing to social media complaints about Prime Day, lots of consultants were sending messages on Thursday, implying that Amazon had somehow failed or, at least, underperformed. But none of the ones I spoke with could point to a single thing that Amazon could have reasonably been expected to do differently.

But interesting stats still made the rounds. Dynatrace tested Amazon's site and found that visits to the homepage were solid.

"It looks good as people are coming in," said David Jones, the sales engineering director at Dynatrace, referring to Amazon activity at 8 a.m. Wednesday. But "things started to come off the rails" when shoppers tried using "transactional components," he said. "When they started to do something, such as trying to sign up for Prime, there were lots of these errors being seen," including "long response times when adding things to the cart."

By 9 a.m., things started to get worse, Jones said, when he recorded transactions that usually take about 20-30 seconds on Amazon clocking in at more than 120 seconds. Dynatrace's analytics are based on software performing scripted tasks in the exact same way so, in theory, the comparisons should be valid.

"There wasn't a network problem, there wasn't a third-party issue," Jones said. "It was pure and simple infrastructure and applications." Even though Dynatrace didn't find third-party services a big problem for Amazon, one server, which turned out to be an ad server, did repeatedly cause problems. That's not surprising. Amazon can throw a ton of extra resources onto the site, but that doesn't help any other service that is tied in. A few minutes of undiluted Amazon special-sale traffic is enough to short-circuit almost any unprepared site.

Jones also found that Amazon significantly lightened its homepage early Wednesday morning, reducing the number of objects from 298 Tuesday night to 137 on Wednesday morning and dropping the size of the page from 5.5MB on Tuesday to 4MB on Wednesday. (We asked Jones for the number of Amazon objects on several other days over the last week and found that Tuesday's 298 was typical, with other days ranging from 289 to 301.)

Jones' point was that Amazon was doing what it could to make its page slimmer and, therefore, able to stay up a little longer in an onslaught. I'm not so sure, though, that that was Amazon's intent. Instead of having a large number of products promoted on its homepage, which is Amazon's norm, it replaced almost all of the page with a huge graphic ad saying "Happy Prime Day. More deals than Black Friday." By putting little else on the homepage beyond that ad, Amazon was trying to make it seem like even more of a big deal. That would account for the reduction in the number of objects. It would also explain why the size of the page (that 5.5MB dropping to 4MB) didn't drop as much as the reduction in objects would suggest (it was a really big ad).
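
The mismatch is easy to see with a little arithmetic on Jones' figures, sketched here in Python. The only assumption is the unit conversion of 1MB to 1,000KB, used to express an average size per object.

```python
def percent_drop(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


objects_drop = percent_drop(298, 137)  # number of page objects
size_drop = percent_drop(5.5, 4.0)     # page size in MB

# Average object size in KB, assuming 1MB = 1,000KB.
avg_kb_tuesday = 5.5 * 1000 / 298
avg_kb_wednesday = 4.0 * 1000 / 137

print(f"Objects dropped by {objects_drop:.0f}%, page size by {size_drop:.0f}%")
print(f"Average object: {avg_kb_tuesday:.1f}KB Tuesday, {avg_kb_wednesday:.1f}KB Wednesday")
```

The object count fell by roughly half while the page weight fell by only about a quarter, so the average object got noticeably heavier. That fits a page dominated by one very large image.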
