Microsoft blames WGA meltdown on human error

It wasn't a server outage or blackout after all, says company

August 29, 2007 12:00 PM ET

Computerworld - Microsoft Corp. said late yesterday that last weekend's failure of the antipiracy process it requires of Windows XP and Vista was due to "human error" and shouldn't be called an "outage" since the servers didn't go off-line. The company also promised that changes have been made to avoid a repeat.

In an earlier statement, Microsoft had downplayed the scope of the problem, saying that fewer than 12,000 systems worldwide had been affected.

In a post to the Windows Genuine Advantage (WGA) blog, program manager Alex Kochis, normally the public voice for the team, explained the malfunction of the company's validation servers in the greatest detail so far.

"Nothing more than human error started it all," said Kochis. "Preproduction code was sent to production servers. The production servers had not yet been upgraded with a recent change to enable stronger encryption/decryption of product keys during the activation and validation processes. The result of this is that the production servers declined activation and validation requests that should have passed."

Microsoft's anticounterfeit measures come in two flavors: activation and validation. The former requires users to enter a valid 25-character product key to prove they've paid for a license; the latter is the term used for all subsequent proof-of-purchase demands and engages, for instance, before users are allowed to download most software from the company's Web site.

The problem affected both the activation and validation servers, but while a quick rollback solved the activation servers' problems within 30 minutes, according to Kochis, it failed to reset the validation servers. "We now realize that we didn't have the right monitoring in place to be sure the fixes had the intended effect," he said.

According to the timeline he offered in earlier postings, the failure started on Friday, Aug. 24, at about 6:30 p.m. EDT. It's unclear how long Microsoft was unaware of the problem, although the gap was presumably measured in hours rather than minutes. "Through a combination of posts to our forum and customer support, the issue was discovered by [Friday] evening," Kochis said. By 2:15 p.m. Saturday, Aug. 25, the servers were again validating Windows correctly. The total time of the malfunction: 19 hours, 45 minutes.
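The 19-hour, 45-minute total follows from the two timestamps in Kochis' timeline. A minimal sketch of the arithmetic, assuming both times are in EDT as reported and treating the approximate 6:30 p.m. start as exact (the variable names are illustrative only):

```python
from datetime import datetime

# Timestamps from the published timeline, both taken as EDT.
failure_start = datetime(2007, 8, 24, 18, 30)  # Friday, about 6:30 p.m.
failure_end = datetime(2007, 8, 25, 14, 15)    # Saturday, 2:15 p.m.

elapsed = failure_end - failure_start

# Break the elapsed time into whole hours and leftover minutes.
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours, {minutes} minutes")
```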

Kochis also took pains to set the record straight about how the problem had been characterized. "It's important to clarify that this event was not an outage," he said. "This event was not the same as an outage because in this case the trusted source of validations itself responded incorrectly."

Contrary to what most users believed when they lit up Microsoft's support forums Friday night and Saturday, Kochis said PCs running XP or Vista automatically default to "genuine" -- the term Microsoft applies to machines running legitimate copies of its operating systems -- if Microsoft's validation servers are off-line. "In other words, we designed WGA to give the benefit of the doubt to our customers. If our servers are down, your system will pass validation every time."
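Kochis' distinction between an outage and an incorrect response can be illustrated with a short sketch. This is not Microsoft's code: the `ServerReply` type and `client_status` function are hypothetical names invented here, and the logic only models the behavior described in the post, in which an unreachable server gives the customer the benefit of the doubt while a reachable server's answer is trusted as given.

```python
from enum import Enum


class ServerReply(Enum):
    """Hypothetical outcomes of a validation request."""
    PASS = "pass"                # server reached, copy accepted
    FAIL = "fail"                # server reached, copy rejected
    UNREACHABLE = "unreachable"  # server off-line, no answer at all


def client_status(reply: ServerReply) -> str:
    """Return the status a PC would show under the described design."""
    if reply is ServerReply.UNREACHABLE:
        # Outage case: default to "genuine" (benefit of the doubt).
        return "genuine"
    # The server answered, so its verdict is trusted as is,
    # even if the server itself is wrong.
    return "genuine" if reply is ServerReply.PASS else "non-genuine"


# An outage would have been harmless to customers...
print(client_status(ServerReply.UNREACHABLE))
# ...but a server that stays up and wrongly rejects is not.
print(client_status(ServerReply.FAIL))
```

Under this model, the weekend's failure falls into the second case: the servers stayed up and returned rejections that should have been passes, so the fail-safe default never engaged.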