Brain behind IBM's Watson not unlike a human's

Like humans, Watson only uses a fraction of its memory to generate answers to Jeopardy questions

The clustered storage model provides massive throughput because of its increased port count: many storage servers are cobbled together into a single pool of disks and processors, all working on a similar task and all able to share the same data through a single global namespace. In other words, all of the disk drives appear as one big pool of storage capacity from which Watson can draw.

Watson's SONAS is populated with 48 450GB serial ATA (SATA) hard drives for a total of 21.6TB of raw capacity in a RAID 1 (mirrored) configuration; that leaves 10.8TB of usable capacity for the data Watson uses every time it's booted up. Three terabytes of that, however, is used for the operating system and applications.

But it's not SONAS's disk-based storage that makes Watson so darned fast; it's the CPUs and memory. Every time Watson boots, the 10.8TB of data is automatically loaded into Watson's 15TB of RAM, and of that, only about 1TB is scanned for use in answering Jeopardy questions, Pearson said.
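The capacity figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not anything IBM ran: the function names are hypothetical, and it assumes decimal units (1TB = 1,000GB) and that RAID 1 mirroring halves the raw capacity.

```python
# Figures quoted in the article. Decimal units (1 TB = 1000 GB) are an assumption.
DRIVE_COUNT = 48
DRIVE_SIZE_GB = 450
RAM_TB = 15.0      # Watson's total RAM, as quoted
SCANNED_TB = 1.0   # portion scanned when answering, as quoted


def raw_capacity_tb(drives: int, size_gb: int) -> float:
    """Total capacity of all drives before any redundancy overhead."""
    return drives * size_gb / 1000


def mirrored_capacity_tb(raw_tb: float) -> float:
    """RAID 1 keeps two copies of everything, so usable space is half of raw."""
    return raw_tb / 2


raw_tb = raw_capacity_tb(DRIVE_COUNT, DRIVE_SIZE_GB)
usable_tb = mirrored_capacity_tb(raw_tb)
loaded_share = usable_tb / RAM_TB    # share of RAM filled at boot
scanned_share = SCANNED_TB / RAM_TB  # share of RAM scanned per answer

print(f"raw: {raw_tb:.1f}TB, usable after mirroring: {usable_tb:.1f}TB")
print(f"loaded into RAM: {loaded_share:.0%}, scanned: {scanned_share:.1%}")
```

Under those assumptions, the 1TB that Watson scans works out to roughly one-fifteenth of its RAM, which is the "fraction of its memory" the headline refers to.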

In case you're wondering, 1TB of capacity is still quite significant; it can hold 220 million pages of text or 111 DVDs.

"The amazing thing is that you can get all those answers with such a small data set," said John Webster, an analyst with the research firm Evaluator Group. "After multiple iterations of loading and testing and loading and testing and updating the database on SONAS, IBM came up with a version of the database that would generate the data set that got loaded into memory."

Enter Australian computer programmer and Samba developer Andrew Tridgell.

Tridgell created the computer algorithm running on top of Watson's hardware that culls out the data set. Tridgell developed the open-source Clustered Trivial Database (CTDB), which the Samba file-sharing software uses to simultaneously access the memory across Watson's 90 servers.

More importantly, the CTDB makes sure the servers don't step on one another when they update information after a Jeopardy show.

During the show, Watson is read-only, meaning nothing gets written to its back-end SONAS. After the show, Watson is powered down and the computer scientists go to work updating information and debugging it, trying to figure out why it gave erroneous answers, such as choosing Toronto as the answer for a question about an American city.

"I'm sure they're scratching their head on that one," Pearson said.

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian, or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.

Copyright © 2011 IDG Communications, Inc.
