GRAPEVINE, Texas -- When it came to managing most of AOL's 6 petabytes of data, a Fibre Channel SAN sufficed. But for its most critical relational database, AOL found that the SAN was too constrained, causing its IT shop to miss its service-level agreements (SLAs) with business units more than 50% of the time.
After investigating the I/O bottlenecks, AOL traced the problem to its back-end storage. To fix it, AOL decided to build a 50TB storage-area network (SAN) from solid-state technology.
The upgrade worked, delivering four times the throughput to the SQL database that the Fibre Channel SAN was capable of. And because the NAND flash memory sat behind an existing virtualization appliance, which aggregates all of the back-end storage and serves it up as a single pool, storage admins kept the flexibility to migrate data between storage systems.
AOL engineers found that the throughput problems weren't caused by a lack of hardware. The company has five large high-end arrays with 15,000- and 10,000-rpm drives for primary storage and two lower-end arrays with Serial ATA (SATA) drives for nearline backup. But each array could only move data from its internal drives as fast as its SAS backbone allowed, which was about 6Gbit/sec.
AOL's storage infrastructure supports about 4,000 servers that feed information to both online users and to the company's own back-end applications.
Dan Pollack, senior operations architect at AOL, said he considered adding solid-state drives to the company's existing storage arrays, but decided against it because they would be held back by the array's controller bandwidth.
"So you find you can take a high-performance SSD device and half the performance is lost at the head of the array," Pollack told an audience at the Storage Networking World conference here this week.
Pollack also considered putting solid-state drives in servers, but ruled them out: they couldn't be clustered into his SAN, didn't offer the required capacity, couldn't be swapped out nondisruptively, and wouldn't allow data to be migrated between systems.
Pollack settled on a solid-state array from Mountain View, Calif.-based Violin Memory. It could plug directly into his Fibre Channel network, and it could sit behind his storage virtualization appliance from YottaYotta, which was purchased last year by storage giant EMC.
Pollack said the rollout, including planning and testing, took only eight weeks. The system has been live since July 1. Since then, the SSD array has had zero downtime and has allowed his shop to meet its SLAs with business units without exception. I/O response times are typically less than 1 millisecond, and there was no impact on his IT team's ability to manage the back-end storage.
"It's very easy to fall in love with this stuff once you're on it," he said.
But love can be expensive. Without offering an exact price tag, Pollack said the solid-state array cost AOL about $20 per gigabyte, which adds up to about $1 million with 50TB of capacity.
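The per-gigabyte price and total capacity figures are consistent; a quick sketch of the arithmetic (decimal units, 1TB = 1,000GB, are assumed):

```python
# Rough cost check using the article's figures.
PRICE_PER_GB = 20          # dollars per gigabyte, per Pollack
CAPACITY_TB = 50           # total array capacity

capacity_gb = CAPACITY_TB * 1_000   # decimal terabytes assumed
total_cost = capacity_gb * PRICE_PER_GB
print(f"${total_cost:,}")  # → $1,000,000
```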
Pollack said that money was less of an obstacle than the alternative -- throwing more manpower at the database problems.
The flash array uses internal RAID 5 protection, so a failed module can be hot-swapped without data loss. For added protection, Pollack configured the "preliminary" installation so that it mirrored data across two clusters of Violin appliances, with each cluster made up of six Violin appliances.
"Because this is all new and unproven, and this is a Tier 1 heavily visible application, we felt it was prudent to spend the extra money and time and provide that additional protection. In the future we won't do that," he said.
And, while the array didn't offer the 1 million-plus I/Os per second (IOPS) that vendor marketing material boasts, it does come in at around 250,000 IOPS, which Pollack said is more than enough for his purposes.
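Those two figures, roughly 250,000 IOPS and sub-millisecond response times, fit together via Little's Law (average outstanding requests = throughput × latency). A back-of-the-envelope illustration, treating the 1-millisecond figure as an upper bound:

```python
# Little's Law: average outstanding I/Os = IOPS * average latency (seconds).
# The IOPS and latency figures come from the article; the resulting
# queue depth is an illustration, not a measured value.
iops = 250_000
latency_s = 0.001   # "less than 1 millisecond", used as an upper bound

outstanding_ios = iops * latency_s
print(round(outstanding_ios))  # → 250
```

In other words, the array would need on the order of a couple hundred I/Os in flight to sustain that rate at that latency.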
There were also significant cost savings associated with the flash memory, he added.
For one, Pollack said, Fibre Channel arrays often use only about 10% of the capacity in their top-tier drives because storage admins often short-stroke the drives, writing only to the outer sectors to reduce seek times and improve I/O response. Such an array can eat as much as 20 kilowatts of power, he said.
In comparison, the flash array only uses 2 kilowatts and 90% of its capacity is utilized. On top of that, the Violin storage array takes up less floor space and produces less heat than the previous setup, and that helps to reduce power required to run HVAC systems.
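Combining the utilization and power figures Pollack cited gives a rough sense of the efficiency gap. The raw capacities below are illustrative assumptions (the ratio is what matters), while the utilization and power numbers come from the article:

```python
# Usable terabytes delivered per kilowatt, using the article's figures.
def usable_tb_per_kw(raw_tb, utilization, power_kw):
    return raw_tb * utilization / power_kw

# Short-stroked Fibre Channel array: ~10% of capacity used, ~20 kW.
fc = usable_tb_per_kw(raw_tb=50, utilization=0.10, power_kw=20)
# Flash array: ~90% of capacity used, ~2 kW.
flash = usable_tb_per_kw(raw_tb=50, utilization=0.90, power_kw=2)

print(f"{flash / fc:.0f}x more usable capacity per kW")  # → 90x ...
```

On those numbers, the flash array delivers roughly 90 times the usable capacity per kilowatt, before counting the floor-space and cooling savings.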
Violin doesn't use NAND flash in hard drive form factors, as many solid-state companies do today. Instead it places flash chips directly on cards called Violin Intelligent Memory Modules (VIMMs). VIMMs are similar to DIMMs, only they're built out of flash instead of dynamic RAM.
"So you're getting the 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld.