Case study Q&A: Genome sequencing demands high-performance storage infrastructure

Kelly Carpenter is senior technical manager for the Genome Sequencing Center at Washington University in St. Louis. During a recent interview with SNW Online, Carpenter described his complex storage networking environment and how it meets the high-performance requirements of genome sequencing.
Describe your storage networking infrastructure in terms of products and vendors. On the SAN side, all of our disk is Fibre Channel. We have 18TB of Clariion 4700 disk on one Fibre Channel fabric, and 22TB of STK D280, which is LSI controller-based, the same as IBM FAStT, along with 12TB of Hitachi 9980 on another Fibre Channel fabric. On a separate tape fabric we have two STK L700 libraries with 20 SDLT 320 drives in each — 256TB total capacity given our data compression characteristics. The Hitachi 9980V and STK D280 share a Fibre Channel fabric because they're compatible. The Clariions are on their own Fibre Channel fabric only because EMC required it for support, no other reason.

The Fibre Channel SAN is connected with 16-port 2Gbit Brocade Silkworm 3800s. Each server has two 2Gbit connections per fabric and has only a single connection to the tape fabric, since the software and hardware can't handle fail-over well yet. The SAN disks are hooked up to four Sun V880s that are clustered with Veritas Cluster Server. This storage is served out to the client computers via NFS.
We also have an instance of Oracle/Veritas RAC connected to 12TB of Hitachi 9980 with four 2Gbit connections into a Fibre Channel fabric. This Oracle RAC cluster runs on two quad-CPU Sun V880s with 64GB of RAM each, running Solaris. The SAN fabrics were designed with the help of Datalink.

What kind of NAS do you have in place? We have around 50TB of NetApp, mostly on 980s that are clustered. We also have 50TB on a BlueArc Titan 32 — I believe 30TB of SATA and 20TB of Fibre Channel.

The NetApps are clustered so that each head has 3Gbits channeled together for certain subnets and 1Gbit for another subnet, which guarantees that the people on the 3Gbit connection can't swamp the whole box. The BlueArc currently has 4Gbits channeled together into one big pipe for everyone to access.

You're putting those SATA drives into pretty stressful conditions, and they're not known for excelling under those circumstances. We haven't hit it super-duper hard in production. We have done it, and it works. We've used the SATA disks mostly as backup for the Fibre Channel side of the BlueArc. One of the cool things about this BlueArc box is that it can do NDMP backups without going over the network — it can do an NDMP backup internally over the backplane, from your Fibre Channel disks to your SATA disks.

What unique storage networking needs do you have, and how are you satisfying them with BlueArc? In terms of what's on the SAN, we have pretty good performance. It's kind of like we have a SAN for NFS and a SAN for Oracle, because of the different classes of disk you're servicing. It started out at $100,000 a terabyte, like with the Hitachi, and tiers down to around $20,000 a terabyte for the Clariion. It's working out OK, but the Suns serving the storage on the SAN currently have only a single Gigabit Ethernet connection each.

We could put more than a gigabit into the Suns, but we haven't, because until a couple of years ago we didn't have any NAS in production at all — we had been testing Network Appliance since 1997. In terms of performance features, NAS had almost everything we wanted. A couple of years ago, NAS hit the threshold — there was absolutely nothing we wanted to do that NAS couldn't do — so we started with some Network Appliance, and we had good performance with them.

The bottom line is, NAS now actually has the performance, is much easier to set up and provision, and makes it easy to add storage and network bandwidth. We also look at how the NAS behaves when running at maximum capacity — does it behave strangely? Which way are you going to saturate it — the NAS head with IOPS, the disk array with a huge number of files or with streaming very large files, or by simply filling up the available network capacity?

How much capacity is left when the NAS is busy handling your actual workload? If you're saturating 3 to 4Gbits of traffic, depending on what the traffic is and how many IOPS, you may or may not have any CPU left on your NAS head to do anything else. From my own observations, the BlueArc had more CPU capacity remaining at network bandwidth saturation than the NetApp.
Our peak workload here is hitting storage with several hundred blades at a time — that can slow down, or even melt, fast storage. The BlueArc Titan handles this situation very well.
Currently, NetApp has great storage, and it's very quick; however, BlueArc happens to be faster. And they are both very scalable, but BlueArc is currently more scalable.
Here's the other thing: When we're buying stuff, I'm always Mr. Negative. If we end up getting lemons, how do I make lemonade out of it? We nearly bought $2 million worth of Alphas when they announced the Compaq merger. Not that the Alpha would have instantly gone away, but what was the long-term corporate strategy then? Well, the BlueArc uses Engenio storage with LSI-based controllers — the same as the StorageTek D280 and the IBM FAStT, both of which we have — so worst case, if BlueArc ever went out of business, we could redeploy that storage elsewhere in our SAN without any problems.

How were you first introduced to BlueArc? When we were looking at getting some more storage, we saw BlueArc at Supercomputing 2003, and it looked like they had a better idea in doing it through hardware. We're early adopters of new technology only when it would benefit us greatly, but we are pretty bleeding-edge, and we've tested a whole lot of things that were brand-new out of the gate — Gigabit Ethernet, Fibre Channel SANs, wireless, Itaniums, Opterons, blade servers, dual cores, whatever. Overall, this strategy has helped us continually scale our capacity and stay at the front of the pack for what we do.

Let's back up and set the stage a little bit for the application. You're doing genomic sequencing. Traditionally, when you've done genomic sequencing, there are quite a few small free applications people use — BLAST, HMMER, Phred and Phrap — the obviously vertical-market stuff for genomics.

We serve our storage out to clients via NFS to do our work. This is easy and effective for us. Most of the time, what people have done in the past — because it was easier to do and understandable — was to write things out in bunches of small files. So we literally had millions and millions of files on a file server; at most we had 400 million files. One of the problems when you're dealing with all these tiny files is that backup is a horrible thing, especially with something like NDMP, which is a file-by-file backup. That can translate into a constant, 24-hour-a-day backup! This is true of anything that uses NDMP for NAS backup.
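A back-of-the-envelope calculation shows why file-by-file backup breaks down at this scale. The per-file overhead and throughput figures below are illustrative assumptions, not measurements from the Genome Sequencing Center:

```python
# Rough estimate of a file-by-file (NDMP-style) backup window for a file
# server holding huge numbers of tiny files. All figures are assumptions
# chosen only to illustrate how per-file overhead dominates.

def backup_hours(n_files, avg_file_kb, per_file_overhead_ms, throughput_mb_s):
    """Hours to walk and copy n_files small files, one at a time."""
    transfer_s = n_files * avg_file_kb / 1024 / throughput_mb_s  # bulk data time
    overhead_s = n_files * per_file_overhead_ms / 1000           # per-file metadata cost
    return (transfer_s + overhead_s) / 3600

# Assume 400 million ~8KB files, 5 ms of metadata overhead per file,
# and 60 MB/s sustained throughput to the backup target:
print(f"{backup_hours(400_000_000, 8, 5, 60):.0f} hours")  # prints "570 hours"
```

Even with generous throughput, the per-file overhead alone runs to weeks, which is why the backup never finishes inside a daily window.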

How do you get around this problem? Solutions include backing up with snapshots, which take up disk space but almost no time — very nice — or backing up from the high-performance Fibre Channel disk to inexpensive SATA disk with NDMP, which is pretty fast, and then using NDMP to move the SATA copy to tape. If the SATA disk is for backup and not in production, you can back it up during business hours without taking a performance hit. If the window still exceeds 24 hours, you can then adjust things to fit what you can actually do and still be protected.

So things like snapshots come into play, or, if it's on a Veritas Volume Manager volume, you have to use something like Veritas NetBackup with the FlashBackup option.

So we have millions and millions of small files. What we have tried to do is re-architect things so that there are fewer files. Data can now go straight into a database, although some of it is still put into a file system, and you can basically run a script to take data out of the database and put it back into files — as it would be on a file system — to work with it. Previously, we would keep all these files online until a project was finished and then archive them off. Now, with projects like the chimpanzee and other genomes, people want access to all the files created during the entire process, so it was an ever-growing number of files. But by putting more into the database, we've kept the file count down — we're still in the hundreds of millions, but not at 400 million.
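The "fewer files, more database" re-architecture described above can be sketched in miniature: records live in one database file instead of millions of tiny files, and a script materializes a record back into a file only when a file-based tool needs it. The table, column, and read names here are hypothetical, not the center's actual schema:

```python
# Minimal sketch: keep sequence reads in one SQLite database rather than
# one file per read, and export a read to a FASTA-style file on demand.
# Schema and names are illustrative assumptions.
import sqlite3

def store_read(conn, name, sequence):
    """Insert or update one read; replaces one tiny file on disk."""
    conn.execute("INSERT OR REPLACE INTO reads (name, sequence) VALUES (?, ?)",
                 (name, sequence))

def export_read(conn, name, path):
    """Write one read back out as a FASTA file for file-based tools."""
    (seq,) = conn.execute("SELECT sequence FROM reads WHERE name = ?",
                          (name,)).fetchone()
    with open(path, "w") as f:
        f.write(f">{name}\n{seq}\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reads (name TEXT PRIMARY KEY, sequence TEXT)")
store_read(conn, "chimp_read_0001", "ACGTACGT")
export_read(conn, "chimp_read_0001", "chimp_read_0001.fa")
```

One database file backs up as a single large stream, sidestepping the per-file NDMP overhead entirely, while the export script preserves compatibility with tools that expect files.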

How many users do you have accessing your network, and how widely are they distributed? We have an FTP site with data that people can download. We have in the area of 275 local users, and probably another 50 from around the world who have accounts to get in, access the data, and work on it locally on our network. Externally, we have a big pipe, around 90Mbit/s, into our network. We've got great internal network bandwidth, with big switches — Cisco 6500 series with multigigabit channels connecting them.

Do you have any significant volumes of e-mail? Our BlueArc is not involved in our e-mail. We do have a fairly high volume of e-mail, but we have just re-architected our mail; Richard Wohlstadter did it. He split our mail across three distinct servers to handle the load: a Postfix MTA server, a Cyrus IMAP server and message store, and a SquirrelMail Web-based e-mail server. Our mail is actually stored on a direct-attached Fibre Channel Clariion CX300 on the Cyrus IMAP and message store server.

What other vendors did you consider besides BlueArc, and why did you reject them? We already had Network Appliance in-house, and competition is always good if we have to negotiate later on. We looked at NetApp, Panasas, IBM, EMC and DataDirect. Everyone has theoretically good ideas — put them all together and you have a pipe dream. But the way BlueArc did it in hardware was a monster leap, much like what NetApp did with NFS years ago.

Copyright © 2005 IDG Communications, Inc.
