When it comes to file systems, scale is the enemy, according to Andres Rodriguez, CEO of Nasuni. And the best weapon in the battle for scale is the cloud. Nasuni claims to have developed the first cloud-native file system, delivering not only virtually unlimited scale in the cloud but rapid access to files from locations around the world.
Instead of deploying more and more on-site storage – and dealing with costly, painful upgrades – Nasuni delivers a virtual machine (or appliance) that handles local needs while using the cloud for the heavy storage lifting. You buy capacity, not boxes, and that makes it easier to budget and plan for growth, per Rodriguez.
In this installment of the IDG CEO Interview Series, Rodriguez spoke with Chief Content Officer John Gallant about how the Nasuni UniFS file system works, how customers deploy it and what kind of savings and flexibility they can expect. He also talked about what it means for your existing EMC and NetApp systems, and how those competitors are responding to the Nasuni challenge. Rodriguez also explored why it’s not easy to try to replicate what Nasuni does on your own.
What does Nasuni do and why do you do it?
We handle all file workloads for enterprise customers. I started the company because I was the CTO of the New York Times and we had very big file problems that were getting worse. I felt, 15 years ago, the architecture for file storage was going to run out of steam. So I set out to build. In most businesses, files are the work product. If you’re talking about an architecture firm or an engineering firm, you’re talking about design documents. If you’re talking about a media firm you’re talking about movies they may be working on. If you’re talking about a software development house you’re talking about the software itself. That’s all unstructured file data.
I felt that there was a looming problem of scale around file data. The architectures of the time did not scale to what I thought file systems were going to have to reach to handle future workloads. The files were getting bigger and everything was being captured in digital form in some file format or another and there wasn’t an integrated approach to addressing that issue.
Nasuni asks: Where are your files? What kind of pain are you having around files? It’s typically: We’re having a hard time storing them because the scale is killing us. We’re having a hard time protecting them. (That’s a byproduct of scale as well because the backup systems are getting too large.) We’re having a very hard time moving these files around. The files are so big and they need to be accessed from so many places around the world and at various degrees of performance. Bad performance is always easy. It’s getting great performance around the world that’s really difficult.
We help clients address that and we do it all from an integrated platform. We’ll take any file workload. We’ll take your Office documents, your design files, your movie files. Every file workload you have, we’ll take it off the traditional storage system but we will give you something that is fully compatible to anything above the file layer - all the applications, all the security access control systems – and it will scale forever. The protection will be integrated in the system and you will be able to move and access these files around the world at any level of performance, no matter how high.
We do it all as an integrated capacity license so that a CIO can look at this and say: We’re getting a 100TB license from Nasuni this year. Next year we need another 100TB from Nasuni. It’s incredibly clean and predictable, like the way you would budget for and purchase something like Salesforce or a SaaS application. It’s not done by selling you more boxes at your data center, it’s done as a service.
As I understand it there are two critical components to the Nasuni offering. One is an onsite device and then there’s the cloud storage capability that you offer. What does the onsite device do?
A good analogy for this is the Nest thermostat. The Nest thermostat serves two functions. The first is it needs to be an awesome thermostat. It needs to be able to control your HVAC system. That on-prem appliance is essentially doing the job of an equivalent thermostat, which - in the data center for files – is a network-attached storage device or NAS appliance. When you put one of our appliances in your data center you ensure complete compatibility with anything that was attached before to your file systems. It will attach to Active Directory, DFS. It will have NFS, all of the file verticals that exist in the data center will be part of what this device can talk to.
The magic then happens inside the device, just like the Nest thermostat, as it creates the file system that’s going to live in the cloud. With Nest, the magic is that you can control your thermostat remotely. The magic is that the thermostat is learning because there is intelligence in the cloud that is figuring out when you are out and when you’re not. Our file system resolves all of the issues that have been plaguing the data center in file systems for decades by removing the file storage from the data center.
We’re creating this massively scalable file system in the cloud that can scale forever. It is completely versioned and protected and you can access it from anywhere in the world from other appliances that look just like the appliance you first deployed in your data center. The cloud is where all the heavy lifting is actually getting done but you need this local appliance that sits in your data center so you can talk to the things that already exist in the data center and have the performance levels that are required in the data center.
How do you ensure performance where you have the local device but the files really are stored up in the cloud?
The wonderful thing about file systems is that they can be improved a great deal with caching. When I was looking at this problem, my biggest realization was that most file systems in the enterprise are largely unused. Most of the data that accumulates in big companies is almost never touched. Companies are typically working on a high-performance edge of that data called a working set, which can be as little as 5% of what a company is actually storing. This data is incredibly active, incredibly high performance, read, write all the time.
We leveraged that observation to build cache into the appliance that goes into the data center. The single most important thing those appliances are doing is, essentially, continuously evicting, continually getting rid of the data that’s not being actively used. All of the data moves to the cloud and what remains local in the data center is that very-high-performance layer that needs to be accessed by applications locally. There is an incredible leverage. That leverage is as dramatic as a 5% to 95% ratio in terms of what you need in your data center versus what you could have stored in the back end.
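As a rough illustration of the eviction idea described above, the sketch below keeps a small local cache in front of a larger back end and continually drops whatever was used least recently. It is not Nasuni's implementation: the `EdgeCache` class, its method names, and the plain dictionary standing in for cloud storage are all assumptions made for this example.

```python
from collections import OrderedDict


class EdgeCache:
    """Hypothetical edge cache: every write goes to the back end,
    and only the most recently used files stay local."""

    def __init__(self, capacity_bytes, backend):
        self.capacity_bytes = capacity_bytes
        self.backend = backend          # dict standing in for cloud object storage
        self.local = OrderedDict()      # path -> bytes, least recently used first
        self.used_bytes = 0

    def write(self, path, data):
        self.backend[path] = data       # all data moves to the back end
        self._keep_local(path, data)

    def read(self, path):
        if path in self.local:
            self.local.move_to_end(path)    # mark as recently used
            return self.local[path]
        data = self.backend[path]           # cache miss: fetch from the back end
        self._keep_local(path, data)
        return data

    def _keep_local(self, path, data):
        if path in self.local:
            self.used_bytes -= len(self.local.pop(path))
        self.local[path] = data
        self.used_bytes += len(data)
        # Evict least recently used files until the cache fits again.
        while self.used_bytes > self.capacity_bytes and len(self.local) > 1:
            _, evicted = self.local.popitem(last=False)
            self.used_bytes -= len(evicted)


cloud = {}
cache = EdgeCache(capacity_bytes=10, backend=cloud)
cache.write("a.txt", b"12345")
cache.write("b.txt", b"12345")
cache.read("a.txt")               # a.txt is now the most recently used
cache.write("c.txt", b"12345")    # over capacity, so b.txt is evicted locally
```

After the last write, all three files are in `cloud` but only `a.txt` and `c.txt` remain in the local cache. With the 5% working set mentioned in the interview, the same ratio would mean roughly 5TB of local cache in front of a 100TB file system.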
Under circumstances where, for instance, you need that data accessible in multiple locations around the world, you can synchronize the caching in the devices so that everywhere in the world they’re looking at the same working set. You can play a trick around globalizing the cache across multiple geos. When that data is required and say it doesn’t exist there, you also have an advantage because file systems - unlike traditional databases - have built-in resiliency against latency issues. Other than the fact that you have to wait, the applications will not produce errors or hang because a file needs to be streamed from the back end.
Think about the way movie streaming has changed for consumers: It’s the same principle. The old TV streaming devices had tons and tons of storage in them and they were terrible to use because the movies had to be downloaded before you could watch the movies. All modern streaming devices use the fact that movies have been optimized for download streaming and then they have a tiny little cache that allows you to basically feel like your movie is there even though it’s being streamed from the cloud. That’s a very narrow use case. We typically have larger caches than that and account for the fact that you need to be able to read and write over the files.
But that experience is basically the experience that people get when the files don’t exist in the local appliance. The device pauses before it gives you the first bit but as soon as it gets it, it just begins streaming like crazy because we’ve done all the work necessary before we put it in the cloud to compress the files, deduplicate them and make them very stream-friendly. They can just shoot right back to wherever they’re needed. That helps the system overall perform better.
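The preparation Rodriguez describes (compressing, deduplicating and making files stream-friendly) can be sketched in a few lines. The interview does not say how Nasuni actually chunks or encodes data, so everything here is an assumption for illustration: the fixed chunk size, the SHA-256 chunk keys, the zlib compression, and the function names `store_file` and `stream_file`.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # tiny for illustration; a real system would use far larger chunks


def store_file(data, chunk_store):
    """Split data into chunks, store each unique chunk compressed,
    and return the ordered list of chunk hashes (the file's manifest)."""
    manifest = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:           # deduplicate identical chunks
            chunk_store[digest] = zlib.compress(chunk)
        manifest.append(digest)
    return manifest


def stream_file(manifest, chunk_store):
    """Yield the file chunk by chunk so a reader can start before the end arrives."""
    for digest in manifest:
        yield zlib.decompress(chunk_store[digest])


store = {}
manifest = store_file(b"abcdabcdwxyz", store)
restored = b"".join(stream_file(manifest, store))
```

Here the manifest lists three chunks but the store holds only two, because the repeated `abcd` chunk is kept once. Reading back through a generator is what lets the first bytes arrive while the rest is still in flight.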
Putting a price on Nasuni technology
How does somebody buy it? How do you price this?
One of my biggest frustrations when I used to buy storage was having to deal with all the nickel-and-diming features of the storage company. When I started this company I decided we were going to take a really straightforward approach. Basically, I’m selling you capacity and integrated protection. Protection is really two parts. I’m versioning my file systems so that customers - say they get attacked with ransomware - can always go back to pristine versions of their file systems that are untouched by the malware attack.
The other thing is you need to be able to replicate the back end. We use the cloud to replicate the data asset. When we create this file system in the cloud it is one logical file system, if that’s what the customer wants, but it is many, many physical file systems that are distributed throughout our partners, cloud providers, companies like Microsoft with Azure and AWS with S3. Then we allow you to access that file system from many, many locations around the world. You want two locations? That’s great. You want 20 locations? That’s great too. We charge for one thing in that whole equation: How much usable capacity is in the file system. We don’t charge you for the replicated copy. We don’t charge you for how many points of access you have to the file system and we don’t charge you for how many versions you want to keep in the file system.
That’s truly revolutionary. Almost every storage company, because it was selling you storage in a box, was charging you for raw storage capacity, and then you were compromising because you wanted to keep a thousand versions of your file system to be able to go back in time. We give our customers infinite versions of the file systems and we don’t charge them anything extra for it. Most of our clients are operating in an infinite retention mode, meaning if they’ve been our clients for five years they can go back in time in five-minute intervals for five years and restore at any point in time a file, a directory structure, a complete file system. That whole equation is licensed just on the usable capacity, just on how much storage the users or the applications have direct access to.
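The point-in-time restore described above comes down to keeping every saved version and looking up the latest one at or before a chosen moment. The following is a minimal sketch of that lookup for a single file, not a description of how UniFS stores versions; the `VersionedFile` class and its methods are hypothetical, and timestamps are plain seconds.

```python
import bisect


class VersionedFile:
    """Hypothetical version history for one file: every save is kept."""

    def __init__(self):
        self.timestamps = []   # save times in seconds, in increasing order
        self.contents = []     # contents[i] was saved at timestamps[i]

    def save(self, timestamp, data):
        # Assumes saves arrive in increasing time order.
        self.timestamps.append(timestamp)
        self.contents.append(data)

    def restore(self, timestamp):
        """Return the contents as of `timestamp`, or None if no version existed yet."""
        index = bisect.bisect_right(self.timestamps, timestamp)
        if index == 0:
            return None
        return self.contents[index - 1]


history = VersionedFile()
history.save(0, b"clean")
history.save(300, b"edited")                 # five minutes later
history.save(600, b"encrypted by malware")   # ransomware strikes
pristine = history.restore(599)              # the version just before the attack
```

In this example `pristine` holds the `edited` version saved at the 300-second mark, which is the ransomware scenario from the interview in miniature: the damaged version stays in the history, and the restore simply reads from before it.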
That pricing includes the cost of the device onsite as well?
That device is an edge appliance and it’s a virtual appliance which runs on Hyper-V, on VMware. It can also run - and this is important for the access part of the equation - natively in AWS or in Microsoft Azure. That means you have access to your file systems wherever you want them. This is one of the big transformations that is happening now with the cloud. Customers want to be able to have their data in a kind of data Switzerland. They don’t want the data to be captured in any one provider because they want to bring the best-in-class services to that file data.
If you want to run a VDI environment or if you want to do transcoding for movies you may want to have access in AWS or Azure or in other providers that are specific to those services. You’re able to run on virtual appliances that are native to those environments and have full access to your file system even if your file system is hundreds of terabytes or tens of petabytes large. Regardless of the size of the file system, I want to be able to access that file system from anywhere, from any service provider.
The virtual machines allow you to do that and those are licensed at no additional cost. If you want a bare metal, dedicated appliance we have an OEM partnership with Dell where, basically, customers can buy a Dell server that comes preloaded with our software for the data center. Our goal there is like a cable company. We want to give you the highest quality hardware appliance at the lowest cost because that’s going to make you consume more of the service. You’re going to be a happier client. It’s a completely non-hardware model for how you pay for this. What you’re really paying for is the subscription service based on the capacity. That’s how my sales reps are compensated. That’s how the company makes money and that’s how we add value to our customers.
Are there situations where this solution is really applicable and are there situations where it’s not as applicable? Are there any specific instances where this is not a great solution for a customer problem?