Is big data a big drain on your network?

There's a lot of talk right now in the industry about big data and the Business Intelligence (BI) applications that are being used to wrangle it. However, very few people are talking about the impact that big data can have on the network.

Before we get started, let's discuss what big data is. Effectively, we've engineered ourselves into a problem. For many years we've wished that we had more data to use for analysis. This is true for use cases across many fields: medical sciences, marketing, chemical engineering, and network engineering. So, over time we've developed technology and applications that generate that data, lots and lots of them, and we've found ways to enhance those applications to create even more. Cheap storage systems, both in our own datacenters and in the cloud, coupled with distributed computing systems like Hadoop and the ones many of us build using virtualization, have encouraged us to keep very large chunks of this data.

For years now we've had to carefully schedule database dumps so that the network traffic they create doesn't interfere with production applications. This is why best practices dictate doing all of your database backups and migrations during off hours. However, now that big data has entered the picture, not only are these dumps massive, but dumps aren't our only problem. Remember, we wanted this data for analysis, so someone is going to be using a business intelligence application to pull together, massage, normalize, and visualize all of that data. Chances are that this analysis will be done during business hours and that some of the data to be analyzed will reside in the cloud. This creates some interesting problems for us network administrators.
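To put a rough number on how long a dump ties up a link, a back-of-the-envelope estimate can help. The sketch below is only an illustration: the function name and the 70% default utilization are assumptions made for this example, and real throughput depends on protocol overhead, latency, and competing traffic.

```python
def transfer_hours(dataset_gb: float, link_mbps: float, utilization: float = 0.7) -> float:
    """Estimate the hours needed to move dataset_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    utilization is the assumed fraction of the link the transfer can
    actually use (0.7 is an illustrative default, not a measured value).
    """
    if dataset_gb < 0:
        raise ValueError("dataset_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    bits_to_move = dataset_gb * 8 * 1000**3               # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1_000_000 * utilization
    return bits_to_move / usable_bits_per_second / 3600   # seconds to hours
```

Under these assumptions, a 2 TB dump over a 1 Gbps link works out to a bit over six hours, which is fine overnight but not something you want competing with BI queries in the middle of the business day.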

We have to remember that the three most significant factors that will affect performance are processing power, the size of the dataset, and the performance of the network. This just seems to me like another area where we may find ourselves playing the blame game between the network team, the database team, the application support team, and the server infrastructure team if we're not really careful.

So, as a network engineer how do you plan for and deal with big data? First off, you're going to need some big data of your own. For most areas of the network you're going to need to implement some sort of flow-based traffic analysis. For super critical areas you're probably going to be doing some Deep Packet Inspection (DPI). You're also going to need to be sure that you're a part of any big data project so that you can update network designs and configs as necessary. This may include upgrading bandwidth, implementing WAN acceleration technology, or doing traffic shaping.
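As a minimal sketch of what flow-based analysis can tell you, the example below totals bytes per source/destination pair during business hours and reports the heaviest talkers. The `FlowRecord` layout, the `top_talkers` name, and the 8-to-18 business-hours window are all hypothetical choices for illustration; this is not a NetFlow or IPFIX parser, and a real collector would supply many more fields.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (not a real export format)."""
    src: str
    dst: str
    byte_count: int
    hour: int  # hour of day the flow was observed, 0-23


def top_talkers(
    flows: Iterable[FlowRecord],
    n: int = 3,
    business_hours: range = range(8, 18),  # assumed window: 08:00 to 17:59
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n (src, dst) pairs that moved the most bytes in business hours."""
    totals: Counter = Counter()
    for flow in flows:
        if flow.hour in business_hours:
            totals[(flow.src, flow.dst)] += flow.byte_count
    return totals.most_common(n)


# Made-up sample data: a BI server pulling from a database during the day,
# plus an overnight backup that the business-hours filter ignores.
sample = [
    FlowRecord("10.0.0.5", "10.0.1.9", 600, 10),
    FlowRecord("10.0.0.5", "10.0.1.9", 300, 14),
    FlowRecord("10.0.0.7", "10.0.1.9", 200, 11),
    FlowRecord("10.0.0.8", "10.0.2.2", 5000, 2),
]
busiest = top_talkers(sample, n=2)
```

If a pair like the first one in the sample keeps showing up at the top during the day, that is a reasonable place to start looking at traffic shaping or a bandwidth upgrade.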

Big data offers us insights that we've never had before, and some of the possibilities that it opens up are truly amazing. Just be sure that you stay in front of it, because at the end of the day how the network is actually performing matters less than how the users perceive it to be performing.

Flame on...


Josh Stephens is the founder and CEO of Bearded Dog, an Austin-based strategy consulting and development company, specializing in tech innovation and IT management best practices. Follow Josh on Twitter @josh_stephens.

Copyright © 2012 IDG Communications, Inc.
