Fishing for Better Big Data Insights with an Intelligent Data Lake

Fishing in a lake and fishing in a data lake are much the same. Data scientists must not only go where the fish are for big data insights, but also find a way to quickly build the data pipeline that turns raw data into business results.


I learned at an early age when fishing with my buddies that it doesn’t matter how good a fisherman you are: you’re not going to catch anything if you’re not fishing where the fish are. The same advice applies to data lakes.

Not even the best data scientists in the world can find insights in data lakes that are nothing but data swamps. But that’s what most data analysts are using today: swamps filled with databases, file systems, and Hadoop clusters that contain vast amounts of siloed data, with no efficient way to find, prepare, and analyze that data.

That is why Informatica introduced the Intelligent Data Lake (IDL) to provide collaborative self-service data preparation capabilities with governance and security controls.

Last year, Informatica launched Big Data Management v10, which included Live Data Map (LDM) to collect, store, and manage the metadata of many types of big data and deliver universal metadata services to power intelligent data solutions, such as the Intelligent Data Lake and Secure@Source. IDL leverages the universal metadata services of LDM to provide semantic and faceted search and a 360-degree view of data assets, including end-to-end data lineage and relationships.

In addition to smart search and a 360-degree view of your data, IDL provides analysts with a project workspace, schema-on-read data preparation tools, data profiling, automated data discovery, user annotation and tagging, and machine-learning-driven data set recommendations based on user behavior. These capabilities make it much easier for analysts to “fish where the fish are” for big data insights.

To “land the fish” and turn these insights into real value, you need a way to quickly build the data pipeline that turns raw data into business results. IDL does this automatically by recording every action a data analyst takes while preparing data assets in what is called a “recipe.” These recipes then generate data pipelines (called mappings in Informatica) that IT can automatically deploy into production. What better way to turn insights into business value and fry up those fish you just caught?
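
To make the recipe idea concrete, here is a minimal Python sketch of recording preparation steps and replaying them as a pipeline. It is an illustration only and does not use Informatica's API: the `Recipe` class, its `record` and `to_pipeline` methods, the step functions, and the sample rows are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class Recipe:
    """Hypothetical recorder: keeps each preparation step in the order applied."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def record(self, name: str, step: Step) -> "Recipe":
        # Remember the step so it can be replayed later.
        self.steps.append((name, step))
        return self

    def to_pipeline(self) -> Step:
        # Freeze the recorded steps into one reusable function.
        recorded = list(self.steps)

        def pipeline(rows: List[Row]) -> List[Row]:
            for _name, step in recorded:
                rows = step(rows)
            return rows

        return pipeline


def drop_missing_amount(rows: List[Row]) -> List[Row]:
    # Discard rows that have no amount value.
    return [r for r in rows if r.get("amount") is not None]


def normalize_region(rows: List[Row]) -> List[Row]:
    # Trim whitespace and upper-case the region label.
    return [{**r, "region": str(r["region"]).strip().upper()} for r in rows]


# The analyst's actions are recorded as a recipe...
recipe = (
    Recipe()
    .record("drop_missing_amount", drop_missing_amount)
    .record("normalize_region", normalize_region)
)

# ...and the recipe is turned into a pipeline that can be rerun on new data.
pipeline = recipe.to_pipeline()

raw = [
    {"region": " west ", "amount": 120},
    {"region": "east", "amount": None},
]
prepared = pipeline(raw)
print(prepared)
```

The point of the sketch is the separation it shows: the analyst works interactively, while the recorded steps become an artifact that someone else can run repeatedly against fresh data.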

If you want to see how an Intelligent Data Lake works through live demos, please visit our booth #1321 at Strata + Hadoop World San Jose, March 29-31. And if you can’t make it to Strata + Hadoop World in person, you can view IDL demos online during the event at http://www.informatica.com/bigdataready#demo. Good luck fishing for big data insights!


Copyright © 2016 IDG Communications, Inc.