Hadoop
Continuing coverage of Hadoop

Hadoop gets real
Its fast and robust data processing and storage power make Hadoop both wildly popular and wildly complex. Here's how four IT leaders have managed to bring Hadoop systems from the sandbox into production.

Apache Hive brings real-time queries to Hadoop
Hive's SQL-like query language and vastly improved speed on huge data sets make it the perfect partner for an enterprise data warehouse.

Hadoop's success drives efforts to make it more secure
Talk about big data and it won't take long for Hadoop to appear in the conversation. The Apache open source software is used to orchestrate clusters of commodity computers to crunch information from mountains of data.

Gazzang buy provides end-to-end encryption for Cloudera Hadoop
Cloudera will incorporate technology from its acquisition of encryption software provider Gazzang into Apache Hadoop so that industries with stringent security regulations can use the big-data processing platform.

10 Hot Hadoop Startups to Watch
As data volumes grow, figuring out how to extract value from them becomes critical. Hadoop enables the processing of large data sets in a distributed environment and has become almost synonymous with big data. Here are 10 startups with solutions for unlocking the value in big data.

Big data funding spree continues
Venture capital firms continue to funnel big sums of money to big data startups. Hadoop player Cloudera announced $160 million in new financing, and analytics startup Platfora raised $38 million.

Agile Comes to Data Integration
Informatica is aiming to ease the pain of data integration with a new platform that lets businesses rapidly prototype and validate integration projects before sending them to development.

MapR's New Hadoop Distribution Promises No-Risk Upgrade
MapR's latest Hadoop distribution includes support for Hadoop 2.2 with YARN, but is also backward compatible with the MapReduce 1.x scheduler, promising organizations a risk-free upgrade path to the latest Hadoop architecture.

How Many Data Scientists Does the World Really Need?
The buzz is all about 'Big Data' and how best to use it to generate actionable intelligence. To do this, companies will need to hire loads of highly trained, highly paid data scientists -- or will they?

Enterprises Confident About Tackling Big Data Initiatives
CompTIA's Big Data Insights and Opportunities study finds that a majority of organizations feel more positive about big data as a business initiative. They also see significant costs associated with falling behind in managing and using data.

IT careers: Does it pay to become a brand specialist?
Hitching your wagon to the latest 'it' technology can lead to lucrative pay and compelling job opportunities, but it's not without risk. dBase developer, anyone?

Splunk woos Hadoop users
Corporate IT operations' need for easier interaction with massive, fast-growing data sets in Hadoop environments is driving a flurry of vendor activity.

Teradata expands Hadoop support
Teradata's enterprise customers have a fresh set of options for integrating Hadoop into their environments.

Lessons Learned at Invite-Only Performance Testing Conference
Starting with data and using it to get to the heart of the matter isn't the way most IT conference sessions go. Then again, the invitation-only Workshop on Performance and Reliability isn't your typical IT conference. The real value comes when it's 'open season' on the presenter and the real 'sense making' can begin.

How big data will save your life
Big data analytics is creating a world where doctors will eventually be able to run a Google-like query on a patient's illness and instantly discover how 100,000 other doctors treated their patients. It's also driving new treatments through genomic profiling.

5 strategic tips for avoiding a big data bust
Failed expectations, increased costs, unnecessary legal risks: going blind into a big data project doesn't pay.

Forget big data, the value is in 'big answers'
The CTO of Barack Obama's reelection campaign says we need to stop worrying about big data and turn our attention instead to the 'big answers' the data provides. EMC, which made big data news this week, is one tech company that plans to translate big answers into big customer wins.

Top 12 Big Data Stories of 2012
The holidays are here and 2012 is on its way out, ending a huge year for Big Data. It's time to reflect on the most popular Big Data stories and tips of the year.

Cloudera CEO: We're taking Hadoop beyond MapReduce
In an exclusive interview, the voluble CEO of Cloudera, Mike Olson, holds forth on the company's new Impala project and the boundless potential of Hadoop.

IT shops will become consultants instead of tech managers, says EMC's CIO
An EMC executive said customers should have a sense of urgency about deploying IT as a service via a cloud infrastructure, but not everyone at the data storage company's user forum agreed with him.

SAS extends analytics support for unstructured data
SAS Institute this week unveiled tools it says make it easier for its enterprise customers to use the company's business analytics software to analyze data stored in Hadoop environments.

Beware of BI vendor hype about Hadoop
Many BI vendors claim that their products support Hadoop, but Forrester says customers should find out what that support really entails.

Forrester: Push BI vendors for details on Hadoop integration
Enterprises should ask a lot of questions when a vendor touts its business intelligence products as being fully integrated with Hadoop, Forrester analyst Boris Evelson warned.

Big data, big jobs?
As companies embrace big data, they're in the market for high-level strategists and communicators. Do you have the chops to snag a big data job?

Hadoop becomes critical cog in the big data machine
As more and more companies use Hadoop to handle big data, anticipation for the forthcoming Version 2.0 grows.

Ease Big Data Hiring Pain With Cascading
Finding developers with the skills to create MapReduce jobs in Apache Hadoop is challenging, but you can ease that hiring pain with Cascading, an open source Java application framework for building enterprise Big Data applications on Hadoop.

Yahoo's Genome highlights hosted big data analytics trend
Yahoo joined a growing list of companies offering big data analytics as a service with its Genome offering this week.

Enterprise BI models undergo radical transformation
There's a dramatic transformation going on in business intelligence practices at many companies, prompted by growing interest in analyzing large, diverse data sets and by the arrival of better tools for such tasks.

Investors are pouring funds into big data
Surging enterprise demand for big data tools that can manipulate and analyze massive volumes of structured and unstructured data has caught investor attention in a big way.

Best Practices for Selecting Storage Services for Big Data
Big data is fueling the need for ever-growing storage repositories. If you need to address scalability without breaking the bank, selecting a storage platform that can meet the needs of big data can be a challenge, but it doesn't have to be an overwhelming one.

How to Use Hadoop to Overcome Storage Limitations
Big data is all about storing and accessing large amounts of structured and unstructured data. However, where to put that data and how to access it have become the biggest challenges for enterprises looking to leverage the information. If you haven't yet considered the open source Hadoop platform, now's the time.

Shell Oil targets hybrid cloud as fix for energy-saving, agile IT
To address soaring data storage and major power consumption issues, Shell Oil has turned to the public cloud to become more agile in deploying application and development services.

As 60th anniversary nears, tape reinvents itself
Digital tape media will turn 60 in May, and while tape sales are on the decline, new open file specifications like LTFS and new markets could revive tape for the long term.

How to get a hot job in big data
The big data revolution is creating a new breed of business-IT jobs and threatening to destabilize dyed-in-the-wool IT careers.

Can big data nab network invaders?
The buzz in security circles about "big data" goes something like this: If the enterprise could only unite its security-related event data with a warehouse of business information, it could analyze this Big Data to catch intruders trying to steal sensitive information.

Get Hadoop certified ... fast
IT professionals are scrambling to get trained and certified in what's expected to be the hottest new high-tech skill for 2012: Hadoop.

Supersize me: Hadoop upgrade will handle even bigger data
Apache aims for the upcoming 0.23 release, due this year, to be able to run on 6,000-node clusters.

Spring Java developers get Hadoop integration
VMware's Spring Hadoop offers a link between the Spring development framework and the Hadoop distributed processing platform.

Look Before You Leap Into Hadoop
Analysts and early users warn that companies should move slowly if they want to take advantage of the open-source Hadoop technology, noting that it requires extensive training along with analytics expertise not seen in many IT shops.

Zettaset to offer role-based access control for Hadoop
Zettaset, which makes tools for managing big data, has unveiled its SHadoop security initiative to help companies better control access to data in Hadoop.

Teradata partners with Hortonworks on Hadoop
Growing enterprise interest in Big Data analytics is beginning to drive partnerships between vendors of traditional relational database management technologies and purveyors of Apache Hadoop.

What's the big deal about Hadoop?
Hadoop is all the rage, but it requires expertise that's beyond the ken of many IT shops, customers say.

Hadoop wins over enterprise IT, spurs talent crunch
Hadoop is coming out of the shadows and into production in enterprise IT shops. But the relative newness of the open-source platform and a shortage of experienced Hadoop talent pose hurdles.

Your Big Data To-Do List
Ready or not, big data is coming. Here are 5 things IT managers can do today to prepare for the data deluge of tomorrow.

2012: The year storage becomes a celebrity
This promises to be a break-out year for storage technology with the use of more NAND flash in devices and smarter storage that can be tailored to applications.

CIA-backed Cleversafe announces 10-exabyte storage system
Object-based storage vendor Cleversafe today unveiled a storage system that can hold 10 billion gigabytes (10 exabytes) of data under a single domain name.

Oracle Move Could Push Rivals Toward Big Data Bundles
The shipping of Oracle's Big Data Appliance earlier this month could pressure major rivals like IBM, Hewlett-Packard and SAP to come up with Hadoop offerings that tightly bundle hardware and software products, analysts say.

RainStor launches Hadoop version of enterprise database
Online database repository provider RainStor unveiled what it is calling the industry's first enterprise-class database that runs natively on Hadoop.

Enterprise Hadoop: Big data processing made easier
Amazon, Cloudera, Hortonworks, IBM, and MapR combine simpler setup of Hadoop clusters with proprietary twists and trade-offs.

Oracle's Big Data Appliance brings focus to bundled approach
Oracle's Big Data Appliance product, which shipped Tuesday, gives enterprises another option for deploying projects based on Apache Hadoop open source technology.

CommVault to combine backup, archive functions
CommVault plans to announce an upgrade to its flagship Simpana software in the next several weeks that will allow backed-up data to be archived while still leaving end users with an easy way to retrieve that data.

Hadoop challenger works to add developers
LexisNexis has worked for more than a decade to develop a large-scale system for Big Data manipulation, and it believes that it has produced something that's better and more mature than the better-known Hadoop technology.

The Grill: Doug Cutting
Hadoop creator Doug Cutting says he expects the surge in interest in the big-data storage and analytics framework to continue.

Hadoop Is Ready for the Enterprise, IT Execs Say
Despite some lingering user concerns about security and technological issues, Hadoop is ready for enterprise use, according to IT executives at the Hadoop World conference in New York earlier this month.

Hadoop skills are in high demand
Growing enterprise interest in Hadoop and related technologies is driving demand for professionals with big data skills.

DataDirect Networks releases array with massive 40GB/sec performance
DataDirect Networks' new SFA12K series storage array represents a new high-water mark for networked storage performance, with the ability to scale to 6.7 petabytes in two racks and deliver up to 40GB/sec performance.

IT must prepare for Hadoop security issues
Corporate IT executives need to pay attention to numerous potential security issues before using Hadoop to aggregate data from multiple, disparate sources, analysts and IT executives said at the Hadoop World conference here this week.

Hadoop ready for corporate IT, execs say
Despite some lingering technology issues, Hadoop is ready for enterprise use, IT executives said Tuesday at the Hadoop World conference here.

Q&A: Hadoop creator expects surge in interest to continue
Doug Cutting, the creator of the open-source Hadoop framework that allows enterprises to store and analyze petabytes of unstructured data, is bullish on the future.

'Big data' prep: 5 things IT should do now
Ready or not, big data is coming. Here are 5 things IT managers can do today to prepare for the data deluge of tomorrow.

Oracle boosts enterprise search with Endeca purchase
Oracle said it will acquire Endeca Technologies, a Cambridge, Mass.-based vendor of software for unstructured data analytics and business intelligence, for an undisclosed sum.

Microsoft climbs onto Hadoop bandwagon
Microsoft Wednesday announced it will collaborate with Yahoo spin-off Hortonworks to develop an Apache Hadoop implementation for its Windows Server and Windows Azure platforms.

Don't get carried away by Hadoop's 'gee whiz' factor
Companies should take a pragmatic approach to implementing Hadoop for their "big data" requirements, a new report released Tuesday by analyst firm Forrester Research urges.

Oracle does about-face on NoSQL
Oracle's introduction of its Big Data Appliance at the OpenWorld conference this week is an indication of the attention it is being forced to pay to NoSQL database technology.

EMC adds unstructured big-data analytics to Greenplum platform
EMC announced new software capability in its Hadoop Data Computing Appliance that allows users to mix and match unstructured and structured data analytics platforms.

Hadoop Works Alongside RDBMS
Hadoop, the open-source software used for crunching petabytes of data, isn't replacing conventional database management systems but is instead being used to tackle different problems.

Facebook moves 30-petabyte Hadoop cluster to new data center
To accommodate the surging data volumes, Facebook has moved its Hadoop cluster to a new and bigger data center.

Hadoop growing, not replacing RDBMS in enterprises
The growing need for companies to manage surging volumes of structured and unstructured data is continuing to propel enterprise use of open-source Apache Hadoop software.

'Hadoop alternative' to be open sourced
LexisNexis is planning to release its internally developed supercomputing platform as open source, providing developers with an alternative to the Hadoop framework for large-scale data processing, the company said Wednesday.

Oracle Now Avoiding Big Acquisitions
Oracle this year has dramatically slowed its growth-by-acquisition strategy to concentrate instead on integrating Sun into the company, finishing work on the long-awaited Fusion Applications and filling gaps in its product portfolio.

EMC joins forces with Hadoop distributor MapR Technologies
EMC today formally announced a reseller partnership with MapR, which makes a proprietary MapReduce file system based on Apache Hadoop.

As 'big data' grows, IT job roles, technology must change
As companies look to keep every bit of data generated in-house and by customers for analytics as well as legal and regulatory compliance, the roles of those who manage it are changing, as are the tools they use.

EMC's Tucci sees hybrid cloud becoming de facto standard
EMC has planted its development and acquisition future in the cloud, calling for increased development of open-source Web-based applications and MapReduce technologies to help mine unstructured data.

EMC unveils Hadoop appliance, BI software
Among a flurry of announcements today at its annual user conference, EMC announced it will distribute a free version of Apache Hadoop and a licensed version for enterprises, as well as a preconfigured appliance for big data analytics tasks.

Yahoo working on Hadoop MapReduce 2
Yahoo is close to releasing the next generation of the Hadoop big data engine, which will offer higher-level management functionality.

Big data to drive a surveillance society
Vendors and users of big data analytics gathered in New York this week to discuss the latest developments in a technology that they say will offer Web users and their customers a far more personalized experience while alleviating the need to throw away useful data.

Hadoop Goes Mainstream for Big BI Tasks
Companies seeking to glean insights from terabytes or even petabytes of data are turning to open-source Hadoop software to do the job.

Big Data mining: Who owns your social network data?
An attractive application of Hadoop and other Big Data technologies is analyzing users' social activities, sometimes without their express knowledge.

Massive data volumes making Hadoop hot
Rapidly growing stores of structured and unstructured data are prompting IT executives to turn to open source Hadoop technology for storage and analysis efforts.

Pervasive pairs parallel development API with Hadoop MapReduce
DataRush 5.0, which helps developers without parallel development experience create multithreaded apps, also supports new JVM languages.

IBM develops new clustered analytics processing platform
IBM said it has created a new distributed computing architecture that is twice as fast as existing clustered file systems and that provides management and advanced data-replication techniques.

N.C. State turns to smart data analytics to find research partners
N.C. State University has signed up IBM to help its technology transfer office speed up the process of matching university research projects with potential investors and industry partners.

Gosling: Oracle gets server-side Java, but confused about desktops, cell phones
Java's creator offers a mixed outlook on Oracle's handling of the technology.

Startup pushes Hadoop via spreadsheet
A startup called Datameer is offering a simpler way for business analysts to use Hadoop, the open-source framework for large-scale data processing on clusters of commodity hardware.

Twitter growth prompts switch from MySQL to 'NoSQL' database
Twitter Inc. is slowly moving off the MySQL database to so-called 'NoSQL' open-source database technology that's already been embraced by Web 2.0 counterparts Facebook Inc. and Digg.

Gartner lists 3 challenges for rebounding Teradata
Teradata still faces multiple challenges, even though it reported improved financials in 2009's fourth quarter and Gartner has ranked it at the top of the data warehousing segment.

How Hadoop startup Cloudera is evolving
A data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform.

Big three database vendors diverge on Hadoop
The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.

Sybase is latest RDB maker to embrace MapReduce
Sybase CTO Irfan Khan said that adding MapReduce functionality to the Sybase IQ analytic database should significantly boost its performance.

Online Matchmaker Won't Settle Down With Just One BI Tool
eHarmony uses a variety of data-crunching applications to keep members of its online matchmaking service happy.

The tech behind 236 eHarmony members getting hitched daily
While eHarmony Inc.'s goal is to get its 20 million members married or into long-term relationships, the online matchmaker is a downright commitment-phobe in its use of technology.

Hive: Large-scale, distributed data processing
Suppose you want to run regular statistical analyses on your Web site's traffic log data: several hundred terabytes, updated weekly. (Don't laugh. This is not unheard of for popular Web sites.) You're already familiar with Hadoop (see InfoWorld's review), the open source distributed processing system that would be ideal for this task. But what if you don't have time to code Hadoop map/reduce functions? Perhaps you're not the elite programmer that everyone in the office thinks you are.
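To show what hand-coding map/reduce functions involves for a job like this, here is a minimal sketch in plain Python. It simulates the map, shuffle and reduce phases in a single process to count hits per URL. It is not Hadoop's API. The function names are hypothetical, and it assumes log lines in Apache Common Log Format, where the requested URL is the seventh whitespace-separated field. In Hive, the same job would be roughly one query, something like "SELECT url, COUNT(*) FROM access_log GROUP BY url". The table name access_log and its url column are also assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical job: count requests per URL in Common Log Format lines, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
URL_FIELD = 6  # assumption: the URL is the 7th whitespace-separated field


def map_hits(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (url, 1) pair for each line long enough to hold a URL."""
    fields = line.split()
    if len(fields) > URL_FIELD:
        yield fields[URL_FIELD], 1


def reduce_hits(url: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one URL."""
    return url, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, a local stand-in for Hadoop's shuffle (group by key), then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for url, count in map_hits(line):
            grouped[url].append(count)
    return dict(reduce_hits(url, counts) for url, counts in grouped.items())


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512',
        '127.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 304 0',
    ]
    print(run_job(sample))  # {'/index.html': 2, '/about.html': 1}
```

On a real cluster, Hadoop would run many mappers and reducers in parallel and handle the grouping step itself. Hive's appeal is that it compiles a query like the one above into such jobs, so the analyst never writes these functions.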

Yale researchers create database-Hadoop hybrid
Yale University researchers on Monday released an open-source parallel database that they say combines the data-crunching prowess of a relational database with the scalability of next-generation technologies such as Hadoop and MapReduce.

Amazon automates Hadoop use for developers
Amazon.com has launched a hosted service designed to make it simpler for developers to use Hadoop, an implementation of the MapReduce programming model, for processing large data sets on processor clusters.

Microsoft Reverses Course, Becomes More Open to Open-Source Community
Microsoft has softened its "us vs. them" stance on open source to the point that it's now contributing code to open-source projects -- although the vendor still thinks its software is best.

Yahoo offers free supercomputing to Indian Hadoop developers
Yahoo aims to get more developers to research and develop scalable applications around Hadoop, and will likely offer the same deal in other countries.

Our bloggers on Hadoop

In era of sequestration, data storage optimization key for government agencies

Today, many government agencies, civilian and defense alike, find themselves in a technology quandary: the volume of data that must be stored is growing rapidly, while shrinking budgets are limiting the capital expenditures (e.g., servers and storage devices) required to store all of this data.

Time for the financial industry to contribute more to open source projects

In the financial industry, software is largely considered a trade secret. Speed is everything in the trading environment, so how an application performs can make or break competitive advantage. But there is a delicate balance between giving back to the open source community and maintaining competitive advantage and trade secrets. I encourage the financial industry to continue to find that balance, investing in and supporting open source projects that can help the industry overall.

Is big data a big drain on your network?

There's a lot of talk right now in the industry about big data and the business intelligence applications that are being used to wrangle it. However, very few people are talking about the impact that big data can have on the network.

The government and big data: Use, problems and potential

When it comes to managing data, government agencies have always had the same issue. From national intelligence to the IRS, the U.S. Census to local municipalities, there are massive amounts of data in agency computer systems. Much of that information is unstructured, meaning it does not fit into a pre-defined data model.

Hadoop hype and data Yodas: Tales from Predictive Analytics World

Here are a few takeaways, interesting comments and other tidbits from the Predictive Analytics World conference.

IBM has Hadoop cloud for big, unstructured data

IBM (NYSE:IBM) has launched its unstructured-data cloud service, based on Hadoop. Called BigInsights, it's essentially MapReduce for Dummies, which is no bad thing. In IT Blogwatch, bloggers welcome their new pachydermic overlords. Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Fight For The Future...

Big data SMAQ-down

The term "big data" is getting thrown around a lot these days, and in certain circles it is threatening to overtake "cloud" as the most overused and misused term in IT.

Interestingly, some of the large, traditional storage vendors are embracing the term big data, using it as an umbrella term for all large collections of data and hence an umbrella term for all of their offerings. A more nuanced understanding of big data actually shows it to be the antithesis of both the technology and the business models of the traditional storage vendors.

IBM's big, fluffy, Blue Cloud (and UF iBrick)

It's IT Blogwatch: in which IBM announces its Blue Cloud effort. Not to mention User Friendly's take on bricked iPhones...

Todd R. Weiss and James Niccolai tag-team:

In a move to create more robust, scalable computing systems that can power the expanding needs of new Web 2.0 and mobile applications, IBM today said it will unveil its first enterprise-ready cloud computing hardware in the first quarter of next year ... blade servers running x86 and IBM Power processors, followed later by System z mainframes and a cloud environment based on highly dense rack clusters ... to link together large pools of systems that specifically are aimed at handling the design and performance needs of emerging Web 2.0 and mobile applications.