Continuing coverage of Hadoop 
Lessons Learned at Invite-Only Performance Testing Conference
Starting with data and using it to get to the heart of the matter isn't the way IT conference sessions go. Then again, the invitation-only Workshop on Performance and Reliability isn't your typical IT conference. The real value is what happens when it's 'open season' on the presenter and the real 'sense making' can begin.
How big data will save your life
Big data analytics is creating a world where doctors will eventually be able to do a Google-like query on a patient's illness and instantly discover how 100,000 other doctors treated their patients. It's also driving new treatments through genomic profiling.
5 strategic tips for avoiding a big data bust
Failed expectations, increased costs, unnecessary legal risks -- going blind into a big data project doesn't pay
Forget big data, the value is in 'big answers'
The CTO of Barack Obama's reelection campaign says we need to stop worrying about big data and turn our attention instead to the 'big answers' the data provides. EMC, which made big data news this week, is one tech company that plans to translate big answers into big customer wins.
Top 12 Big Data Stories of 2012
The holidays are here and 2012 is on its way out, ending a huge year for Big Data. It's time to reflect on the most popular Big Data stories and tips of the year.
Cloudera CEO: We're taking Hadoop beyond MapReduce
In an exclusive interview, the voluble CEO of Cloudera, Mike Olson, holds forth on the company's new Impala project and the boundless potential of Hadoop.
IT shops will become consultants instead of tech managers, says EMC's CIO
An EMC executive said customers should have a sense of urgency about deploying IT as a service via a cloud infrastructure, but not everyone at the data storage company's user forum agreed with him.
SAS extends analytics support for unstructured data
SAS Institute this week unveiled tools it says make it easier for its enterprise customers to use the company's business analytics software to analyze data stored in Hadoop environments.
Beware of BI vendor hype about Hadoop
Many BI vendors claim that their products support Hadoop, but Forrester says customers should find out what that support really entails.
Forrester: Push BI vendors for details on Hadoop integration
Enterprises should ask a lot of questions when a vendor touts its business intelligence products as being fully integrated with Hadoop, Forrester analyst Boris Evelson warned.
Big data, big jobs?
As companies embrace big data, they're in the market for high-level strategists and communicators. Do you have the chops to snag a big data job?
Hadoop becomes critical cog in the big data machine
As more and more companies use Hadoop to handle big data, anticipation for forthcoming Version 2.0 grows
Ease Big Data Hiring Pain With Cascading
Finding developers with the skills to create MapReduce jobs in Apache Hadoop is challenging, but you can ease that hiring pain with Cascading, an open source Java application framework for building enterprise Big Data applications on Hadoop.
Yahoo's Genome highlights hosted big data analytics trend
Yahoo joined a growing list of companies offering big data analytics as a service with its Genome offering this week.
Enterprise BI models undergo radical transformation
There's a dramatic transformation going on in business intelligence practices at many companies, prompted by growing interest in analyzing large, diverse data sets and better tools for completing such tasks.
Investors are pouring funds into big data
Surging enterprise demand for big data tools that can manipulate and analyze massive volumes of structured and unstructured data has caught investor attention in a big way.
Best Practices for Selecting Storage Services for Big Data
Big data is fueling the need for ever-growing storage repositories. If you're looking to meet scalability concerns without breaking the bank, selecting a storage platform that can meet the needs of big data can be a challenge -- but it doesn't have to be an overwhelming one.
How to Use Hadoop to Overcome Storage Limitations
Big data is all about storing and accessing large amounts of structured and unstructured data. However, where to put that data and how to access it have become the biggest challenges for enterprises looking to leverage the information. If you haven't yet considered the open source Hadoop platform, now's the time.
Shell Oil targets hybrid cloud as fix for energy-saving, agile IT
To address soaring data storage and major power consumption issues, Shell Oil has turned to the public cloud to become more agile in deploying application and development services.
As 60th anniversary nears, tape reinvents itself
Digital tape media will turn 60 in May, and while tape sales are on the decline, new open file specifications like LTFS and new markets could revive tape for the long term.
How to get a hot job in big data
The big data revolution is creating a new breed of business-IT jobs -- and threatening to destabilize dyed-in-the-wool IT careers
Can big data nab network invaders?
The buzz in security circles about "big data" goes something like this: If the enterprise could only unite its security-related event data with a warehouse of business information, it could analyze this Big Data to catch intruders trying to steal sensitive information.
Get Hadoop certified ... fast
IT professionals are scrambling to get trained and certified in what's expected to be the hottest new high-tech skill for 2012: Hadoop.
Supersize me: Hadoop upgrade will handle even bigger data
Apache aims to have the upcoming 0.23 release this year be able to run on 6,000-node clusters
Spring Java developers get Hadoop integration
VMware's Spring Hadoop offers link between Spring development framework and Hadoop distributed processing platform
Look Before You Leap Into Hadoop
Analysts and early users warn that companies should move slowly if they want to take advantage of the open-source Hadoop technology, noting that it requires extensive training along with analytics expertise not seen in many IT shops.
Zettaset to offer role-based access control for Hadoop
Zettaset, which makes tools for managing big data, has unveiled its SHadoop security initiative to help companies better control access to data in Hadoop.
Teradata partners with Hortonworks on Hadoop
Growing enterprise interest in Big Data analytics is beginning to drive partnerships between vendors of traditional relational database management technologies and purveyors of Apache Hadoop.
What's the big deal about Hadoop?
Hadoop is all the rage, but it requires expertise that's beyond the ken of many IT shops, customers say.
Hadoop wins over enterprise IT, spurs talent crunch
Hadoop is coming out of the shadows and into production in enterprise IT shops. But the relative newness of the open-source platform and a shortage of experienced Hadoop talent pose hurdles.
Your Big Data To-Do List
Ready or not, big data is coming. Here are 5 things IT managers can do today to prepare for the data deluge of tomorrow.
2012: The year storage becomes a celebrity
This promises to be a break-out year for storage technology with the use of more NAND flash in devices and smarter storage that can be tailored to applications.
CIA-backed Cleversafe announces 10-exabyte storage system
Object-based storage vendor Cleversafe today unveiled a storage system that can hold 1 billion gigabytes of data under a single domain name.
Oracle Move Could Push Rivals Toward Big Data Bundles
The shipping of Oracle's Big Data Appliance earlier this month could pressure major rivals like IBM, Hewlett-Packard and SAP to come up with Hadoop offerings that tightly bundle hardware and software products, analysts say.
RainStor launches Hadoop version of enterprise database
Online database repository provider RainStor unveiled what it is calling the industry's first enterprise-class database that runs natively on Hadoop.
Enterprise Hadoop: Big data processing made easier
Amazon, Cloudera, Hortonworks, IBM, and MapR mix simpler setup of Hadoop clusters with proprietary twists and trade-offs
Oracle's Big Data Appliance brings focus to bundled approach
Oracle's Big Data Appliance product, which shipped Tuesday, gives enterprises another option for deploying projects based on Apache Hadoop open source technology.
CommVault to combine backup, archive functions
CommVault plans to announce an upgrade to its flagship Simpana software in the next several weeks that will allow backed-up data to be archived while still leaving end users with an easy way to retrieve that data.
Hadoop challenger works to add developers
LexisNexis has worked for more than a decade to develop a large scale system for Big Data manipulation, and it believes that it has produced something that's better and more mature than the better known Hadoop technology.
The Grill: Doug Cutting
Hadoop creator Doug Cutting says he expects the surge in interest in the big-data storage and analytics framework to continue.
Hadoop Is Ready for the Enterprise, IT Execs Say
Despite some lingering user concerns about security and technological issues, Hadoop is ready for enterprise use, according to IT executives at the Hadoop World conference in New York earlier this month.
Hadoop skills are in high demand
Growing enterprise interest in Hadoop and related technologies is driving demand for professionals with big data skills.
DataDirect Network releases array with massive 40GB/sec performance
DataDirect Network's new SFA12K series storage array represents a new high-water mark for networked storage performance with the ability to scale to 6.7 petabytes in two racks and offer up to 40GB/sec performance.
IT must prepare for Hadoop security issues
Corporate IT executives need to pay attention to numerous potential security issues before using Hadoop to aggregate data from multiple, disparate sources, analysts and IT executives said at the Hadoop World conference here this week.
Hadoop ready for corporate IT, execs say
Despite some lingering technology issues, Hadoop is ready for enterprise use, IT executives said Tuesday at the Hadoop World conference here.
Q&A: Hadoop creator expects surge in interest to continue
Doug Cutting, the creator of the open-source Hadoop framework that allows enterprises to store and analyze petabytes of unstructured data, is bullish on the future.
'Big data' prep: 5 things IT should do now
Ready or not, big data is coming. Here are 5 things IT managers can do today to prepare for the data deluge of tomorrow.
Oracle boosts enterprise search with Endeca purchase
Oracle said it will acquire Endeca Technologies, a Cambridge, Mass.-based vendor of software for unstructured data analytics and business intelligence, for an undisclosed sum.
Microsoft climbs onto Hadoop bandwagon
Microsoft Wednesday announced it will collaborate with Yahoo spin-off Hortonworks to develop an Apache Hadoop implementation for its Windows Server and Windows Azure platforms.
Don't get carried away by Hadoop's 'gee whiz' factor
Companies should take a pragmatic approach to implementing Hadoop for their "big data" requirements, a new report released Tuesday by analyst firm Forrester Research urges.
Oracle does about-face on NoSQL
Oracle's introduction of its Big Data Appliance at the OpenWorld conference this week is an indication of the attention it is being forced to pay to NoSQL database technology.
EMC adds unstructured big-data analytics to Greenplum platform
EMC announced new software capability in its Hadoop Data Computing Appliance that allows users to mix and match unstructured and structured data analytics platforms.
Hadoop Works Alongside RDBMS
Hadoop, the open-source software used for crunching petabytes of data, isn't replacing conventional database management systems but is instead being used to tackle different problems.
Facebook moves 30-petabyte Hadoop cluster to new data center
To accommodate the surging data volumes, Facebook has moved its Hadoop cluster to a new and bigger data center.
Hadoop growing, not replacing RDBMS in enterprises
The growing need for companies to manage surging volumes of structured and unstructured data is continuing to propel enterprise use of open-source Apache Hadoop software.
'Hadoop alternative' to be open sourced
LexisNexis is planning to release its internally developed supercomputing platform as open source, providing developers with an alternative to the Hadoop framework for large-scale data processing, the company said Wednesday.
Oracle Now Avoiding Big Acquisitions
Oracle this year has dramatically slowed its growth-by-acquisition strategy to concentrate instead on integrating Sun into the company, finishing work on the long-awaited Fusion Applications and filling gaps in its product portfolio.
EMC joins forces with Hadoop distributor MapR Technologies
EMC today formally announced a reseller partnership with MapR, which makes a proprietary MapReduce file system based on Apache Hadoop.
As 'big data' grows, IT job roles, technology must change
As companies look to keep every bit of data generated in-house and by customers for analytics as well as legal and regulatory compliance, the roles of those who manage it are changing, as are the tools they use.
EMC's Tucci sees hybrid cloud becoming de facto standard
EMC has planted its development and acquisition future in the cloud, calling for increased development of open-source Web-based applications and MapReduce technologies to help mine unstructured data.
EMC unveils Hadoop appliance, BI software
Among a flurry of announcements today at its annual user conference, EMC announced it will be distributing a free version of Apache Hadoop and a licensed version for enterprises as well as a pre-configured appliance for big data analytics tasks.
Yahoo working on Hadoop MapReduce 2
Yahoo is close to releasing the next generation of big data engine Hadoop that will offer higher level management functionality.
Big data to drive a surveillance society
Vendors and users of big data analytics gathered in New York this week to discuss the latest developments in a technology that they say will offer Web users and their customers a far more personalized experience while alleviating the need to throw away useful data.
Hadoop Goes Mainstream for Big BI Tasks
Companies seeking to glean insights from terabytes or even petabytes of data are turning to open-source Hadoop software to do the job.
Big Data mining: Who owns your social network data?
An attractive application of Hadoop and other Big Data technologies is to analyze users' social activities, sometimes without their express knowledge
Massive data volumes making Hadoop hot
Rapidly growing stores of structured and unstructured data are prompting IT executives to turn to open source Hadoop technology for storage and analysis efforts.
Pervasive pairs parallel development API with Hadoop MapReduce
DataRush 5.0, which helps developers without parallel development experience create multithreaded apps, also backs new JVM languages
IBM develops new clustered analytics processing platform
IBM said it has created a new distributed computing architecture that is twice as fast as existing clustered file systems and that provides management and advanced data-replication techniques.
N.C. State turns to smart data analytics to find research partners
N.C. State University has signed up IBM to help its technology transfer office speed up the process of matching university research projects with potential investors and industry partners.
Gosling: Oracle gets server-side Java, but confused about desktops, cell phones
Java founder offers mixed outlook for Oracle's handling of the technology
Startup pushes Hadoop via spreadsheet
A startup called Datameer is offering a simpler way for business analysts to use Hadoop, the open-source framework for large-scale data processing on clusters of commodity hardware.
Twitter growth prompts switch from MySQL to 'NoSQL' database
Twitter Inc. is slowly moving off the MySQL database for so-called 'NoSQL' open-source database technology that's already been embraced by Web 2.0 counterparts, Facebook Inc. and Digg.
Gartner lists 3 challenges for rebounding Teradata
Teradata still faces multiple challenges, even though it reported improved financials in 2009's fourth quarter and Gartner has ranked it at the top of the data warehousing segment.
How Hadoop startup Cloudera is evolving
A data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform.
Big three database vendors diverge on Hadoop
The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.
Sybase is latest RDB maker to embrace MapReduce
Sybase CTO Irfan Khan said that adding MapReduce functionality to the Sybase IQ analytic database should significantly boost its performance.
Online Matchmaker Won't Settle Down With Just One BI Tool
EHarmony uses a variety of data-crunching applications to keep members of its online matchmaking service happy.
The tech behind 236 eHarmony members getting hitched daily
While eHarmony Inc.'s goal is to get its 20 million members married or into long-term relationships, the online matchmaker is a downright commitment-phobe in its use of technology.
Hive: Large-scale, distributed data processing
Suppose you want to run regular statistical analyses on your Web site's traffic log data -- several hundred terabytes, updated weekly. (Don't laugh. This is not unheard of for popular Web sites.) You're already familiar with Hadoop (see InfoWorld's review), the open source distributed processing system that would be ideal for this task. But you don't have time to code Hadoop map/reduce functions? Perhaps you're not the elite programmer that everyone in the office thinks you are.
Yale researchers create database-Hadoop hybrid
Yale University researchers on Monday released an open-source parallel database that they say combines the data-crunching prowess of a relational database with the scalability of next-generation technologies such as Hadoop and MapReduce.
Amazon automates Hadoop use for developers
Amazon.com has launched a hosted service designed to simplify for developers the use of the Hadoop implementation of the MapReduce programming model for processing large data sets in processor clusters.
Microsoft Reverses Course, Becomes More Open to Open-Source Community
Microsoft has softened its "us vs. them" stance on open source to the point that it's now contributing code to open-source projects -- although the vendor still thinks its software is best.
Yahoo offers free supercomputing to Indian Hadoop developers
Yahoo aims to get more developers to research and develop applications that can scale around Hadoop, and will likely offer the same deal in other countries.
Our bloggers on Hadoop
In era of sequestration, data storage optimization key for government agencies
Today, many government agencies – civilian and defense – find themselves in a technology quandary: the volume of data that must be stored is growing rapidly, while shrinking budgets are limiting the capital expenditures (e.g., servers and storage devices) required to store all of this data.
Time for the financial industry to contribute more to open source projects
In the financial industry, software is largely considered a trade secret. Speed is everything in the trading environment - so how an application performs can make or break competitive advantage. But there is a delicate balance of giving back to the open source community while also maintaining competitive advantage and trade secrets. I encourage the financial industry to continue to find that balance, investing and supporting open source projects that can help the industry overall.
Is big data a big drain on your network?
There's a lot of talk right now in the industry about big data and the business intelligence applications that are being used to wrangle it. However, very few people are talking about the impact that big data can have on the network.
The government and big data: Use, problems and potential
When it comes to managing data, government agencies have always had the same issue. From national intelligence to the IRS, the U.S. Census to local municipalities, there are massive amounts of data in agency computer systems. Much of that information is unstructured, meaning it does not fit into a pre-defined data model.
Hadoop hype and data Yodas: Tales from Predictive Analytics World
Here are a few takeaways, interesting comments and other tidbits from the Predictive Analytics World conference.
IBM has Hadoop cloud for big, unstructured data
IBM (NYSE:IBM) has launched its unstructured-data cloud service, based on Hadoop. Called BigInsights, it's essentially MapReduce for Dummies, which is no bad thing. In IT Blogwatch, bloggers welcome their new pachydermic overlords. Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Fight For The Future...
Big data SMAQ-down
The term "big data" is getting thrown around a lot these days, and in certain circles it is threatening to overtake "cloud" as the most overused and misused term in IT.
Interestingly, some of the large, traditional storage vendors are embracing the term big data, using it as an umbrella term for all large collections of data and hence an umbrella term for all of their offerings. A more nuanced understanding of big data actually shows it to be the antithesis of both the technology and the business models of the traditional storage vendors.
IBM's big, fluffy, Blue Cloud (and UF iBrick)
It's IT Blogwatch: in which IBM announces its Blue Cloud effort. Not to mention User Friendly's take on bricked iPhones...
Todd R. Weiss and James Niccolai tag-team:
In a move to create more robust, scalable computing systems that can power the expanding needs of new Web 2.0 and mobile applications, IBM today said it will unveil its first enterprise-ready cloud computing hardware in the first quarter of next year ... blade servers running x86 and IBM Power processors, followed later by System z mainframes and a cloud environment based on highly dense rack clusters ... to link together large pools of systems that specifically are aimed at handling the design and performance needs of emerging Web 2.0 and mobile applications.