Amazon launches workflow orchestration service
The AWS Data Pipeline can streamline big data analysis jobs
IDG News Service - Users of Amazon Web Services will soon be able to orchestrate workflows across different AWS services and their own internal resources, using a new orchestration engine called the AWS Data Pipeline.
Amazon Chief Technology Officer Werner Vogels introduced the technology at the company's re:Invent conference, held this week in Las Vegas. The service is now available in limited beta preview, though Vogels did not say when it would be commercially available or what it would cost.
The service can "automate the movement and processing of any amount of data using data-driven workflows and built-in dependency checking," according to a blog post AWS issued that further explained the technology.
Amazon designed the service to automate the process of parsing large sets of data. For example, one pipeline can move log data from an AWS EC2 (Elastic Compute Cloud) instance to AWS S3 (Simple Storage Service) once a day, and then, once a week, invoke an analysis job on the data on an AWS Elastic MapReduce cluster.
To set up a workflow pipeline, the user identifies the data sources and describes the steps that AWS should take to process the data. The user also identifies the destination for the processed data and sets a schedule for when the pipeline should be executed. Preconditions can also be established that the service will check before executing a job, such as whether a file needed for the operation exists.
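The setup described above can be illustrated with a small sketch. This is not the AWS Data Pipeline API, whose details were not public at the time of the announcement; it is a plain-Python model under stated assumptions. The names `Step`, `run_pipeline`, `storage` and the step names are all hypothetical, chosen to mirror the daily log copy and weekly analysis job from the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List


@dataclass
class Step:
    """One scheduled unit of work (hypothetical model, not an AWS type)."""
    name: str
    every_days: int                     # schedule: run every N days
    action: Callable[[], None]          # the work to perform
    preconditions: List[Callable[[], bool]] = field(default_factory=list)

    def is_due(self, start: date, today: date) -> bool:
        # A step is due on the start date and every `every_days` days after.
        elapsed = (today - start).days
        return elapsed >= 0 and elapsed % self.every_days == 0


def run_pipeline(steps: List[Step], start: date, today: date) -> List[str]:
    """Run each due step whose preconditions all hold; return names that ran."""
    ran = []
    for step in steps:
        if not step.is_due(start, today):
            continue
        if all(check() for check in step.preconditions):
            step.action()
            ran.append(step.name)
    return ran


# Stand-in for a storage destination such as S3 (assumption for illustration).
storage = set()

steps = [
    # Daily: copy log data to storage.
    Step("copy-logs", 1, lambda: storage.add("logs")),
    # Weekly: analyze, but only if the log data exists (the precondition).
    Step("analyze", 7, lambda: storage.add("report"),
         [lambda: "logs" in storage]),
]

start = date(2012, 12, 1)
```

Calling `run_pipeline(steps, start, date(2012, 12, 2))` runs only the daily copy, while a call for a date seven days after `start` runs both steps, with the analysis step gated on the log data being present. A real service would also handle retries, failures and distributed execution, none of which this sketch attempts.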
Pipelines can run across EC2, Elastic MapReduce clusters, and the user's own hardware. Pipelines can be set up in the AWS Management Console or by writing a script.
This is not the first workflow engine on AWS. The company launched the Amazon Simple Workflow Service in February. However, AWS Data Pipeline is more focused on executing data-driven jobs.
The AWS Data Pipeline is one of a number of announcements Amazon made at the conference. The company also unveiled a data warehouse service and an auto-discovery service to ease the management of its ElastiCache caching service. It also cut the prices of some of its storage services and created two new EC2 instance types, for high-memory usage and large data usage.