Amazon Web Services LLC (AWS), a subsidiary of Amazon.com, Inc., today announced the public beta of Amazon Elastic MapReduce, a web service that enables businesses, researchers, data analysts and developers to easily and cost-effectively process vast amounts of data.
The service uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you need to perform data-intensive tasks for distributed applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. As with all AWS services, Amazon Elastic MapReduce customers pay only for what they use, with no up-front payments or commitments. To sign up for Amazon Elastic MapReduce and other AWS services, go to http://aws.amazon.com.
Prior to Amazon Elastic MapReduce, running Hadoop or other MapReduce-based clusters required time-consuming setup, management, and cluster tuning. Amazon Elastic MapReduce builds on the on-demand, resizable compute capacity of Amazon EC2 to make running parallel compute jobs more affordable and less time-consuming. Using this service, customers can spin up and tear down Hadoop clusters on Amazon EC2 at a moment’s notice. To help customers get started running these highly distributed applications, AWS is providing a number of sample applications and tutorials for Amazon Elastic MapReduce.
“Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis,” said Adam Selipsky, Vice President of Product Management and Developer Relations for Amazon Web Services. “Amazon Elastic MapReduce makes crunching in the cloud much easier as it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks.”
Amazon Elastic MapReduce creates data processing job flows that are executed by Hadoop software on the web-scale infrastructure of Amazon EC2. The service automatically launches and configures the number and type of Amazon EC2 instances specified by customers. It then kicks off a Hadoop implementation of the MapReduce programming model, which loads large amounts of user input data from Amazon S3 and subdivides it for parallel processing across Amazon EC2 instances. As processing completes, the data is recombined and reduced into a final result, which is written back to Amazon S3. Users can configure, manipulate, and monitor job flows through web service APIs or via the AWS Management Console.
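The split, map, and reduce flow described above can be illustrated with a small word count, an analysis similar in spirit to the log file analysis mentioned earlier. This is a hypothetical, single-machine sketch in plain Python, not Amazon Elastic MapReduce or Hadoop code: the function names (split_input, map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and the input splits that Hadoop would hand to separate Amazon EC2 instances are simply processed one after another here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(lines: List[str], num_splits: int) -> List[List[str]]:
    """Divide the input round-robin into roughly equal chunks, one per (hypothetical) worker."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of input."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, the step that sits between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine each key's values into a single final count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], num_splits: int = 3) -> Dict[str, int]:
    # Assumption for illustration: in a real cluster each split would be mapped
    # on a different machine; here the splits are processed sequentially.
    intermediate: List[Tuple[str, int]] = []
    for chunk in split_input(lines, num_splits):
        intermediate.extend(map_phase(chunk))
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    logs = ["GET /index GET", "POST /login", "GET /index"]
    print(word_count(logs))
```

Hadoop performs the same grouping step (commonly called the shuffle) between its map and reduce tasks, but it moves the intermediate pairs between machines over the network, and in a job flow the input is read from and the output written to Amazon S3 rather than held in memory.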
“Netflix is continually pursuing new technologies that extend our ability to deliver the best movie rental experience to our more than 10 million subscribers. Amazon Elastic MapReduce provides a powerful capability on top of the already robust Amazon Web Services technology platform. We’re enthused about the potential for this new technology to provide an even better experience to our members,” said Netflix Chief Product Officer Neil Hunt.
“MapReduce is a key component of our matching infrastructure,” said eHarmony Vice President of Technology Joseph Essas. “Amazon Elastic MapReduce cuts down on configuration and management time, making the entire process much more efficient.”