Killer Instinct?
That's not to say that Hadoop is without warts. For the past three years of formal development, the project has consistently broken backward compatibility, and many users have cited security as an ongoing concern. But Hadoop creator Doug Cutting, also an employee of Yahoo, says that both of these issues should see solutions in the next two releases.
Mike Fitzgerald, COO of Adknowledge, said that his company has been using Hadoop for almost a year now. His team runs Hadoop in Amazon's EC2 cloud, but it uses its own implementation rather than Amazon's official Hadoop services.
Adknowledge uses its Hadoop cluster to sift through customer data to determine which ads are best suited to which customers. Fitzgerald said that, on average, the cluster processes approximately 40 terabytes of data per batch job.
Fitzgerald said that developing applications to run on Hadoop requires understanding some new concepts. “You need to understand the concepts of map/reduce, and distributed computing,” he said.
“We use Java and have found it relatively easy to write Java code that leverages the Hadoop framework. The most important thing to ask is, 'What are the problems you're trying to solve with Hadoop?' The best times to use it are when you're doing things that require very large-scale computation with a lot of data.
“We have in the past used big iron databases like Netezza, and we have a lot of Oracle. When you reach that scale, you really challenge what those things can handle. You're better off in an environment where you're adding commodity hardware to a cluster.”
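The map/reduce concept Fitzgerald mentions can be sketched in a few lines. What follows is a minimal, single-process illustration in plain Python, not Hadoop's API and not Adknowledge's code; the function names (map_phase, shuffle, reduce_phase) and the sample ad-click records are hypothetical, chosen only to show the shape of the idea.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit one (key, 1) pair per record; the key here is the ad id."""
    for _customer_id, ad_id in records:
        yield ad_id, 1


def shuffle(pairs):
    """Group values by key, standing in for the grouping done between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample data: (customer, ad clicked)
clicks = [("c1", "ad_a"), ("c2", "ad_b"), ("c3", "ad_a")]
click_counts = reduce_phase(shuffle(map_phase(clicks)))
print(click_counts)
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the grouping between them, which is what makes the model a fit for commodity hardware.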
Hadoop's history
Hadoop began life when Cutting started to build Nutch, an open-source search engine application. He had previously created the Apache Lucene project, which produced an open-source information-retrieval library written in Java. Building on that project, he began working on Nutch around 2004 with Mike Cafarella. Cutting said that a great deal of the work involved in Nutch was creating the underlying cluster infrastructure for physically scaling the platform.
“The only people who could scale to the size of the Web were Google, Microsoft and Yahoo,” said Cutting. “Google and Microsoft have similar technology, presumably, that they use internally, but those are special. There's also database technologies, which purport to scale. I don't think they scale as far, or as easily. But they also have different performance comparisons, so it's apples to oranges.
“Hadoop is designed for much more generic data processing. It doesn't require an extensive indexing or data-loading step. It's presenting all of your data ahead of time. All that classic database analysis isn't required.”
Cutting eventually found that the infrastructure beneath Nutch was becoming more powerful and elaborate, especially after he read Google's paper on map/reduce. In 2006, he joined Yahoo, and the infrastructure project was officially named after a stuffed elephant: Hadoop. Today, Yahoo houses the world's largest Hadoop cluster, coming in at 4,000 nodes. This cluster contributes to every Yahoo search performed.
With a full team working on Hadoop and its supporting tools and projects, Yahoo and Cutting have pushed the project to version 0.20.0. While there is no set date for the release of version 1.0, the Hadoop team is striving to release it before the end of the year.
Hadoop is made up of a number of subprojects. These include a distributed file system (HDFS), the HBase database, and the Pig language for building data queries. As an Apache Foundation project, however, Hadoop is surrounded by alternative tools. Amazon substitutes its own S3 storage services for HDFS, and Facebook has constructed its own data warehouse infrastructure (Hive) with a SQL-like substitute for Pig.
Ashish Thusoo, engineering manager at Facebook, said his team uses a 600-node Hadoop cluster. He said that Hadoop is useful for business intelligence and summarization applications.
“Our ad insight numbers are generated in Hadoop and Hive. It's a widely published system here, and we get 3,000 jobs a day with more than 100 users using it internally. It's useful for analytics on all sorts of structured data, as well as unstructured data,” he said.
Hot property
So compelling is the Hadoop story that Christophe Bisciglia, founder of Cloudera, said that he had to “fend off investors with a stick.” Cloudera packages Hadoop into numerous forms for use on the various Linux distributions and within Amazon's EC2. The company also offers numerous training......
Source:
http://www.sdtimes.com/IS_HADOOP_THE_CLOUD_S_KILLER_APP_/By_Alex_Handy/About_APA