Data... Loaded.
That was an issue for ImageShack, a photo and video hosting site in Los Gatos, California. ImageShack serves up images and other content billions of times each day. It records information about its visitors, such as their locations and which websites led them to ImageShack, and then uses that data to deliver other relevant images, which keeps users on the site longer, clicking links and generating more ad revenue. Making those calculations isn't simple. "Imagine writing a database that gets refreshed and processes three billion records every day," says Jack Levin, ImageShack's founder and CEO.
To analyze all this data, ImageShack decided to tap the same technology developed by search-engine companies to index the Web: Hadoop, a program designed to process massive amounts of data. Inspired by Google's MapReduce technology, Hadoop is an open-source project developed primarily by engineers at Yahoo. Hadoop takes massive amounts of data, breaks it into smaller chunks, and distributes the pieces across a cluster of computers. In other words, instead of using one computer to analyze data, Hadoop lets you spread the task over several machines, with each one analyzing a portion of the information. Generally, the more computers in the Hadoop cluster, the faster it works.
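The split-and-combine idea can be sketched in a few lines of Python. This is not Hadoop code and not ImageShack's system: it is a toy illustration that assumes a hypothetical list of visit records with a made-up "referrer" field, and it uses a thread pool on one machine to stand in for the computers in a cluster. Each worker counts referrers in its own chunk (the "map" step), and the partial counts are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Break the full record list into roughly equal pieces."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """'Map' step: a worker counts referrers in its own chunk only."""
    return Counter(record["referrer"] for record in chunk)


def reduce_counts(partial_counts):
    """'Reduce' step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def count_referrers(records, num_workers=4):
    """Split the records, count each chunk in parallel, then combine."""
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for cluster machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical visit records; the field name and sites are invented.
visits = [
    {"referrer": "forum.example"},
    {"referrer": "blog.example"},
    {"referrer": "forum.example"},
    {"referrer": "search.example"},
    {"referrer": "forum.example"},
]

totals = count_referrers(visits, num_workers=2)
print(totals.most_common(1))  # [('forum.example', 3)]
```

Because each chunk is counted independently, adding workers means each one handles a smaller share of the records, which is the intuition behind a larger cluster finishing sooner.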
The professional networking site LinkedIn found it could use Hadoop to speed up an important feature by a factor of 10. LinkedIn now employs Hadoop for its "people you may know" feature, which uses complex formulas to suggest possible acquaintances who aren't yet in users' networks. Hadoop's efficiency allowed LinkedIn to use a more sophisticated algorithm that required more computing power but improved results, says engineer Jay Kreps. "We saw a dramatic increase in the number of LinkedIn connections," he says.
Hadoop's usefulness isn't limited to Internet companies. Cloudera, a software start-up, aims to bring Hadoop's processing power to a variety of industries. Cloudera's CEO, Mike Olson, says even old-line businesses can amass terabytes of useful data. For instance, some retailers ask for customers' phone numbers at the register as a way of identifying them and tracking their purchases. Using these customer logs, stores could, say, look for shoppers who bought diapers six years ago and target them with back-to-school promotions.
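As a rough illustration of the retail example, the query might look like the Python sketch below. Everything in it is assumed for the sake of the example: the purchase-log layout, the field names, the phone numbers, and the choice to treat "six years ago" as a calendar year. A real store would run this kind of filter over far more data, which is where a cluster comes in.

```python
from datetime import date


def diaper_buyers(purchases, today, years_ago=6):
    """Return phone numbers of customers who bought diapers in the
    calendar year that falls `years_ago` years before `today`."""
    target_year = today.year - years_ago
    return sorted({
        p["phone"]
        for p in purchases
        if p["item"] == "diapers" and p["date"].year == target_year
    })


# Hypothetical register log keyed by customer phone number.
purchase_log = [
    {"phone": "555-0100", "item": "diapers", "date": date(2003, 5, 14)},
    {"phone": "555-0101", "item": "coffee", "date": date(2003, 6, 2)},
    {"phone": "555-0102", "item": "diapers", "date": date(2008, 1, 20)},
    {"phone": "555-0100", "item": "diapers", "date": date(2003, 9, 30)},
]

targets = diaper_buyers(purchase_log, today=date(2009, 10, 1))
print(targets)  # ['555-0100']
```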
Setting up Hadoop does require technical prowess. LinkedIn and ImageShack were able to turn to their own engineers, but other businesses may find they need to hire a programmer or consultant familiar with Hadoop. Some businesses may also need more hardware. Cloudera's rule of thumb is at least one server per terabyte of data, but the required computing power varies depending on the complexity of the analysis. ImageShack ended up buying 10 top-of-the-line servers for its Hadoop cluster. Companies can avoid buying equipment by.........
Source:
http://www.inc.com/magazine/20091001/data-analysis-overload.html