
Data Analysis Overload?

Posted 27-10-2009

Data... Loaded.

It's easier than ever for companies to collect all kinds of data about their customers. Often, the hard part is figuring out how to analyze it all. What starts as a useful database of customer information can become a slow or unresponsive monster when it grows to more than a terabyte (about 1,000 gigabytes) of data. In some cases, a database can be so enormous that no single computer is capable of processing the information.

That was an issue for ImageShack, a photo and video hosting site in Los Gatos, California. ImageShack serves up images and other content billions of times each day. It records information about its visitors, such as their locations and which websites led them to ImageShack, and then uses that data to deliver other relevant images, which keeps users on the site longer, clicking links and generating more ad revenue. Making those calculations isn't simple. "Imagine writing a database that gets refreshed and processes three billion records every day," says Jack Levin, ImageShack's founder and CEO.

To analyze all this data, ImageShack decided to tap the same technology developed by search-engine companies to index the Web: Hadoop, a program designed to process massive amounts of data. Inspired by Google's MapReduce technology, Hadoop is an open-source project developed primarily by engineers at Yahoo. Hadoop takes massive amounts of data, breaks it into smaller chunks, and distributes the pieces across a cluster of computers. In other words, instead of using one computer to analyze data, Hadoop lets you spread the task over several machines, with each one analyzing a portion of the information. Generally, the more computers in the Hadoop cluster, the faster it works.
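The split-and-combine idea described above can be sketched in a few lines of Python. This is only an illustration of the MapReduce pattern on a single machine, not Hadoop's actual API: worker threads stand in for the computers in a cluster, and the visit log, its field names, and every function name are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break the full record list into smaller, fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count referrers within one chunk only."""
    return Counter(record["referrer"] for record in chunk)


def reduce_counts(partial_counts):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def count_referrers(records, chunk_size=2, workers=3):
    chunks = split_into_chunks(records, chunk_size)
    # Each worker thread stands in for one machine in a cluster (assumption:
    # a real cluster would ship each chunk to a separate computer).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce_counts(partial_counts)


# Hypothetical visit log; the sites and field names are made up.
visits = [
    {"referrer": "forum.example", "location": "US"},
    {"referrer": "blog.example", "location": "IN"},
    {"referrer": "forum.example", "location": "DE"},
    {"referrer": "forum.example", "location": "US"},
    {"referrer": "blog.example", "location": "BR"},
]

totals = count_referrers(visits)
print(totals.most_common(1))  # [('forum.example', 3)]
```

Because each chunk is counted independently, adding more workers lets more chunks be processed at the same time, which is the intuition behind a larger cluster finishing sooner.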

The professional networking site LinkedIn found it could use Hadoop to speed up an important feature by a factor of 10. LinkedIn now employs Hadoop for its "people you may know" feature, which uses complex formulas to suggest possible acquaintances who aren't yet in users' networks. Hadoop's efficiency allowed LinkedIn to use a more sophisticated algorithm that required more computing power but improved results, says engineer Jay Kreps. "We saw a dramatic increase in the number of LinkedIn connections," he says.

Hadoop's usefulness isn't limited to Internet companies. Cloudera, a software start-up, aims to bring Hadoop's processing power to a variety of industries. Cloudera's CEO, Mike Olson, says even old-line businesses can amass terabytes of useful data. For instance, some retailers ask for customers' phone numbers at the register as a way of identifying them and tracking their purchases. Using these customer logs, stores could, say, look for shoppers who bought diapers six years ago and target them with back-to-school promotions.
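As a rough illustration of the retail example, the query might look something like the Python sketch below. The purchase records, phone numbers, field names, and the function are all hypothetical, and a real store would run this kind of filter over far more data than fits in one list.

```python
from datetime import date


def shoppers_who_bought(purchases, item, year):
    """Return the sorted phone numbers of customers who bought `item` in `year`."""
    return sorted(
        {p["phone"] for p in purchases if p["item"] == item and p["date"].year == year}
    )


# Hypothetical purchase log keyed by the phone number given at the register.
purchases = [
    {"phone": "555-0101", "item": "diapers", "date": date(2003, 9, 14)},
    {"phone": "555-0102", "item": "coffee", "date": date(2003, 10, 2)},
    {"phone": "555-0103", "item": "diapers", "date": date(2008, 5, 30)},
    {"phone": "555-0101", "item": "diapers", "date": date(2003, 11, 1)},
]

today = date(2009, 10, 27)
targets = shoppers_who_bought(purchases, "diapers", today.year - 6)
print(targets)  # ['555-0101']
```

Collecting the phone numbers in a set means a customer with several matching purchases appears only once in the promotion list.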

Setting up Hadoop does require technical prowess. LinkedIn and ImageShack were able to turn to their own engineers, but other businesses may find they need to hire a programmer or consultant familiar with Hadoop. Some businesses may also need more hardware. Cloudera's rule of thumb is at least one server per terabyte of data, but the required computing power varies depending on the complexity of the analysis. ImageShack ended up buying 10 top-of-the-line servers for its Hadoop cluster. Companies can avoid buying equipment by...

Source:
http://www.inc.com/magazine/20091001/data-analysis-overload.html