Telecom operators get the scoop on Hadoop
Cloud computing is all the rage, but it may be a while before some key elements reach the mainstream. That makes it a good time to delve into Hadoop, a grand-scale storage and processing platform just appearing on the telecom radar screen.
In many ways, cloud computing looks like not much more than glorified Web site and application hosting. Yet in other, probably more important ways, it really is something altogether new. It’s with this dichotomy in mind that we introduce Hadoop, which squarely fits in the altogether new category and holds the promise of turning tomorrow’s clouds into giant brains capable of analyzing huge sets of data to uncover heretofore-unknowable truths.
What kinds of truths?
- Google uses MapReduce, the proprietary technology on which Hadoop is modeled, to help it swallow the entire Web, not to mention massive map/satellite databases, to produce elegantly useful products such as Google Maps, Google Earth and, of course, the Google search engine itself.
- Yahoo! uses Hadoop to analyze and optimize how its 20 million visitors consume its home page content.
- The New York Times set a Hadoop-powered cloud against an archive of 11 million stories dating back to 1851 to make it instantly searchable.
- Facebook uses Hadoop to analyze interactions and social graph links on its site, a data set growing by more than 15 terabytes per day, powering the friend connections and personalization that drive the social networking site.
And while these Silicon Valley, start-up-centric stories would seem to position Hadoop well out on computing’s cutting edge, Hadoop is accessible (it’s open source at heart), cheap (it runs on cheap computer clusters) and useful (see above) enough that it appears poised to move rapidly into the mainstream. For instance, at October’s inaugural Hadoop World conference in New York City, corporate-minded presenters such as JP Morgan Chase, Visa and Booz Allen Hamilton (on behalf of the medical industry) sat alongside the tech elite, making the event feel very much like a coming-out party for Hadoop.
“Our view is that Google, Yahoo! and Facebook are not different than other companies, they just represent the future,” said Mike Olson, CEO of Cloudera, which is trying to commercialize Hadoop via a software and services model. “It used to be hard to get your hands on a terabyte of data, but not anymore. In the future, the companies that win will be the ones that understand data the best.”
And what does this all mean for the traditional telecom service provider, not to mention those companies’ hosting/cloud computing groups? A lot. Not only is Hadoop processing likely to be an important application for cloud providers, but telecom operators, sitting on reams of network and customer data, are also prime candidates to become Hadoop users. China Mobile, for instance, presented at that same conference on using Hadoop as a telecom data mining platform, showing how operators can tap the technology to better understand their networks, services and customers and to find new patterns that can help them compete in the digital future.
Hadoop traces its origins to a Google technology called MapReduce. Google developed it to help store and analyze all the information it was spidering for its search engine. Google kept this secret sauce private, but it did publish a technical paper describing how it worked. Yahoo!, thinking it could use something similar, threw its support behind Hadoop, an open-source project trying to emulate Google’s work. As Hadoop has matured, it has also broadened to become less focused on just Web search applications, positioning it for mainstream adoption.
MapReduce remains a Google jewel, but a growing community focused on Hadoop makes it the technology to watch. At least one vendor, the privately funded Cloudera, has emerged to make a Red Hat–style play at popularizing Hadoop via support services and regularly updated distributions. Hadoop creator Doug Cutting recently moved from Yahoo! to Cloudera, joining other Facebook, Google and Yahoo! exports there and further boosting the company’s profile. Today, Amazon and Google are the biggest public cloud implementers of Hadoop, though offering Hadoop-as-a-service is on the radar of many cloud providers. Meanwhile, Hadoop is also relatively easy to deploy in private clouds, and that’s where its biggest growth may be.
The evolution of Hadoop is an interesting one, but more to the point: what is it, and how does it work? Essentially, Hadoop is SQL writ large, a way to store and query extremely large data sets (we mean extremely, like the entire Internet). Today’s relational databases and business intelligence tools are powerful, but what if you have 100 or 1,000 times the data? Those tools begin to break down, Hadoop backers say.
The power of Hadoop is that it is engineered to spread out that processing across hundreds if not thousands of plain vanilla servers (and eventually, in Google’s vision, millions of machines) arranged in a cluster, rather than relying on super-expensive proprietary machines. At the start, doing analysis in Hadoop was difficult and best left to the experts, but Hadoop additions with whimsical names such as Hive and Pig have brought simpler, SQL-style capabilities to Hadoop — positioning it for further mainstream growth.
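To make the idea of spreading work across a cluster more concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop emulates, written in plain Python rather than against Hadoop itself. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample data are illustrative assumptions, not part of any Hadoop API, and each inner list merely stands in for a block of data that one server in a cluster might hold.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single total."""
    return key, sum(values)


def run_job(partitions):
    """Run map, shuffle and reduce over every partition in turn (no real parallelism)."""
    mapped = chain.from_iterable(
        map_phase(record) for partition in partitions for record in partition
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: each inner list stands in for data held by one server.
partitions = [
    ["hadoop stores data", "hadoop processes data"],
    ["data lives on cheap servers"],
]
counts = run_job(partitions)
print(counts["data"])  # 3
```

In a real cluster the map step would run on many machines at once, each working on its own local slice of the data, which is where the scale comes from. This sketch runs everything in one process purely to show the shape of the computation.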
While Hadoop grew out of a Web start-up and research-oriented focus, its core proposition is fit for any enterprise: the ability to cheaply analyze massive amounts of data. The alternatives are expensive data warehouse-style technologies only available to the largest enterprises. But as with the adoption of other Web-based and open source–driven technologies before it, Hadoop’s biggest use may be by those very enterprises. Who doesn’t like more open, less expensive IT alternatives?
But what impact will Hadoop have on traditional telecom providers? For starters, their hosting/cloud divisions could one day offer Hadoop-as-a-service as part of their cloud apps offerings in competition with Amazon, Google and others. Not only would Hadoop processing — especially apps tuned to specific vertical markets — be a potential revenue producer itself, but Hadoop requires reams of processing power and data storage to work, essentially driving demand for large-scale cloud infrastructure services.
But for now, telecom operators may be bigger users than deployers of Hadoop services. “When I go around to ISPs, here on the West Coast especially, virtually all of them are using Hadoop to monitor the behavior of their network and software infrastructure, and they are beginning to use it to analyze their customers as well,” said Cloudera’s Olson. “We believe the telecom industry has lots of problems precisely like that.”
Hosting provider Rackspace, for instance, is considering offering its own instance of Hadoop-as-a-service to its customers. But for now, it is content to run Hadoop internally, using it to analyze massive stores of e-mail log archives to find ways to fine-tune its hosted e-mail offerings.
“We recently needed to run a statistics job — how many messages sent from a whitehouse.gov address were marked as spam? In the past, it would have been impossible to do that without much more expensive [database] platforms,” said Stu Hood, architecture software developer for Rackspace’s e-mail and applications group, adding that his team is now focused on making it easier to make queries against its data stores using Hadoop. “Any time we have a usage question now we can query our logs,” he said. “That’s very powerful.”
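A statistics job of the kind Hood describes could be sketched roughly as follows. This is not Rackspace’s code: the tab-separated log layout (timestamp, sender, spam flag), the function names and the sample addresses are all assumptions made for illustration, and the example runs in plain Python rather than on a Hadoop cluster.

```python
from collections import Counter


def map_log_line(line):
    """Emit (sender_domain, 1) for each message flagged as spam.

    Assumes a hypothetical log layout: timestamp <TAB> sender <TAB> flag.
    """
    _timestamp, sender, flag = line.rstrip("\n").split("\t")
    if flag == "spam":
        yield sender.rsplit("@", 1)[-1].lower(), 1


def count_spam_by_domain(lines):
    """Sum the per-line results into one spam total per sender domain."""
    totals = Counter()
    for line in lines:
        for domain, n in map_log_line(line):
            totals[domain] += n
    return totals


# Made-up sample records in the assumed layout.
sample_log = [
    "2010-01-05T09:00:00\talice@example.gov\tspam",
    "2010-01-05T09:01:00\tbob@example.gov\tham",
    "2010-01-05T09:02:00\tcarol@Example.gov\tspam",
    "2010-01-05T09:03:00\tdave@example.org\tspam",
]
spam_counts = count_spam_by_domain(sample_log)
print(spam_counts["example.gov"])  # 2
```

The per-line function looks at one record at a time, so the same logic could in principle be handed to many machines, each scanning its own portion of the log archive before the totals are combined.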
Such applications may seem pedestrian at first, but gaining the ability to analyze such large data sets opens up entirely new applications not only in telecom, but in other verticals as well, such as finance, health care and more. And that makes Hadoop a technology to watch as cloud computing evolves into something beyond simple Web and application hosting.
© 2010 Penton Media Inc.