
Telecom operators get the scoop on Hadoop

Cloud computing is all the rage, but it may be a while before some key elements reach the mainstream. That makes it a good time to delve into Hadoop, a grand-scale storage and processing platform just entering the telecom radar screen.

In many ways, cloud computing looks like not much more than glorified Web site and application hosting. Yet in other, probably more important ways, it really is something altogether new. It’s with this dichotomy in mind that we introduce Hadoop, which squarely fits in the altogether new category and holds the promise of turning tomorrow’s clouds into giant brains capable of analyzing huge sets of data to uncover heretofore-unknowable truths.

What kinds of truths?

  • Google uses Hadoop (actually its own proprietary version, called MapReduce) to help it swallow the entire Web, not to mention massive map/satellite databases, to produce elegantly useful products such as Google Maps, Google Earth and, of course, the Google search engine itself.
  • Yahoo! uses Hadoop to analyze and optimize how its 20 million visitors consume its home page content.
  • The New York Times set a Hadoop-powered cloud against an 11-million story archive dating back to 1851 to make it instantly searchable.
  • Facebook uses Hadoop to analyze interactions and social graph links on its site (a data set growing by more than 15 terabytes per day), powering the friend connections and personalization that drive the social networking site.

And while these Silicon Valley, start-up-centric stories would seem to position Hadoop well out on computing’s cutting edge, Hadoop is accessible (it’s open source at heart), cheap (it runs on clusters of inexpensive computers) and useful (see above) enough that it appears poised to move rapidly into the mainstream. For instance, at October’s inaugural Hadoop World conference in New York City, corporate-minded presenters such as JP Morgan Chase, Visa and Booz Allen Hamilton (on behalf of the medical industry) sat alongside the tech elite, making the event feel very much like a coming-out party for the mainstreaming of Hadoop.

“Our view is that Google, Yahoo! and Facebook are not different than other companies, they just represent the future,” said Mike Olson, CEO of Cloudera, which is trying to commercialize Hadoop via a software and services model. “It used to be hard to get your hands on a terabyte of data, but not anymore. In the future, the companies that win will be the ones that understand data the best.”

And what does this all mean for the traditional telecom service provider, not to mention those companies’ hosting/cloud computing groups? A lot. Not only is Hadoop processing likely to be an important application for cloud providers, but telecom operators, sitting on reams of network and customer data, are also prime candidates to become Hadoop users. China Mobile, for instance, presented at that same conference on using Hadoop as a telecom data mining platform, showing how operators can tap this powerful technology to better understand their networks, services and customers, finding new patterns and revelations that can help them compete in the digital future.

Hadoop traces its roots to a Google project called MapReduce. Google developed the technology to store and analyze all the information its search engine was spidering. Google kept this secret sauce private, but it did publish a technical paper describing how it worked. Yahoo!, thinking it could use something similar, threw its support behind Hadoop, an open-source project trying to emulate Google’s work. As Hadoop has matured, it has also broadened to become less focused on just Web search applications, positioning it for mainstream adoption.

MapReduce remains a Google jewel, but a growing community focused on Hadoop makes it the technology to watch. At least one vendor, the privately funded Cloudera, has emerged to make a Red Hat-style play at popularizing Hadoop via support services and regularly updated distributions. Hadoop creator Doug Cutting recently moved from Yahoo! to Cloudera, joining other Facebook, Google and Yahoo! exports there and further boosting the company’s profile. Today, Amazon and Google are the biggest public cloud implementers of Hadoop, though offering Hadoop-as-a-service is on the radar of many cloud providers. Meanwhile, Hadoop is also relatively easy to deploy in private clouds, and that’s where its biggest growth may be.

The evolution of Hadoop is an interesting one, but more to the point, what is it and how does it work? Essentially, Hadoop is SQL writ large, a way to store and query against extremely large data sets (we mean extremely, like the entire Internet). Today’s relational databases and business intelligence tools are powerful, but what if you have 100 or 1,000 times the data? Those tools begin to break down, Hadoop backers say.

The power of Hadoop is that it is engineered to spread out that processing across hundreds if not thousands of plain vanilla servers (and eventually, in Google’s vision, millions of machines) arranged in a cluster, rather than relying on super-expensive proprietary machines. At the start, doing analysis in Hadoop was difficult and best left to the experts, but Hadoop additions with whimsical names such as Hive and Pig have brought simpler, SQL-style capabilities to Hadoop — positioning it for further mainstream growth.
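To make the map-and-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop’s API, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel across many servers; here they simply run one after another.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    pages = ["hadoop stores data", "hadoop analyzes data"]
    print(run_job(pages))
```

Because each record is mapped independently and each key is reduced independently, the work can in principle be divided among as many machines as are available, which is the property the cluster approach relies on.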

While Hadoop grew out of a Web start-up and research-oriented focus, its core proposition is fit for any enterprise: the ability to cheaply analyze massive amounts of data. The alternatives are expensive data warehouse-style technologies only available to the largest enterprises. But as with the adoption of other Web-based and open source–driven technologies before it, Hadoop’s biggest use may be by those very enterprises. Who doesn’t like more open, less expensive IT alternatives?

But what impact will Hadoop have on traditional telecom providers? For starters, their hosting/cloud divisions could one day offer Hadoop-as-a-service as part of their cloud apps offerings in competition with Amazon, Google and others. Not only would Hadoop processing — especially apps tuned to specific vertical markets — be a potential revenue producer itself, but Hadoop requires reams of processing power and data storage to work, essentially driving demand for large-scale cloud infrastructure services.

But for now, telecom operators may be bigger users than deployers of Hadoop services. “When I go around to ISPs, here on the West Coast especially, virtually all of them are using Hadoop to monitor the behavior of their network and software infrastructure, and they are beginning to use it to analyze their customers as well,” said Cloudera’s Olson. “We believe the telecom industry has lots of problems precisely like that.”

Hosting provider Rackspace, for instance, is considering offering its own instance of Hadoop-as-a-service to its customers. But for now, it is content to run Hadoop internally, using it to analyze massive stores of e-mail log archives to find ways to fine-tune its hosted e-mail offerings.

“We recently needed to run a statistics job — how many messages sent from a whitehouse.gov address were marked as spam? In the past, it would have been impossible to do that without much more expensive [database] platforms,” said Stu Hood, architecture software developer for Rackspace’s e-mail and applications group, adding that his team is now focused on making it easier to make queries against its data stores using Hadoop. “Any time we have a usage question now we can query our logs,” he said. “That’s very powerful.”
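A question like the one Hood describes could be sketched roughly as follows in Python. This is not Rackspace’s code: the log format (one tab-separated "sender, verdict" pair per line), the "spam" verdict label and the function names are all assumptions made for illustration. Each chunk of log lines stands in for the slice of data a single server in the cluster would process, and the partial tallies are then merged.

```python
from collections import Counter
from functools import reduce


def count_chunk(lines, domain="whitehouse.gov"):
    """Map step: tally total and spam-flagged messages from one log chunk.

    Assumes every line is well formed: "<sender address>\t<verdict>".
    """
    tally = Counter()
    for line in lines:
        sender, verdict = line.rstrip("\n").split("\t")
        if sender.lower().endswith("@" + domain):
            tally["total"] += 1
            if verdict == "spam":
                tally["spam"] += 1
    return tally


def merge(left, right):
    """Reduce step: add two partial tallies together."""
    return left + right


def spam_stats(chunks, domain="whitehouse.gov"):
    """Tally each chunk independently, then merge the partial results."""
    partials = (count_chunk(chunk, domain) for chunk in chunks)
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    # Two hypothetical chunks, as if stored on two different servers.
    chunks = [
        ["press@whitehouse.gov\tspam", "news@example.com\tspam"],
        ["info@whitehouse.gov\tham"],
    ]
    stats = spam_stats(chunks)
    print(stats["spam"], "of", stats["total"], "messages marked as spam")
```

The per-chunk tallies are small no matter how large the logs are, so only a few numbers need to travel between machines, which is part of what keeps this kind of job cheap to run on commodity hardware.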

Such applications may seem pedestrian at first, but gaining the ability to analyze such large data sets opens up entirely new applications not only in telecom, but in other verticals as well, such as finance, health care and more. And that makes Hadoop a technology to watch as cloud computing evolves into something beyond simple Web and application hosting.

© 2012 Penton Media Inc.