JavaZone 2010 - Hadoop: Divide and Conquer Gigantic Datasets
Moore's law has finally hit the wall and CPU clock speeds have actually decreased in the last few years. The industry is reacting with hardware that has an ever-growing number of cores and software that can leverage "grids" of distributed, often commodity, computing resources. But how is a traditional Java developer supposed to easily take advantage of this revolution? The answer is the Apache Hadoop family of projects. Hadoop is a suite of open source APIs at the forefront of this grid computing revolution and is considered the absolute gold standard for the divide-and-conquer model of distributed problem crunching. The well-travelled Apache Hadoop framework is currently being leveraged in production by prominent names such as Yahoo, IBM, Amazon, Adobe, AOL, Facebook and Hulu, to name just a few.
In this session, you'll start by learning the vocabulary unique to the distributed computing space. Next, we'll discover how to shape a problem and its processing to fit the Hadoop MapReduce framework. We'll then examine the incredible auto-replicating, redundant and self-healing HDFS filesystem. Finally, we'll fire up several Hadoop nodes and watch our calculation process get devoured live by our Hadoop cluster. By the end of this talk, you'll understand the suite of Hadoop tools and where each one fits in conquering large data sets.
Matthew J. McCullough
Matthew McCullough is an energetic 14-year veteran of enterprise software development and open source education, and co-founder of Ambient Ideas, LLC, a Denver consultancy. Matthew is currently a member of the JCP, a reviewer for technology publishers including O'Reilly, author of the upcoming Presentation Patterns & Anti-Patterns book, a multi-year speaker on the No Fluff Just Stuff tour, author of the DZone Maven, Git & Google App Engine RefCards, and President of the Denver Open Source Users Group.
His experience includes successful JEE, SOA, and Web Service implementations for real estate, finance and telecommunications firms, in addition to publishing several open source libraries. Matthew jumps at opportunities to mentor and educate teams on how to leverage open source. His current R&D topics are Cloud Computing, Service Integrations, Maven, Git, and Hadoop.
Matthew resides in Denver with his beautiful wife and 1.5-year-old daughter, who are active in nearly every outdoor activity Colorado offers.
