JavaZone 2011 - Cascading through Hadoop: A DSL for Simpler MapReduce
Hadoop is a MapReduce framework that has rapidly entered the everyday vocabulary of "big data" developers everywhere. But coding to the raw Hadoop APIs can be a real chore. Data analysts can express what they want in English-like terms, yet the Hadoop APIs seem to require us to act as translators into a far less comprehensible functional, data-centric DSL.
The Cascading framework gives developers a convenient higher-level abstraction for querying and scheduling complex jobs on a Hadoop cluster. Programmers can think more holistically about the questions being asked of the data and the flow that data will take, without worrying about the minutiae.
We'll explore how to set up, code against, and leverage the Cascading API on top of a Hadoop cluster to write MapReduce applications more effectively, all while thinking in a more natural, less strictly MapReduce-shaped way.
During this presentation, we'll also explore Cascalog, a Clojure-based derivative of Cascading, and look at how functional programming paradigms and language syntax are emerging as the next important step in big-data thinking and processing.
Matthew J. McCullough
Open Source Application Architect at Ambient Ideas
Matthew McCullough is an energetic 12-year veteran of enterprise software development and open source education, and co-founder of Ambient Ideas, LLC, a Denver consultancy. He is an outspoken advocate for the use of open source libraries in enterprise applications. Matthew is currently a member of the JCP, a reviewer for technology publishers including O'Reilly, President of the Denver Open Source Users Group, and a speaker on the No Fluff Just Stuff 2009 tour.
His experience includes successful J2EE, SOA, and Web Service implementations for real estate, financial management, and telecommunications firms, as well as the development of several open source libraries. Matthew jumps at opportunities to evangelize, present, and educate teams on the benefits of open source. His current focus areas are Maven, iPhone and Android applications, and OSS debugging tools.
Matthew currently resides in beautiful Denver, Colorado, USA, with his wife and baby daughter, all of whom are active in nearly every outdoor activity Colorado offers.
