Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/10/24/getting-started-with-impala-by-john-russell-orsquoreilly-media-book.aspx
Impala is a recent but very valuable addition to the Hadoop ecosystem. After reading the book, I must say Cloudera has taken a big step in the right direction.
The rationale behind bringing Impala to life is the proliferation of SQL. SQL as a language has many flavours, but in one form or another it is already known to data practitioners coming to Hadoop from various platforms and DBMSs. Impala implements only a subset of the ANSI SQL-92 specification, but even that subset is powerful enough to make a developer productive. In my opinion, since SQL is based on algebra and sets, and since HDFS (Hadoop) simply exposes datasets, Impala is the right choice for DML and DDL even on Big Data projects.
At 110 pages the book is not terribly long, but bear in mind that Impala as a product is still under active development. As a bonus, the author works at Cloudera and has a close relationship with the product, which results in top professional content. John has divided the book into two parts: the first and larger one covers Impala's implementation and its role in data analysis and processing; the second covers the most commonly used tasks, pitfalls, advice and techniques.
What I did not find is more on how to use Impala with Hive, Sqoop, HBase and Pig; I am taking a star off my rating for this.
Let me reiterate: the book covers Impala as shipped with Cloudera's Hadoop distribution. If you are using a different distribution, Impala is not part of it.
As I said, I am giving this book 4 out of 5 stars. Good work, John!
Disclaimer: the book was provided to me for free as part of O’Reilly’s blogger reviewer programme.