Apache Impala (currently an ASF incubator project) provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response times enable interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies. (You will often see the term "interactive" applied to these kinds of fast queries with human-scale response times.)
Impala integrates with the Apache Hive metastore database so that the two components share databases and tables. This high level of integration with Apache Hive, together with compatibility with the HiveQL syntax, lets you use either Impala or Hive to create tables, issue queries, load data, and so on.
This white paper contains featured blog posts from the Cloudera Engineering Blog about key Impala concepts, Impala performance, and best practices.
Sponsored by Cloudera