IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

Preface

Elasticsearch for Apache Hadoop is an ‘umbrella’ project consisting of three related yet independent sub-projects, each with its own dedicated section in the documentation:

Elasticsearch on YARN
run Elasticsearch on top of YARN; see the Elasticsearch on YARN section
repository-hdfs
use HDFS as a repository back-end, that is, as storage for snapshot/restore from/to Elasticsearch. For more information, refer to its home page.
elasticsearch-hadoop proper
interact with Elasticsearch from within a Hadoop environment. If you are using Map/Reduce, Cascading, Hive, Pig, Apache Spark or Apache Storm, this project is for you.

Thus, while all three projects fall under the Hadoop umbrella, each covers a different aspect of it, so be sure to read the documentation for the one you need.