@@ -1,6 +1,6 @@
 # Elasticsearch Hadoop [](https://travis-ci.org/elastic/elasticsearch-hadoop)
 Elasticsearch real-time search and analytics natively integrated with Hadoop.
-Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), [Apache Pig](#apache-pig), [Apache Spark](#apache-spark) and [Apache Storm](#apache-storm).
+Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), and [Apache Spark](#apache-spark).
 
 See [project page](http://www.elastic.co/products/hadoop/) and [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for detailed information.
 
@@ -184,33 +184,6 @@ INSERT OVERWRITE TABLE artists
 
 As one can note, currently the reading and writing are treated separately but we're working on unifying the two and automatically translating [HiveQL][] to Elasticsearch queries.
 
-## [Apache Pig][]
-ES-Hadoop provides both read and write functions for Pig so you can access Elasticsearch from Pig scripts.
-
-Register ES-Hadoop jar into your script or add it to your Pig classpath:
-```
-REGISTER /path_to_jar/es-hadoop-<version>.jar;
-```
-Additionally one can define an alias to save some chars:
-```
-%define ESSTORAGE org.elasticsearch.hadoop.pig.EsStorage()
-```
-and use `$ESSTORAGE` for storage definition.
-
-### Reading
-To read data from ES, use `EsStorage` and specify the query through the `LOAD` function:
-```SQL
-A = LOAD 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
-DUMP A;
-```
-
-### Writing
-Use the same `Storage` to write data to Elasticsearch:
-```SQL
-A = LOAD 'src/artists.dat' USING PigStorage() AS (id:long, name, url:chararray, picture: chararray);
-B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
-STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();
-```
 ## [Apache Spark][]
 ES-Hadoop provides native (Java and Scala) integration with Spark: for reading a dedicated `RDD` and for writing, methods that work on any `RDD`. Spark SQL is also supported
 
@@ -313,30 +286,6 @@ DataFrame df = sqlContext.read.json("examples/people.json")
 JavaEsSparkSQL.saveToEs(df, "spark/docs")
 ```
 
-## [Apache Storm][]
-ES-Hadoop provides native integration with Storm: for reading a dedicated `Spout` and for writing a specialized `Bolt`
-
-### Reading
-To read data from ES, use `EsSpout`:
-```java
-import org.elasticsearch.storm.EsSpout;
-
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
-builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");
-```
-
-### Writing
-To index data to ES, use `EsBolt`:
-
-```java
-import org.elasticsearch.storm.EsBolt;
-
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("spout", new RandomSentenceSpout(), 10);
-builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5).shuffleGrouping("spout");
-```
-
 ## Building the source
 
 Elasticsearch Hadoop uses [Gradle][] for its build system and it is not required to have it installed on your machine. By default (`gradlew`), it automatically builds the package and runs the unit tests. For integration testing, use the `integrationTests` task.
@@ -370,10 +319,8 @@ under the License.
 
 [Hadoop]: http://hadoop.apache.org
 [Map/Reduce]: http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
-[Apache Pig]: http://pig.apache.org
 [Apache Hive]: http://hive.apache.org
 [Apache Spark]: http://spark.apache.org
-[Apache Storm]: http://storm.apache.org
 [HiveQL]: http://cwiki.apache.org/confluence/display/Hive/LanguageManual
 [external table]: http://cwiki.apache.org/Hive/external-tables.html
 [Apache License]: http://www.apache.org/licenses/LICENSE-2.0