Chris Fregly edited this page Aug 26, 2016 · 1 revision
  • This repo comes with many example applications to help you get started
  • You are encouraged to start with one of these examples when building your own custom apps or libraries
  • The root of these examples is $MYAPPS_HOME
  • Some are Spark Apps that need to be submitted to a Spark Cluster using submit-spark
  • Some are Standalone Apps with their own main() method
  • Some are packages meant to be imported and used by Spark Apps, Standalone Apps, Notebooks, etc.
  • Below is a high-level description of each of these examples separated into different paths within $MYAPPS_HOME
ls -l $MYAPPS_HOME
...
airflow/
akka/
codegen/
flink/
html/
jupyter/
kafka/
nifi/
pmml/
serving/
spark/
tensorflow/
titan/
zeppelin/

Airflow

Akka/Feeder

  • akka/feeder
  • Akka-based App that Feeds Ratings from CSV into a Kafka Topic
  • Simulates Users Posting Ratings to a Kafka Topic
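
The feeder's flow (read ratings from CSV, post each one to a topic) can be sketched roughly as follows. This is a minimal Python illustration, not the Akka app itself: the row layout (`user_id,item_id,rating`), the topic name `ratings`, the JSON field names, and the `send` callable are all assumptions, with `send` standing in for a real Kafka producer.

```python
import csv
import io
import json
from typing import Callable, Iterable


def feed_ratings(lines: Iterable[str],
                 send: Callable[[str, bytes], None],
                 topic: str = "ratings") -> int:
    """Parse CSV rating rows and hand each one to `send` as a JSON message.

    Assumed (hypothetical) row layout: user_id,item_id,rating
    Returns the number of messages sent.
    """
    sent = 0
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        user_id, item_id, rating = row
        message = {
            "userId": int(user_id),
            "itemId": int(item_id),
            "rating": float(rating),
        }
        send(topic, json.dumps(message).encode("utf-8"))
        sent += 1
    return sent


# Stand-in for a Kafka producer: collect (topic, payload) pairs in memory.
outbox = []
sample = io.StringIO("1,10,4.5\n2,10,3.0\n\n1,11,5.0\n")
count = feed_ratings(sample, lambda topic, payload: outbox.append((topic, payload)))
```

Passing the producer in as a callable keeps the parsing logic testable without a running broker. A real producer's send method could be wrapped to match the `(topic, payload)` shape.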

Code Generation/Spark

Flink Streaming

HTML

  • html
  • Demo HTML and JavaScript

Jupyter

  • jupyter
  • Jupyter/IPython notebooks covering PySpark, TensorFlow, scikit-learn, NetworkX, Matplotlib, and various other examples collected from around the web

Kafka

  • kafka
  • Kafka Consumers and Producers
  • Kafka Connectors
  • Kafka Streams

NiFi

  • nifi
  • NiFi Data Flows

PMML + Spark

Model Serving and Online Predicting

Spark Core

  • spark/core
  • Project Tungsten
  • Mechanical Sympathy
  • CPU Cache Aware Algorithms
  • Thread Context Switch Aware Algorithms
  • Efficient Sorting with Sequential and Off-Heap Data
  • Using Linux Perf to Analyze and Compare Algorithms

Spark ML/MLlib, GraphX, and CoreNLP

  • spark/ml
  • Machine Learning
  • Graph Processing
  • Text Analytics and NLP

Spark Redis

Spark SQL

  • spark/sql
  • Custom In-memory DataSource API Implementation
  • Custom Tungsten-Friendly UDF Participating in Catalyst Optimizations

Spark Streaming

  • spark/streaming
  • Read data from Kafka
  • Store data in Cassandra, ElasticSearch, Redis
  • Calculate distinct count using Redis HyperLogLog and Twitter Algebird HyperLogLog
  • Calculate count and heavy hitters using Twitter Algebird CountMin Sketch
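
To give a feel for the count and heavy-hitter calculation, here is a minimal Count-Min Sketch in plain Python. It is a sketch of the general technique only, not Algebird's API or the code in `spark/streaming`. The class name, the `width` and `depth` defaults, the SHA-256 based hashing, and the sample stream are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item: str):
        # One hash per row, derived by salting the item with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # The minimum across rows is the estimate least inflated by collisions.
        return min(self.table[row][col] for row, col in self._indexes(item))


def heavy_hitters(stream, threshold: int):
    """Return items whose estimated count reaches `threshold`."""
    sketch = CountMinSketch()
    candidates = {}
    for item in stream:
        sketch.add(item)
        estimate = sketch.estimate(item)
        if estimate >= threshold:
            candidates[item] = estimate
    return candidates


# Hypothetical stream of rated item ids.
ratings_stream = ["item-1"] * 50 + ["item-2"] * 5 + ["item-3"] * 20
hitters = heavy_hitters(ratings_stream, threshold=10)
```

The sketch uses fixed memory (`width * depth` counters) regardless of how many distinct items appear, which is why this family of structures suits unbounded streams. The trade-off is that hash collisions can inflate an estimate, so a rare item may occasionally be reported as a heavy hitter.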

TitanDB Graph Database

Zeppelin

  • zeppelin
  • Python and Scala-based notebooks including many new and existing examples from around the web