Demo Code Layout


Chris Fregly edited this page Aug 26, 2016 · 1 revision

This repo comes with many example applications to help you get started.
You are encouraged to start from one of these examples when building your own custom apps or libraries.
The root of these examples is $MYAPPS_HOME.
Some are Spark apps that need to be submitted to a Spark cluster using submit-spark.
Some are standalone apps with their own main() method.
Some are packages meant to be imported and used by Spark apps, standalone apps, notebooks, etc.
Below is a high-level description of each example, grouped by its path within $MYAPPS_HOME.

ls -l $MYAPPS_HOME
...
airflow/
akka/
codegen/
flink/
html/
jupyter/
kafka/
nifi/
pmml/
serving/
spark/
tensorflow/
titan/
zeppelin/

Airflow

airflow
Airflow Workflow DAGs

Akka/Feeder

akka/feeder
Akka-based App that Feeds Ratings from CSV into a Kafka Topic
Simulates Users Posting Ratings to a Kafka Topic

Code Generation/Spark

codegen/spark
Code Generation of Spark ML Models

Flink Streaming

flink/streaming
Flink Streaming

HTML

html
Demo HTML and JavaScript

Jupyter

jupyter
Jupyter/IPython notebooks covering PySpark, TensorFlow, scikit-learn, NetworkX, Matplotlib, and various other examples from around the web

Kafka

kafka
Kafka Consumers and Producers
Kafka Connectors
Kafka Streams

NiFi

nifi
NiFi Data Flows

PMML + Spark

pmml/spark
Generating PMML from Spark ML Models

Model Serving and Online Predicting

serving

Spark Core

spark/core
Project Tungsten
Mechanical Sympathy
CPU Cache Aware Algorithms
Thread Context Switch Aware Algorithms
Efficient Sorting with Sequential and Off-Heap Data
Using Linux Perf to Analyze and Compare Algorithms

Spark ML/MLlib, GraphX, and CoreNLP

spark/ml
Machine Learning
Graph Processing
Text Analytics and NLP

Spark Redis

spark/redis
Spark + Redis Integration

Spark SQL

spark/sql
Custom In-memory DataSource API Implementation
Custom Tungsten-Friendly UDF Participating in Catalyst Optimizations

Spark Streaming

spark/streaming
Read data from Kafka
Store data in Cassandra, Elasticsearch, and Redis
Calculate distinct count using Redis HyperLogLog and Twitter Algebird HyperLogLog
Calculate count and heavy hitters using Twitter Algebird CountMin Sketch
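The last bullet refers to finding heavy hitters with a Count-Min Sketch. The repo uses Twitter Algebird for this; what follows is only a minimal, library-free Python sketch of the technique. It is not Algebird's API and not the code in spark/streaming. The names CountMinSketch and heavy_hitters, the SHA-256-based per-row hashing, and the bounded top-k candidate map are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch (illustrative only; not Algebird's CMS API)."""

    def __init__(self, width: int = 1000, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        # depth rows of width counters; each row uses its own hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number to get a different hash per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Hash collisions can only inflate counters, so the minimum across
        # rows is the tightest estimate; it never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def heavy_hitters(stream, k: int = 3, width: int = 1000, depth: int = 5):
    """Approximate top-k items by keeping at most k candidates next to the sketch."""
    cms = CountMinSketch(width, depth)
    top = {}  # item -> estimated count; holds at most k entries
    for item in stream:
        cms.add(item)
        est = cms.estimate(item)
        if item in top or len(top) < k:
            top[item] = est
        else:
            # Evict the weakest candidate if the new item now looks heavier.
            weakest = min(top, key=top.get)
            if est > top[weakest]:
                del top[weakest]
                top[item] = est
    # A candidate's stored estimate is refreshed only when that item reappears.
    return sorted(top.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical stream: three frequent items followed by 100 one-off items.
stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"user{i}" for i in range(100)]
print(heavy_hitters(stream, k=3))
```

Memory stays fixed at width × depth counters plus k candidates, no matter how many distinct items arrive. In the standard analysis, the overestimate is at most about (e / width) · N with probability roughly 1 − e^(−depth), where N is the total count added. Width therefore trades memory for accuracy, and depth trades it for confidence.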

TitanDB Graph Database

titan

Zeppelin

zeppelin
Python- and Scala-based notebooks, including many new and existing examples from around the web
