forked from andypetrella/pipeline
Demo Code Layout
Chris Fregly edited this page Aug 26, 2016 · 1 revision
- This repo comes with many example applications to help you get started
- You are encouraged to start with one of these examples when building your own custom apps or libraries
- The root of these examples is `$MYAPPS_HOME`
- Some are Spark Apps that need to be submitted to a Spark Cluster using `submit-spark`
- Some are Standalone Apps with their own `main()` method
- Some are packages meant to be imported and used by Spark Apps, Standalone Apps, Notebooks, etc.
- Below is a high-level description of each of these examples, separated into different paths within `$MYAPPS_HOME`
```
ls -l $MYAPPS_HOME
...
airflow/
akka/
codegen/
flink/
html/
jupyter/
kafka/
nifi/
pmml/
serving/
spark/
tensorflow/
titan/
zeppelin/
```
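To browse the same layout from a script or notebook, the listing can be reproduced in a few lines of Python. This is a minimal sketch, not part of the repo: `list_example_dirs` is a hypothetical helper name, and the only assumption taken from this page is that `$MYAPPS_HOME` points at the root of the examples.

```python
import os
from pathlib import Path


def list_example_dirs(root):
    """Return the sorted names of the immediate subdirectories of root.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


if __name__ == "__main__":
    # MYAPPS_HOME is the variable named on this page; it may be unset locally.
    root = os.environ.get("MYAPPS_HOME")
    if root and Path(root).is_dir():
        for name in list_example_dirs(root):
            print(name + "/")
    else:
        print("MYAPPS_HOME is not set or is not a directory")
```

Plain files are skipped, so only the example directories are printed.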
- airflow
  - Airflow Workflow DAGs
- akka/feeder
  - Akka-based App that Feeds Ratings from CSV into a Kafka Topic
  - Simulates Users Posting Ratings to a Kafka Topic
- codegen/spark
  - Code Generation of Spark ML Models
- flink/streaming
  - Flink Streaming
- html
  - Demo HTML and JavaScript
- jupyter
  - Jupyter/IPython notebooks covering PySpark, TensorFlow, scikit-learn, NetworkX, Matplotlib, and various other examples collected from around the web
- kafka
  - Kafka Consumers and Producers
  - Kafka Connectors
  - Kafka Streams
- nifi
  - NiFi Data Flows
- pmml/spark
  - Generating PMML from Spark ML Models
- serving/discovery
- serving/finagle
- serving/flask
- serving/recommendation
- serving/tensorflow
- serving/watcher
- spark/core
  - Project Tungsten
  - Mechanical Sympathy
  - CPU Cache Aware Algorithms
  - Thread Context Switch Aware Algorithms
  - Efficient Sorting with Sequential and Off-Heap Data
  - Using Linux Perf to Analyze and Compare Algorithms
- spark/ml
  - Machine Learning
  - Graph Processing
  - Text Analytics and NLP
- spark/redis
  - Spark + Redis Integration
- spark/sql
  - Custom In-memory DataSource API Implementation
  - Custom Tungsten-Friendly UDF Participating in Catalyst Optimizations
- spark/streaming
  - Read data from Kafka
  - Store data in Cassandra, Elasticsearch, Redis
  - Calculate distinct count using Redis HyperLogLog and Twitter Algebird HyperLogLog
  - Calculate count and heavy hitters using Twitter Algebird CountMin Sketch
- zeppelin
  - Python and Scala-based notebooks including many new and existing examples from around the web
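The spark/streaming entry above mentions counting and heavy hitters with a CountMin Sketch. As a rough illustration of the idea only, here is a plain-Python sketch. It is not Algebird's API and not the code in this repo: the class, the `heavy_hitters` function, and the `width`, `depth`, and `fraction` parameters are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch; estimates never fall below true counts."""

    def __init__(self, width=2000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash per row, derived by prefixing the row number to the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._cells(item))


def heavy_hitters(stream, fraction=0.2, width=2000, depth=5):
    """Return {item: estimated count} for items estimated at >= fraction of the stream."""
    sketch = CountMinSketch(width, depth)
    total = 0
    candidates = set()
    for item in stream:
        sketch.add(item)
        total += 1
        # Remember any item that looks heavy at the moment it arrives.
        if sketch.estimate(item) >= fraction * total:
            candidates.add(item)
    # Re-check candidates against the final stream length.
    return {
        item: sketch.estimate(item)
        for item in candidates
        if sketch.estimate(item) >= fraction * total
    }


if __name__ == "__main__":
    ratings = ["item-a"] * 50 + ["item-b"] * 30 + [f"item-{i}" for i in range(20)]
    print(heavy_hitters(ratings, fraction=0.2))
```

The sketch uses a fixed `width * depth` table no matter how many distinct items arrive, which is why this family of structures suits unbounded streams. The price is that counts can be overestimated when items collide in every row.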