forked from andypetrella/pipeline
Home
Chris Fregly edited this page Sep 8, 2016 · 157 revisions
Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow
Part 1 (Analytics and Visualizations)
- Analytics and Visualizations Overview (Live Demo!)
- Verify Environment Setup (Docker, Cloud Instance)
- Notebooks (Zeppelin, Jupyter/IPython)
- Interactive Data Analytics (Spark SQL, Hive, Presto)
- Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
- Time-series Analytics (Spark, Cassandra)
- Visualizations (Kibana, Matplotlib, D3)
- Approximate Queries (Spark SQL, Redis, Algebird)
- Workflow Management (Airflow)
Part 2 (Streaming and Recommendations)
- Streaming and Recommendations (Live Demo!)
- Streaming (NiFi, Kafka, Spark Streaming, Flink)
- Cluster-based Recommendation (Spark ML, Scikit-Learn)
- Graph-based Recommendation (Spark ML, Spark Graph)
- Collaborative-based Recommendation (Spark ML)
- NLP-based Recommendation (CoreNLP, NLTK)
- Geo-based Recommendation (ElasticSearch)
- Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
- Save Workshop Environment for Your Use Cases
- San Francisco: Saturday, April 23rd (SOLD OUT)
- San Francisco: Saturday, June 4th (SOLD OUT)
- Washington DC: Saturday, June 18th (SOLD OUT)
- Los Angeles: Sunday, July 10th (SOLD OUT)
- Seattle: Saturday, July 30th (SOLD OUT)
- Santa Clara: Saturday, August 6th (SOLD OUT)
- Chicago: Saturday, August 27th (SOLD OUT)
- Atlanta: Sunday, September 25th
- New York: Saturday, October 1st
- Munich: Saturday, October 15th
- London: Saturday, October 22nd
- Brussels: Saturday, October 29th
- Madrid: Saturday, November 19th
- Tokyo: December 3rd
- Shanghai: December 10th
- Beijing: Saturday, December 17th
- Hyderabad: Saturday, December 24th
- Bangalore: Saturday, December 31st
- Sydney: Saturday, January 7th, 2017
- Melbourne: Saturday, January 14th, 2017
- Sao Paulo: Saturday, February 11th, 2017
- Rio de Janeiro: Saturday, February 18th, 2017
The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.
- First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and ElasticSearch.
- Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
- Last, we productionize our pipeline and serve live recommendations to our users!
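The collaborative-filtering step described above can be sketched outside of Spark as a minimal user-based recommender over an in-memory ratings table. The data and names below are illustrative toy assumptions, not taken from the workshop code; the workshop itself uses Spark ML for this at scale.

```python
from math import sqrt

# Toy user -> item ratings (hypothetical data, for illustration only).
ratings = {
    "alice": {"spark": 5, "kafka": 4, "flink": 1},
    "bob":   {"spark": 4, "kafka": 5, "nifi": 2},
    "carol": {"flink": 5, "nifi": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, k=2):
    """Score items the user hasn't rated, weighting other users'
    ratings by their similarity to this user."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for item, r in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:k]

print(recommend("alice"))  # alice has not yet rated "nifi"
```

The same neighborhood idea underlies the Spark ML alternating-least-squares (ALS) approach used in the workshop, which replaces explicit pairwise similarity with learned latent factors so it scales to large, sparse rating matrices.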
Follow the Wiki to Set Up the Docker-based Environment
Environment Setup
Demos
- Serve Batch Recommendations
- Streaming Probabilistic Algos
- TensorFlow Image Classifier
Active Research (Unstable)
- Kubernetes Docker Spark ML
Managing Environment
- Stop and Start Environment