Chris Fregly edited this page Jul 22, 2016 · 1 revision

Why PipelineIO?

Docker

  • Docker and Docker Networking are finally usable
  • Native Docker for Mac and Docker for Windows are almost ready
  • Syncing local dev and production environments is much easier

Cluster Orchestration

  • Kubernetes and Docker Swarm are being embraced by the major cloud providers, including AWS, Azure, and Google, and are also available on-premises
  • Netflix's Spinnaker is being adopted widely, including on AWS (Netflix), Azure (dedicated Microsoft engineers), and Google (dedicated Google engineers)

TensorFlow

  • TensorFlow Serving brings the serving layer into the spotlight

Spark 2.0+

  • Spark 2.0 enables model saving/loading across languages (Python for Data Scientists, Scala/Java/C++ for Services Engineers)

Globally-Replicated Redis Cache

  • Netflix's new open source project, Dynomite, fills a missing piece: a distributed, fault-tolerant, cross-region caching layer built on top of Redis (or any other ephemeral or persistent store)

PipelineIO Goals

  • enable a single ML/AI developer to build ML/AI apps
  • become the MySQL of ML/AI
  • enable real-time predictions and real-time model updates
  • incorporate business logic, such as time sensitivity and geo sensitivity, into real-time predictions
  • emphasize prediction explainability -- no black-box models
  • simple <-> flexible tradeoff: automation (simple), GUI (middle ground), custom code & scripts (flexible)
  • enable tradeoff choices as every prediction problem is unique
  • best tools go unnoticed and stay out of the way
  • best tools reduce friction
  • best tools adapt to the user
  • best tools are simple yet powerful

PipelineIO Tool Support

Notebooks

  • Jupyter/IPython

Streaming

  • Kafka Streams
  • Spark Streaming

Metrics

  • Ganglia
  • Graphite

Monitoring

  • Nagios
  • PagerDuty

Data Store

  • Redis (Dynomite on top of Redis)

Pipeline

  • Spinnaker

Cluster Resource Manager

  • Spark Standalone

Cluster Orchestration

  • Kubernetes

ML

  • Spark ML (Scala & Python)
  • C++ (Quora QMF)

Neural Networks

  • TensorFlow
  • Caffe

Language

  • Python
  • Scala/Java
  • C/C++

GPU

  • Nvidia
  • CUDA

Data Analysis

  • Presto
  • SparkR

Streaming

  • Flink/Streaming
  • NiFi

Deep Learning

  • SparkNet
  • Caffe
  • Theano
  • DeepLearning4J

Language

  • R