Data-driven Microservices with Nstack

Problem

Data scientists have mature platforms for real-time, large-scale data processing, but these platforms impose a structure on work that is not suitable for every problem. Many modern data-driven tasks are better modelled as microservices: small, decomposed units of compute exposed as APIs or available to be plugged into other parts of the business.

For many data teams, implementing a data-driven service requires architecting and deploying an entire platform, and current microservice or PaaS platforms have poor primitives for handling data-driven tasks and event streams. Additionally, data-driven services often require inherently complex engineering (e.g. distribution and integration into middleware).

Thus, data scientists spend huge amounts of time wiring up complex infrastructure, boilerplate, and middleware, all of which lies outside their role and expertise.

Implications

  • Slow time-to-market: building a service has a large fixed infrastructural cost
  • Bureaucratic overhead: data-driven APIs or services require collaboration between data-science and other engineering teams
  • Lack of robustness: services suffer when ill-suited infrastructure is pushed into production

Nstack

Nstack is a platform for building composable, stream-based microservices.

An nstack microservice is created by combining a regular function or class with a small configuration file. Aside from defining their API, service developers do not have to deal with any boilerplate or infrastructure, and can focus exclusively on their business logic. Services can currently be written in Java, Python, or Haskell, and can include arbitrary operating-system, language, or binary packages.

service.py
import nstack
import tensorflow as tf
import numpy as np

class ClickAnalytics(nstack.BaseService):

    def analyse(self, messages):
        ...
service.yaml
name: ClickAnalytics

stack: python

api: |  
    analyse: (Text) -> (Text)

packages: [libsvm]  
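As a sketch of what the body of analyse might contain, the standalone function below stands in for the service method (the nstack runtime and the exact message format are assumptions here): it receives a Text message, here a JSON-encoded click event, and returns a Text result, matching the (Text) -> (Text) signature declared in service.yaml.

```python
import json

# Hypothetical stand-in for ClickAnalytics.analyse: takes one Text message
# (a JSON-encoded click event) and returns a Text result, mirroring the
# (Text) -> (Text) API declared in service.yaml.
def analyse(message: str) -> str:
    event = json.loads(message)
    # Toy "analytics": tag the event with a category based on its page.
    category = "landing" if event.get("page") == "/" else "internal"
    return json.dumps({"user": event.get("user"), "category": category})
```

In the real service this logic would live inside the ClickAnalytics.analyse method, with nstack handling message delivery around it.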

Once built and deployed, an nstack service can be attached to event sources and event sinks. Nstack comes with various integrations, such as Kafka, RabbitMQ, Redis, and the Nstack HTTP Gateway, or organisations can add their own custom sources and sinks.

This example workflow uses our service to process a stream of events from Kafka and write results to RabbitMQ. It can be started via the nstack CLI or DSL.

CLI
$ nstack start ClickAnalytics --source=kafka/click-stream --sink=rabbitMQ/analytics
DSL
nstack> myWorkflow = kafka/click-stream -> ClickAnalytics -> rabbitMQ/analytics  
nstack> start myWorkflow  



Additionally, multiple microservices can be composed together to form workflows. Nstack handles the orchestration and passes messages between the stages of the workflow.

DSL
nstack> myWorkflow = kafka/click-stream -> ClickCleaning -> ClickAnalytics -> rabbitMQ/analytics  
nstack> start myWorkflow  
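Conceptually, chaining services with -> behaves like function composition over the message stream: each stage's output becomes the next stage's input. A minimal Python sketch of the workflow above (the service bodies are hypothetical stand-ins, not nstack code):

```python
def click_cleaning(message: str) -> str:
    # Stand-in for the ClickCleaning service: normalise the raw event text.
    return message.strip().lower()

def click_analytics(message: str) -> str:
    # Stand-in for the ClickAnalytics service: derive a result from the event.
    return f"analysed:{message}"

def workflow(message: str) -> str:
    # kafka/click-stream -> ClickCleaning -> ClickAnalytics -> rabbitMQ/analytics
    return click_analytics(click_cleaning(message))
```

For example, workflow("  CLICK /home  ") yields "analysed:click /home"; nstack performs the equivalent plumbing between deployed services rather than between in-process functions.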



In addition to enabling data teams to implement APIs and microservices, nstack lets other developers reuse existing nstack microservices in their own workflows. For instance, a developer in another team could easily deploy a new workflow using ClickAnalytics, which reads from an HTTP endpoint and writes to Elasticsearch:

nstack> newWorkflow = http/endpoint-internal -> ClickAnalytics -> Elasticsearch/analytics  
nstack> start newWorkflow  

Because nstack services are type-safe, developers can be confident that a service will compose correctly when reused in another workflow.
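The (Text) -> (Text) signatures act like a typed interface: a workflow only composes if each stage's output type matches the next stage's input type. A rough Python analogue of that check (the tuple representation of a stage's signature is an assumption for illustration, not nstack's mechanism):

```python
# Hypothetical illustration: represent each stage as (name, input_type,
# output_type) and verify that adjacent stages line up before "deploying".
def check_workflow(stages):
    for (name_a, _, out_a), (name_b, in_b, _) in zip(stages, stages[1:]):
        if out_a != in_b:
            raise TypeError(f"{name_a} outputs {out_a}, {name_b} expects {in_b}")
    return True

pipeline = [
    ("http/endpoint-internal",  "Text", "Text"),
    ("ClickAnalytics",          "Text", "Text"),
    ("Elasticsearch/analytics", "Text", "Text"),
]
```

A mismatched pipeline, say a stage emitting Json into a stage expecting Text, would fail this check before anything runs, which is the kind of guarantee the type system provides at build time.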

Nstack is platform-agnostic, and can be deployed as a virtual appliance, a Red Hat RPM, or an AWS AMI.