+++
title = "Nuclio functions"
description = "Nuclio - High performance serverless for data processing and ML"
weight = 40
toc = true
+++

## Nuclio Overview

[nuclio](https://github.com/nuclio/nuclio) is a high-performance serverless platform which runs over Docker or Kubernetes and automates the development, operation, and scaling of code (written in 8 supported languages). Nuclio is focused on data analytics and ML workloads; it provides extreme performance and parallelism, supports stateful and data-intensive workloads and GPU resource optimization, and ships with check-pointing and 14 native triggers/streaming protocols out of the box, including HTTP, Cron, batch, Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs, and MQTT. Additional triggers can be added dynamically (e.g. a [Twitter feed](https://github.com/v3io/tutorials/blob/master/demos/stocks/read-tweets.ipynb)). Nuclio can run in the cloud as a [managed offering](https://www.iguazio.com/), or on any Kubernetes cluster (cloud, on-prem, or edge).

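At its core, a nuclio function is a handler that receives a context object (logger, platform helpers) and the triggering event. A minimal Python function, sketched along the lines of nuclio's `helloworld` example (the log message and return string below are illustrative placeholders), looks like this:

```python
# A minimal nuclio handler: the platform calls handler(context, event) for every
# trigger event (an HTTP request, a Kafka message, a cron tick, ...).
def handler(context, event):
    # context carries a structured logger and platform helpers;
    # event carries the trigger payload in event.body
    context.logger.info('received an event')
    return 'Hello from nuclio'
```
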
[read more about nuclio ...](https://github.com/nuclio/nuclio)

## Using Nuclio In Data Science Pipelines

Nuclio functions can be used in the following ML pipeline tasks:

- Data collectors, ETL, stream processing
- Data preparation and analysis
- Hyperparameter and model training
- Real-time model serving
- Feature vector assembly (real-time data preparation)

Containerized functions (plus their dependent files and spec) can be created directly from a Jupyter Notebook using `%nuclio` magic commands or SDK API calls (see [nuclio-jupyter](https://github.com/nuclio/nuclio-jupyter)), or they can be built/deployed as part of a KubeFlow Pipeline (see [nuclio pipeline components](https://github.com/kubeflow/pipelines/tree/master/components/nuclio)), e.g. if we want to deploy/update inference functions right after we update an ML model.

## Installing Nuclio over Kubernetes

The Nuclio [Git repo](https://github.com/nuclio/nuclio) contains detailed documentation on installation and usage. You can also follow this [interactive tutorial](https://www.katacoda.com/javajon/courses/kubernetes-serverless/nuclio).

The simplest way to install is using `Helm`. Assuming you have deployed Helm on your cluster, type the following commands:

```
helm repo add nuclio https://nuclio.github.io/nuclio/charts
kubectl create ns nuclio
helm install nuclio/nuclio --name=nuclio --namespace=nuclio --set dashboard.nodePort=31000
kubectl -n nuclio get all
```

Browse to the dashboard URL; you can create, test, and manage functions using a visual editor.

> Note: you can change the NodePort number or skip that option for in-cluster use.

## Writing and Deploying a Simple Function

The simplest way to write a nuclio function is from within Jupyter. The entire Notebook, portions of it, or code files can be turned into functions with a single magic/SDK command; see [the SDK](https://github.com/nuclio/nuclio-jupyter) for detailed documentation. The full notebook with the example below can be [found here](https://github.com/nuclio/nuclio-jupyter/blob/master/docs/nlp-example.ipynb).

Before you begin, install the latest `nuclio-jupyter` package:

```
pip install --upgrade nuclio-jupyter
```

We write and test our code inside a notebook like any other data science code. We add some `%nuclio` magic commands to describe additional configuration such as which packages to install, CPU/Mem/GPU resources, how the code will get triggered (HTTP, cron, stream), environment variables, additional files we want to bundle (e.g. an ML model or libraries), versioning, etc.

First we need to import the `nuclio` package (we add an `ignore` comment so this line won't be compiled later):

```python
# nuclio: ignore
import nuclio
```

We add function spec, environment, and configuration details using magic commands:

```
%nuclio cmd pip install textblob
%nuclio env TO_LANG=fr
%nuclio config spec.build.baseImage = "python:3.6-jessie"
```

Then we write our code as usual, just making sure we have a handler function which is invoked to initiate our run. The function accepts a context and an event, e.g.: `def handler(context, event)`.

**Function code**

The following example shows accepting text and doing NLP processing (correction, translation, sentiments):

```python
from textblob import TextBlob
import os

def handler(context, event):
    context.logger.info('This is an NLP example!')

    # process and correct the text
    blob = TextBlob(str(event.body.decode('utf-8')))
    corrected = blob.correct()

    # debug print the text before and after correction
    context.logger.info_with("Corrected text",
                             corrected=str(corrected),
                             orig=str(blob))

    # calculate sentiments
    context.logger.info_with("Sentiment",
                             polarity=str(corrected.sentiment.polarity),
                             subjectivity=str(corrected.sentiment.subjectivity))

    # read the target language from the environment and return the translated text
    lang = os.getenv('TO_LANG', 'fr')
    return str(corrected.translate(to=lang))
```

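The handler above returns a plain string, which nuclio wraps in a default HTTP response. If we need more control over the response (status code, headers, content type), the nuclio Python runtime also lets the handler return a response object built from the context. The sketch below is a variant of the same function under that assumption; the exact argument names of `context.Response` may differ between nuclio versions:

```python
from textblob import TextBlob
import os

def handler(context, event):
    # same correction/translation flow as above, returning an explicit HTTP response
    corrected = TextBlob(str(event.body.decode('utf-8'))).correct()
    lang = os.getenv('TO_LANG', 'fr')

    # context.Response (as described in nuclio's Python runtime reference) lets us
    # set the status code, headers, and content type instead of relying on defaults
    return context.Response(body=str(corrected.translate(to=lang)),
                            headers={'x-corrected': 'true'},
                            content_type='text/plain',
                            status_code=200)
```

Returning a plain string, as in the main example, is enough for most cases; the response object is only needed when the caller cares about HTTP metadata.
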
Now we can test the function using a built-in function context and examine its output:

```python
# nuclio: ignore
event = nuclio.Event(body=b'good morninng')
handler(context, event)
```

Finally we deploy our function using the magic commands, the SDK, or a KubeFlow Pipeline. We can simply write and run the following command in a cell:

`%nuclio deploy -n nlp -p ai -d `

If we want more control we can use the SDK:

```python
# nuclio: ignore
import requests

# deploy the notebook code with extra configuration (env vars, config, etc.)
spec = nuclio.ConfigSpec(config={'spec.maxReplicas': 2}, env={'EXTRA_VAR': 'something'})
addr = nuclio.deploy_file(name='nlp', project='ai', verbose=True, spec=spec,
                          tag='v1.1', dashboard_url='')

# invoke the generated function
resp = requests.get('http://' + addr)
print(resp.text)
```

We can also deploy our function directly from Git:

```python
addr = nuclio.deploy_file('git://github.com/nuclio/nuclio#master:/hack/examples/python/helloworld',
                          name='hw', project='myproj', dashboard_url='')
resp = requests.get('http://' + addr)
print(resp.text)
```

And we can deploy functions as part of a KubeFlow pipeline step:

```python
import kfp

nuclio_dep = kfp.components.load_component_from_file('deploy/component.yaml')

def my_pipeline():
    new_func = nuclio_dep(url='git://github.com/nuclio/nuclio#master:/hack/examples/python/helloworld',
                          name='myfunc', project='myproj', tag='0.11')
    ...
```

See the [nuclio pipeline components](https://github.com/kubeflow/pipelines/tree/master/components/nuclio), which allow you to deploy, delete, or invoke functions.

> Note: Nuclio is not limited to Python. [See this example](https://github.com/nuclio/nuclio-jupyter/blob/master/docs/nuclio_bash.ipynb) showing how to create a simple `Bash` function from a Notebook; we can also create `Go` functions if we need extra performance/concurrency for our inference.

## Nuclio function examples

Some useful function example Notebooks:

- [TensorFlow Serving function](https://github.com/v3io/tutorials/blob/master/demos/image-classification/infer.ipynb)
- [Predictive Infrastructure Monitoring (Scikit Learn)](https://github.com/v3io/tutorials/blob/master/demos/netops/04-infer.ipynb)
- [Twitter Feed NLP](https://github.com/v3io/tutorials/blob/master/demos/stocks/read-tweets.ipynb)
- [Real-time Stock data reader](https://github.com/v3io/tutorials/blob/master/demos/stocks/read-stocks.ipynb)