Jupyter Notebook

First, copy Cornell-1000-nltk.ipynb and Twitter-5000-nltk.ipynb into the Jupyter Notebook workspace folder.
If you installed Kubeflow with Minikube, that folder is usually located at

/tmp/hostpath-provisioner/kubeflow-user-example-com/workspace-<your Jupyter name>
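The copy step above can be scripted. The following is a minimal sketch, not part of the original example: the helper name `copy_notebooks` is hypothetical, and you must supply the workspace path yourself, since the `<your Jupyter name>` part depends on your setup.

```python
import shutil
from pathlib import Path

# Notebook file names taken from this README.
NOTEBOOKS = ["Cornell-1000-nltk.ipynb", "Twitter-5000-nltk.ipynb"]


def copy_notebooks(source_dir, workspace_dir):
    """Copy the notebooks found in source_dir into workspace_dir.

    Hypothetical helper: returns the names of the files that were copied.
    """
    workspace = Path(workspace_dir)
    if not workspace.is_dir():
        raise FileNotFoundError(f"workspace folder not found: {workspace}")

    copied = []
    for name in NOTEBOOKS:
        source = Path(source_dir) / name
        if source.is_file():
            shutil.copy2(source, workspace / name)
            copied.append(name)
    return copied
```

Call it as `copy_notebooks(".", "<path to your workspace folder>")`. Any notebook that is missing from the source folder is skipped rather than raising an error.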

Pipeline

Cornell-1000.zip and twitter-5000.zip are the archives generated by running Cornell-1000-nltk.ipynb and Twitter-5000-nltk.ipynb, respectively.
Each archive contains the YAML file that defines the pipeline.

[Image: pipeline]
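To check what a generated archive contains, you can list its YAML entries with Python's standard `zipfile` module. This is a small sketch: the function name `list_pipeline_yaml` is hypothetical, and it assumes the archive is in the current working directory.

```python
import zipfile
from pathlib import Path


def list_pipeline_yaml(zip_path):
    """Return the names of the YAML files stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".yaml", ".yml"))
        ]


# Assumption: the notebook wrote the archive to the current directory.
archive_path = Path("Cornell-1000.zip")
if archive_path.exists():
    print(list_pipeline_yaml(archive_path))
else:
    print(f"{archive_path} not found; run the notebook first")
```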

Custom data

Twitter-5000-nltk and Cornell-1000-nltk share most of their code; they differ only in how they download and read the data.
To use a different dataset, split it into positive and negative examples and store them as str values in pos_tweets and neg_tweets.

[Image: data list]
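As an illustration of preparing custom data, the sketch below splits labeled records into the two lists the notebooks expect. The input format (a list of `(text, label)` pairs), the label names, and the helper `split_by_label` are assumptions made for this example. Only the names `pos_tweets` and `neg_tweets` come from the notebooks.

```python
# Hypothetical labeled records; replace these with your own data source.
labeled_rows = [
    ("nice to meet you", "pos"),
    ("i hate you", "neg"),
    ("what a great day", "pos"),
    (12345, "neg"),  # non-string entries are converted with str()
]


def split_by_label(rows, pos_label="pos", neg_label="neg"):
    """Split (text, label) pairs into two lists of str, one per class."""
    pos_tweets, neg_tweets = [], []
    for text, label in rows:
        if label == pos_label:
            pos_tweets.append(str(text))
        elif label == neg_label:
            neg_tweets.append(str(text))
        # Rows with any other label are skipped.
    return pos_tweets, neg_tweets


pos_tweets, neg_tweets = split_by_label(labeled_rows)
print(len(pos_tweets), len(neg_tweets))
```

Converting every entry with `str()` keeps both lists in the str format the rest of the notebook code works with, even if the raw data contains numbers or other types.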

Port Forward

Step 1: Find the name of the pod that exposes the HTTP port

[Image: nltk pod]

Step 2: Port-forward

kubectl port-forward -n kubeflow-user-example-com <pod name> 3000:5000

[Image: nltk pod port forward]
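Before opening the browser, you may want to confirm that the forwarded port is accepting connections. This is a minimal sketch using only the standard library. The helper `port_is_open` is hypothetical, and it assumes the local port 3000 from the port-forward command above.

```python
import socket


def port_is_open(host="127.0.0.1", port=3000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if port_is_open():
    print("Port 3000 is reachable; open http://localhost:3000/ in the browser.")
else:
    print("Nothing is listening on port 3000; check that kubectl port-forward is still running.")
```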

Step 3: Open the address in a browser

http://localhost:3000/

or

http://127.0.0.1:3000/

[Image: NLP]

Step 4: Predict

nice to meet you
i hate you

Accuracy

You can check the accuracy of each NLP run individually,
[Image: twitter cornell]
or compare the runs side by side with a comparison run.

[Image: compare]

Disabling caching in your Kubeflow Pipelines deployment

If you delete the PVC and run the pipeline again and it no longer works properly, the cache may be the cause.
Run the following commands to disable caching.

export NAMESPACE=kubeflow
kubectl patch mutatingwebhookconfiguration cache-webhook-${NAMESPACE} --type='json' -p='[{"op":"replace", "path": "/webhooks/0/rules/0/operations/0", "value": "DELETE"}]'
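The `-p` argument above is a JSON Patch: it replaces the first operation of the first rule of the first webhook with `DELETE`. If shell quoting of the JSON causes trouble, the same command can be assembled in Python. This sketch only builds and prints the command and does not run it. The helper name `build_disable_cache_command` is hypothetical.

```python
import json
import shlex


def build_disable_cache_command(namespace="kubeflow"):
    """Build the kubectl argument list for the cache-disabling patch."""
    patch = [
        {
            "op": "replace",
            "path": "/webhooks/0/rules/0/operations/0",
            "value": "DELETE",
        }
    ]
    return [
        "kubectl",
        "patch",
        "mutatingwebhookconfiguration",
        f"cache-webhook-{namespace}",
        "--type=json",
        "-p=" + json.dumps(patch),
    ]


command = build_disable_cache_command()
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

To run the command from Python, pass the list to `subprocess.run(command, check=True)`. Passing a list avoids a second layer of shell quoting.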

Relevant part