{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# VectorStore: Chroma and OpenAI Embeddings Basic Examples\n", "\n", "This notebook demonstrates how to use the `ChromaVectorStore` in `dapr-agents` for storing, querying, and filtering documents. We will explore:\n", "\n", "* Initializing the `OpenAIEmbedder` embedding function and `ChromaVectorStore`.\n", "* Adding documents with text and metadata.\n", "* Retrieving documents by ID.\n", "* Updating documents.\n", "* Deleting documents.\n", "* Performing similarity searches.\n", "* Filtering results based on metadata." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Required Libraries\n", "Before starting, ensure the required libraries are installed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install dapr-agents python-dotenv chromadb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Environment Variables\n", "\n", "Load API keys or other configuration values from your `.env` file using `dotenv`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dotenv import load_dotenv\n", "load_dotenv()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize OpenAI Embedding Function\n", "\n", "The default embedding function is `SentenceTransformerEmbedder`, but for this example we will use the `OpenAIEmbedder`." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from dapr_agents.document.embedder import OpenAIEmbedder\n", "\n", "embedding_function = OpenAIEmbedder(\n", "    model=\"text-embedding-ada-002\",\n", "    encoding_name=\"cl100k_base\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initializing the ChromaVectorStore\n", "\n", "To start, create an instance of the `ChromaVectorStore`. You can customize its parameters if needed, such as enabling persistence or specifying the `embedding_function`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from dapr_agents.storage import ChromaVectorStore\n", "\n", "# Initialize ChromaVectorStore\n", "store = ChromaVectorStore(\n", "    name=\"example_collection\",  # Name of the collection\n", "    embedding_function=embedding_function,\n", "    persistent=False,  # No persistence for this example\n", "    host=\"localhost\",  # Host for the Chroma server\n", "    port=8000  # Port for the Chroma server\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Documents\n", "We will use `Document` objects to add content to the collection. Each `Document` includes text and optional metadata." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating Documents" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from dapr_agents.types.document import Document\n", "\n", "# Example Lord of the Rings-inspired conversations\n", "documents = [\n", " Document(\n", " text=\"Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to.\",\n", " metadata={\"topic\": \"wisdom\", \"location\": \"The Shire\"}\n", " ),\n", " Document(\n", " text=\"Frodo: I wish the Ring had never come to me. 
I wish none of this had happened.\",\n", " metadata={\"topic\": \"destiny\", \"location\": \"Moria\"}\n", " ),\n", " Document(\n", " text=\"Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master.\",\n", " metadata={\"topic\": \"power\", \"location\": \"Rivendell\"}\n", " ),\n", " Document(\n", " text=\"Sam: I can't carry it for you, but I can carry you!\",\n", " metadata={\"topic\": \"friendship\", \"location\": \"Mount Doom\"}\n", " ),\n", " Document(\n", " text=\"Legolas: A red sun rises. Blood has been spilled this night.\",\n", " metadata={\"topic\": \"war\", \"location\": \"Rohan\"}\n", " ),\n", " Document(\n", " text=\"Gimli: Certainty of death. Small chance of success. What are we waiting for?\",\n", " metadata={\"topic\": \"bravery\", \"location\": \"Helm's Deep\"}\n", " ),\n", " Document(\n", " text=\"Boromir: One does not simply walk into Mordor.\",\n", " metadata={\"topic\": \"impossible tasks\", \"location\": \"Rivendell\"}\n", " ),\n", " Document(\n", " text=\"Galadriel: Even the smallest person can change the course of the future.\",\n", " metadata={\"topic\": \"hope\", \"location\": \"Lothlórien\"}\n", " ),\n", " Document(\n", " text=\"Théoden: So it begins.\",\n", " metadata={\"topic\": \"battle\", \"location\": \"Helm's Deep\"}\n", " ),\n", " Document(\n", " text=\"Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. 
The quest will claim his life.\",\n", " metadata={\"topic\": \"sacrifice\", \"location\": \"Rivendell\"}\n", " )\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding Documents to the Collection" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of documents in the collection: 10\n" ] } ], "source": [ "store.add_documents(documents=documents)\n", "print(f\"Number of documents in the collection: {store.count()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieving Documents\n", "\n", "Retrieve documents by their IDs or fetch all items in the collection." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Retrieved documents:\n", "ID: 82f3b922-c64c-4ad1-a632-ea9f8d13a19a, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'location': 'The Shire', 'topic': 'wisdom'}\n", "ID: f5a45d8b-7f8f-4516-a54a-d9ef3c39db53, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'location': 'Moria', 'topic': 'destiny'}\n", "ID: 7fead849-c4eb-42ce-88ca-ca62fe9f51a4, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'location': 'Rivendell', 'topic': 'power'}\n", "ID: ebd6c642-c8f4-4f45-a75e-4a5acdf33ad5, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'location': 'Mount Doom', 'topic': 'friendship'}\n", "ID: 1dc4da81-cbfc-417b-ad71-120fae505842, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'location': 'Rohan', 'topic': 'war'}\n", "ID: d1ed1836-c0d8-491c-a813-2c5a2688b2d1, Text: Gimli: Certainty of death. Small chance of success. 
What are we waiting for?, Metadata: {'location': \"Helm's Deep\", 'topic': 'bravery'}\n", "ID: 6fe3f229-bf74-4eea-8fe4-fc38efb2cf9a, Text: Boromir: One does not simply walk into Mordor., Metadata: {'location': 'Rivendell', 'topic': 'impossible tasks'}\n", "ID: 081453e4-0a56-4e78-927b-79289735e8a4, Text: Galadriel: Even the smallest person can change the course of the future., Metadata: {'location': 'Lothlórien', 'topic': 'hope'}\n", "ID: a45db7d1-4224-4e42-b51d-bdb4593b5cf5, Text: Théoden: So it begins., Metadata: {'location': \"Helm's Deep\", 'topic': 'battle'}\n", "ID: 5258d6f6-1f1b-459d-a04e-c96f58d76fca, Text: Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life., Metadata: {'location': 'Rivendell', 'topic': 'sacrifice'}\n" ] } ], "source": [ "# Retrieve all documents\n", "retrieved_docs = store.get()\n", "print(\"Retrieved documents:\")\n", "for doc in retrieved_docs:\n", " print(f\"ID: {doc['id']}, Text: {doc['document']}, Metadata: {doc['metadata']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Updating Documents\n", "\n", "You can update existing documents' text or metadata using their IDs." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Updated document: [{'id': '82f3b922-c64c-4ad1-a632-ea9f8d13a19a', 'metadata': {'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.'}]\n" ] } ], "source": [ "# Retrieve a document by its ID\n", "retrieved_docs = store.get() # Get all documents to find the ID\n", "doc_id = retrieved_docs[0]['id'] # Select the first document's ID for this example\n", "\n", "# Define updated text and metadata\n", "updated_text = \"Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.\"\n", "updated_metadata = {\"topic\": \"hope and wisdom\", \"location\": \"Fangorn Forest\"}\n", "\n", "# Update the document's text and metadata in the store\n", "store.update(ids=[doc_id], documents=[updated_text], metadatas=[updated_metadata])\n", "\n", "# Verify the update\n", "updated_doc = store.get(ids=[doc_id])\n", "print(f\"Updated document: {updated_doc}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deleting Documents\n", "\n", "Delete documents by their IDs." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of documents after deletion: 9\n" ] } ], "source": [ "# Delete a document by ID\n", "doc_id_to_delete = retrieved_docs[2]['id']\n", "store.delete(ids=[doc_id_to_delete])\n", "\n", "# Verify deletion\n", "print(f\"Number of documents after deletion: {store.count()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Similarity Search\n", "\n", "Perform a similarity search using text queries. The embedding function automatically generates embeddings for the input query." 
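, "\n", "\n", "As a rough, library-free illustration of the idea, a similarity search ranks stored vectors by how close they are to the query vector and keeps the top `k`. The three-dimensional vectors and the `toy_index` name below are made up for this sketch, and cosine similarity is an assumption; the metric the store actually uses depends on how the Chroma collection is configured.\n", "\n", "```python\n", "import math\n", "\n", "\n", "def cosine_similarity(a, b):\n", "    # Cosine of the angle between two equal-length vectors\n", "    dot = sum(x * y for x, y in zip(a, b))\n", "    norm_a = math.sqrt(sum(x * x for x in a))\n", "    norm_b = math.sqrt(sum(y * y for y in b))\n", "    return dot / (norm_a * norm_b)\n", "\n", "\n", "# Hypothetical 3-dimensional embeddings; real models return far larger vectors\n", "toy_index = {\n", "    \"wisdom\": [0.9, 0.1, 0.0],\n", "    \"war\": [0.0, 0.2, 0.9],\n", "    \"hope\": [0.7, 0.6, 0.1],\n", "}\n", "query_vector = [1.0, 0.2, 0.0]\n", "\n", "# Rank the stored vectors by similarity to the query and keep the top k\n", "k = 2\n", "ranked = sorted(\n", "    toy_index,\n", "    key=lambda name: cosine_similarity(query_vector, toy_index[name]),\n", "    reverse=True,\n", ")\n", "print(ranked[:k])\n", "```\n", "\n", "With these toy values the query vector points mostly along the first axis, so the `wisdom` entry ranks ahead of `hope`, and `war` is dropped."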
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Similarity search results:\n", "Text: ['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.', 'Galadriel: Even the smallest person can change the course of the future.']\n", "Metadata: [{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, {'location': 'Lothlórien', 'topic': 'hope'}]\n" ] } ], "source": [ "# Search for similar documents based on a query\n", "query = \"wise advice\"\n", "results = store.search_similar(query_texts=query, k=2)\n", "\n", "# Display results (each entry holds the matches for one query, so the loop prints one list per query)\n", "print(\"Similarity search results:\")\n", "for doc, metadata in zip(results[\"documents\"], results[\"metadatas\"]):\n", "    print(f\"Text: {doc}\")\n", "    print(f\"Metadata: {metadata}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering Results\n", "\n", "Combine a similarity query with metadata filters. The `where` clause below uses the `$and` and `$eq` operators to keep only documents whose `location` and `topic` both match." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Search for documents with specific metadata filters\n", "filter_conditions = {\n", "    \"$and\": [\n", "        {\"location\": {\"$eq\": \"Fangorn Forest\"}},\n", "        {\"topic\": {\"$eq\": \"hope and wisdom\"}}\n", "    ]\n", "}\n", "\n", "filtered_results = store.query_with_filters(query_texts=[\"journey\"], where=filter_conditions, k=3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ids': [['82f3b922-c64c-4ad1-a632-ea9f8d13a19a']],\n", " 'embeddings': None,\n", " 'documents': [['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.']],\n", " 'uris': None,\n", " 'data': None,\n", " 'metadatas': [[{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}]],\n", " 'distances': [[0.21403032541275024]],\n", " 'included': [<IncludeEnum.distances: 'distances'>,\n", "  <IncludeEnum.documents: 'documents'>,\n", "  <IncludeEnum.metadatas: 'metadatas'>]}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "filtered_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resetting the Database\n", "\n", "Reset the database to clear all stored data. Listing the collections before and after the reset shows that `example_collection` is no longer present." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['example_collection']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "store.client.list_collections()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Reset the collection\n", "store.reset()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "store.client.list_collections()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.1" } }, "nbformat": 4, "nbformat_minor": 2 }