dapr-agents/cookbook/vectorstores/chroma_openai_embeddings.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# VectorStore: Chroma and OpenAI Embeddings Basic Examples\n",
"\n",
"This notebook demonstrates how to use the `ChromaVectorStore` in `dapr-agents` for storing, querying, and filtering documents. We will explore:\n",
"\n",
"* Initializing the `OpenAIEmbedder` embedding function and `ChromaVectorStore`.\n",
"* Adding documents with text and metadata.\n",
"* Retrieving documents by ID.\n",
"* Updating documents.\n",
"* Deleting documents.\n",
"* Performing similarity searches.\n",
"* Filtering results based on metadata."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install Required Libraries\n",
"Before starting, ensure the required libraries are installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install dapr-agents python-dotenv chromadb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Environment Variables\n",
"\n",
"Load API keys or other configuration values from your `.env` file using `dotenv`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"load_dotenv()"
]
},
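{
"cell_type": "markdown",
"metadata": {},
"source": [
"`OpenAIEmbedder` calls the OpenAI API, so a key must be available to the process before the next step. The check below is a minimal sketch. It assumes the key is stored under `OPENAI_API_KEY`, the variable name the OpenAI Python SDK reads by default; this notebook does not state which variable it expects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Assumption: the key is stored under OPENAI_API_KEY (the OpenAI SDK's default variable name)\n",
"api_key_present = bool(os.getenv(\"OPENAI_API_KEY\"))\n",
"print(f\"OPENAI_API_KEY set: {api_key_present}\")"
]
},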
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize OpenAI Embedding Function\n",
"\n",
"The default embedding function is `SentenceTransformerEmbedder`, but for this example we will use the `OpenAIEmbedder`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from dapr_agents.document.embedder import OpenAIEmbedder\n",
"\n",
"embedding_function = OpenAIEmbedder(\n",
"    model=\"text-embedding-ada-002\",\n",
"    encoding_name=\"cl100k_base\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initializing the ChromaVectorStore\n",
"\n",
"To start, create an instance of the `ChromaVectorStore`. You can customize its parameters if needed, such as enabling persistence or specifying the `embedding_function`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from dapr_agents.storage import ChromaVectorStore\n",
"\n",
"# Initialize ChromaVectorStore\n",
"store = ChromaVectorStore(\n",
"    name=\"example_collection\",  # Name of the collection\n",
"    embedding_function=embedding_function,\n",
"    persistent=False,  # No persistence for this example\n",
"    host=\"localhost\",  # Host for the Chroma server\n",
"    port=8000  # Port for the Chroma server\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding Documents\n",
"We will use `Document` objects to add content to the collection. Each `Document` includes text and optional metadata."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating Documents"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from dapr_agents.types.document import Document\n",
"\n",
"# Example Lord of the Rings-inspired conversations\n",
"documents = [\n",
" Document(\n",
" text=\"Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to.\",\n",
" metadata={\"topic\": \"wisdom\", \"location\": \"The Shire\"}\n",
" ),\n",
" Document(\n",
" text=\"Frodo: I wish the Ring had never come to me. I wish none of this had happened.\",\n",
" metadata={\"topic\": \"destiny\", \"location\": \"Moria\"}\n",
" ),\n",
" Document(\n",
" text=\"Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master.\",\n",
" metadata={\"topic\": \"power\", \"location\": \"Rivendell\"}\n",
" ),\n",
" Document(\n",
" text=\"Sam: I can't carry it for you, but I can carry you!\",\n",
" metadata={\"topic\": \"friendship\", \"location\": \"Mount Doom\"}\n",
" ),\n",
" Document(\n",
" text=\"Legolas: A red sun rises. Blood has been spilled this night.\",\n",
" metadata={\"topic\": \"war\", \"location\": \"Rohan\"}\n",
" ),\n",
" Document(\n",
" text=\"Gimli: Certainty of death. Small chance of success. What are we waiting for?\",\n",
" metadata={\"topic\": \"bravery\", \"location\": \"Helm's Deep\"}\n",
" ),\n",
" Document(\n",
" text=\"Boromir: One does not simply walk into Mordor.\",\n",
" metadata={\"topic\": \"impossible tasks\", \"location\": \"Rivendell\"}\n",
" ),\n",
" Document(\n",
" text=\"Galadriel: Even the smallest person can change the course of the future.\",\n",
" metadata={\"topic\": \"hope\", \"location\": \"Lothlórien\"}\n",
" ),\n",
" Document(\n",
" text=\"Théoden: So it begins.\",\n",
" metadata={\"topic\": \"battle\", \"location\": \"Helm's Deep\"}\n",
" ),\n",
" Document(\n",
" text=\"Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life.\",\n",
" metadata={\"topic\": \"sacrifice\", \"location\": \"Rivendell\"}\n",
" )\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding Documents to the Collection"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of documents in the collection: 10\n"
]
}
],
"source": [
"store.add_documents(documents=documents)\n",
"print(f\"Number of documents in the collection: {store.count()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving Documents\n",
"\n",
"Retrieve documents by their IDs or fetch all items in the collection."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Retrieved documents:\n",
"ID: 82f3b922-c64c-4ad1-a632-ea9f8d13a19a, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'location': 'The Shire', 'topic': 'wisdom'}\n",
"ID: f5a45d8b-7f8f-4516-a54a-d9ef3c39db53, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'location': 'Moria', 'topic': 'destiny'}\n",
"ID: 7fead849-c4eb-42ce-88ca-ca62fe9f51a4, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'location': 'Rivendell', 'topic': 'power'}\n",
"ID: ebd6c642-c8f4-4f45-a75e-4a5acdf33ad5, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'location': 'Mount Doom', 'topic': 'friendship'}\n",
"ID: 1dc4da81-cbfc-417b-ad71-120fae505842, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'location': 'Rohan', 'topic': 'war'}\n",
"ID: d1ed1836-c0d8-491c-a813-2c5a2688b2d1, Text: Gimli: Certainty of death. Small chance of success. What are we waiting for?, Metadata: {'location': \"Helm's Deep\", 'topic': 'bravery'}\n",
"ID: 6fe3f229-bf74-4eea-8fe4-fc38efb2cf9a, Text: Boromir: One does not simply walk into Mordor., Metadata: {'location': 'Rivendell', 'topic': 'impossible tasks'}\n",
"ID: 081453e4-0a56-4e78-927b-79289735e8a4, Text: Galadriel: Even the smallest person can change the course of the future., Metadata: {'location': 'Lothlórien', 'topic': 'hope'}\n",
"ID: a45db7d1-4224-4e42-b51d-bdb4593b5cf5, Text: Théoden: So it begins., Metadata: {'location': \"Helm's Deep\", 'topic': 'battle'}\n",
"ID: 5258d6f6-1f1b-459d-a04e-c96f58d76fca, Text: Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life., Metadata: {'location': 'Rivendell', 'topic': 'sacrifice'}\n"
]
}
],
"source": [
"# Retrieve all documents\n",
"retrieved_docs = store.get()\n",
"print(\"Retrieved documents:\")\n",
"for doc in retrieved_docs:\n",
" print(f\"ID: {doc['id']}, Text: {doc['document']}, Metadata: {doc['metadata']}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Updating Documents\n",
"\n",
"You can update existing documents' text or metadata using their IDs."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Updated document: [{'id': '82f3b922-c64c-4ad1-a632-ea9f8d13a19a', 'metadata': {'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.'}]\n"
]
}
],
"source": [
"# Fetch all documents so we can look up an ID to update\n",
"retrieved_docs = store.get()\n",
"doc_id = retrieved_docs[0]['id'] # Select the first document's ID for this example\n",
"\n",
"# Define updated text and metadata\n",
"updated_text = \"Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.\"\n",
"updated_metadata = {\"topic\": \"hope and wisdom\", \"location\": \"Fangorn Forest\"}\n",
"\n",
"# Update the document's text and metadata in the store\n",
"store.update(ids=[doc_id], documents=[updated_text], metadatas=[updated_metadata])\n",
"\n",
"# Verify the update\n",
"updated_doc = store.get(ids=[doc_id])\n",
"print(f\"Updated document: {updated_doc}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deleting Documents\n",
"\n",
"Delete documents by their IDs."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of documents after deletion: 9\n"
]
}
],
"source": [
"# Delete a document by ID\n",
"doc_id_to_delete = retrieved_docs[2]['id']\n",
"store.delete(ids=[doc_id_to_delete])\n",
"\n",
"# Verify deletion\n",
"print(f\"Number of documents after deletion: {store.count()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity Search\n",
"\n",
"Perform a similarity search using text queries. The embedding function automatically generates embeddings for the input query."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Similarity search results:\n",
"Text: ['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.', 'Galadriel: Even the smallest person can change the course of the future.']\n",
"Metadata: [{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, {'location': 'Lothlórien', 'topic': 'hope'}]\n"
]
}
],
"source": [
"# Search for similar documents based on a query\n",
"query = \"wise advice\"\n",
"results = store.search_similar(query_texts=query, k=2)\n",
"\n",
"# Display results (each field holds one list of hits per query text)\n",
"print(\"Similarity search results:\")\n",
"for doc, metadata in zip(results[\"documents\"], results[\"metadatas\"]):\n",
" print(f\"Text: {doc}\")\n",
" print(f\"Metadata: {metadata}\")"
]
},
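{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The sketch below illustrates the idea with cosine similarity on made-up 3-dimensional vectors. The vectors, labels, and the `cosine_similarity` helper are hypothetical, and the distance metric the collection actually uses depends on its Chroma configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def cosine_similarity(a, b):\n",
"    \"\"\"Cosine of the angle between two equal-length vectors.\"\"\"\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    norm_a = math.sqrt(sum(x * x for x in a))\n",
"    norm_b = math.sqrt(sum(y * y for y in b))\n",
"    return dot / (norm_a * norm_b)\n",
"\n",
"# Toy vectors standing in for real embeddings (hypothetical values)\n",
"toy_vectors = {\n",
"    \"wisdom quote\": [0.9, 0.1, 0.0],\n",
"    \"battle quote\": [0.1, 0.9, 0.2],\n",
"    \"hope quote\": [0.7, 0.2, 0.3],\n",
"}\n",
"toy_query = [1.0, 0.0, 0.1]\n",
"\n",
"# Rank labels from most to least similar and keep the top 2 (analogous to k=2)\n",
"ranked = sorted(\n",
"    toy_vectors,\n",
"    key=lambda name: cosine_similarity(toy_query, toy_vectors[name]),\n",
"    reverse=True,\n",
")\n",
"print(ranked[:2])"
]
},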
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering Results\n",
"\n",
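"Combine a similarity query with a metadata filter. The `where` argument takes a filter dictionary; the example below uses the `$and` and `$eq` operators so that both metadata fields must match."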
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Search for documents with specific metadata filters\n",
"filter_conditions = {\n",
" \"$and\": [\n",
" {\"location\": {\"$eq\": \"Fangorn Forest\"}},\n",
" {\"topic\": {\"$eq\": \"hope and wisdom\"}}\n",
" ]\n",
"}\n",
"\n",
"filtered_results = store.query_with_filters(query_texts=[\"journey\"], where=filter_conditions, k=3)"
]
},
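{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `where` dictionary is evaluated against each document's metadata. The sketch below mimics that logic in plain Python for the two operators used here, `$and` and `$eq`. `matches_filter` and the toy data are hypothetical and only illustrate the semantics; Chroma applies the real filter inside the database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def matches_filter(metadata, condition):\n",
"    \"\"\"Evaluate a filter that uses only $and and $eq against one metadata dict.\"\"\"\n",
"    if \"$and\" in condition:\n",
"        return all(matches_filter(metadata, sub) for sub in condition[\"$and\"])\n",
"    # Otherwise expect a single {field: {\"$eq\": value}} clause\n",
"    field, clause = next(iter(condition.items()))\n",
"    return metadata.get(field) == clause[\"$eq\"]\n",
"\n",
"# Toy filter and metadata (hypothetical, modeled on the documents above)\n",
"toy_filter = {\n",
"    \"$and\": [\n",
"        {\"location\": {\"$eq\": \"Rivendell\"}},\n",
"        {\"topic\": {\"$eq\": \"power\"}},\n",
"    ]\n",
"}\n",
"toy_metadatas = [\n",
"    {\"topic\": \"power\", \"location\": \"Rivendell\"},\n",
"    {\"topic\": \"sacrifice\", \"location\": \"Rivendell\"},\n",
"]\n",
"\n",
"# Only the first entry satisfies both clauses\n",
"print([matches_filter(m, toy_filter) for m in toy_metadatas])"
]
},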
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ids': [['82f3b922-c64c-4ad1-a632-ea9f8d13a19a']],\n",
" 'embeddings': None,\n",
" 'documents': [['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.']],\n",
" 'uris': None,\n",
" 'data': None,\n",
" 'metadatas': [[{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}]],\n",
" 'distances': [[0.21403032541275024]],\n",
" 'included': [<IncludeEnum.distances: 'distances'>,\n",
" <IncludeEnum.documents: 'documents'>,\n",
" <IncludeEnum.metadatas: 'metadatas'>]}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filtered_results"
]
},
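{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result above nests every field one level deep, with one inner list per query text. A small helper can flatten the hits for one query into per-document records. This is a sketch: `flatten_hits` is a hypothetical helper, not part of `dapr-agents`, and it runs on a literal dictionary shaped like the output above rather than on the live store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def flatten_hits(result, query_index=0):\n",
"    \"\"\"Turn one query's nested Chroma-style result into a list of flat dicts.\"\"\"\n",
"    return [\n",
"        {\"id\": doc_id, \"text\": text, \"metadata\": meta, \"distance\": dist}\n",
"        for doc_id, text, meta, dist in zip(\n",
"            result[\"ids\"][query_index],\n",
"            result[\"documents\"][query_index],\n",
"            result[\"metadatas\"][query_index],\n",
"            result[\"distances\"][query_index],\n",
"        )\n",
"    ]\n",
"\n",
"# Literal copy of the relevant fields from the output above\n",
"sample_result = {\n",
"    \"ids\": [[\"82f3b922-c64c-4ad1-a632-ea9f8d13a19a\"]],\n",
"    \"documents\": [[\"Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.\"]],\n",
"    \"metadatas\": [[{\"location\": \"Fangorn Forest\", \"topic\": \"hope and wisdom\"}]],\n",
"    \"distances\": [[0.21403032541275024]],\n",
"}\n",
"\n",
"for hit in flatten_hits(sample_result):\n",
"    print(f\"{hit['distance']:.3f}  {hit['text']}  {hit['metadata']}\")"
]
},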
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resetting the Database\n",
"\n",
"Reset the database to clear all stored data. Listing the collections before and after the reset confirms that `example_collection` is gone."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['example_collection']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.client.list_collections()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Reset the database (removes the stored collection)\n",
"store.reset()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.client.list_collections()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}