# VectorStore: Chroma and OpenAI Embeddings Basic Examples

This notebook demonstrates how to use the `ChromaVectorStore` in `dapr-agents` for storing, querying, and filtering documents. We will explore:

* Initializing the `OpenAIEmbedder` embedding function and `ChromaVectorStore`.
* Adding documents with text and metadata.
* Retrieving documents by ID.
* Updating documents.
* Deleting documents.
* Performing similarity searches.
* Filtering results based on metadata.

## Install Required Libraries
Before starting, ensure the required libraries are installed:

In [None]:
!pip install dapr-agents python-dotenv chromadb

## Load Environment Variables

Load API keys or other configuration values from your `.env` file using `dotenv`.

In [1]:
from dotenv import load_dotenv
load_dotenv()

True
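
A `True` return value only indicates that values were loaded from a `.env` file; it does not confirm that a particular key is present. The following is a minimal sketch of an explicit check. It assumes the embedder reads its key from the `OPENAI_API_KEY` environment variable, which is the usual convention but is not confirmed by this notebook, and `missing_env_vars` is a hypothetical helper rather than part of `dapr-agents`.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# OPENAI_API_KEY is an assumed variable name; adjust it to match your setup.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Running this before creating the embedder surfaces a missing key early, instead of as an authentication error on the first embedding request.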

## Initialize OpenAI Embedding Function

The default embedding function is `SentenceTransformerEmbedder`, but for this example we will use the `OpenAIEmbedder`.

In [2]:
from dapr_agents.document.embedder import OpenAIEmbedder

embedding_function = OpenAIEmbedder(
    model="text-embedding-ada-002",
    encoding_name="cl100k_base"
)

## Initializing the ChromaVectorStore

To start, create an instance of the `ChromaVectorStore`. You can customize its parameters if needed, such as enabling persistence or specifying the `embedding_function`.

In [3]:
from dapr_agents.storage import ChromaVectorStore

# Initialize ChromaVectorStore
store = ChromaVectorStore(
    name="example_collection",  # Name of the collection
    embedding_function=embedding_function,
    persistent=False,           # No persistence for this example
    host="localhost",           # Host for the Chroma server
    port=8000                   # Port for the Chroma server
)

## Adding Documents
We will use `Document` objects to add content to the collection. Each `Document` includes text and optional metadata.

### Creating Documents

In [4]:
from dapr_agents.types.document import Document

# Example Lord of the Rings-inspired conversations
documents = [
    Document(
        text="Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to.",
        metadata={"topic": "wisdom", "location": "The Shire"}
    ),
    Document(
        text="Frodo: I wish the Ring had never come to me. I wish none of this had happened.",
        metadata={"topic": "destiny", "location": "Moria"}
    ),
    Document(
        text="Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master.",
        metadata={"topic": "power", "location": "Rivendell"}
    ),
    Document(
        text="Sam: I can't carry it for you, but I can carry you!",
        metadata={"topic": "friendship", "location": "Mount Doom"}
    ),
    Document(
        text="Legolas: A red sun rises. Blood has been spilled this night.",
        metadata={"topic": "war", "location": "Rohan"}
    ),
    Document(
        text="Gimli: Certainty of death. Small chance of success. What are we waiting for?",
        metadata={"topic": "bravery", "location": "Helm's Deep"}
    ),
    Document(
        text="Boromir: One does not simply walk into Mordor.",
        metadata={"topic": "impossible tasks", "location": "Rivendell"}
    ),
    Document(
        text="Galadriel: Even the smallest person can change the course of the future.",
        metadata={"topic": "hope", "location": "Lothlórien"}
    ),
    Document(
        text="Théoden: So it begins.",
        metadata={"topic": "battle", "location": "Helm's Deep"}
    ),
    Document(
        text="Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life.",
        metadata={"topic": "sacrifice", "location": "Rivendell"}
    )
]

### Adding Documents to the Collection

In [5]:
store.add_documents(documents=documents)
print(f"Number of documents in the collection: {store.count()}")

Number of documents in the collection: 10


## Retrieving Documents

Retrieve documents by their IDs or fetch all items in the collection.

In [6]:
# Retrieve all documents
retrieved_docs = store.get()
print("Retrieved documents:")
for doc in retrieved_docs:
    print(f"ID: {doc['id']}, Text: {doc['document']}, Metadata: {doc['metadata']}")

Retrieved documents:
ID: 82f3b922-c64c-4ad1-a632-ea9f8d13a19a, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'location': 'The Shire', 'topic': 'wisdom'}
ID: f5a45d8b-7f8f-4516-a54a-d9ef3c39db53, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'location': 'Moria', 'topic': 'destiny'}
ID: 7fead849-c4eb-42ce-88ca-ca62fe9f51a4, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'location': 'Rivendell', 'topic': 'power'}
ID: ebd6c642-c8f4-4f45-a75e-4a5acdf33ad5, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'location': 'Mount Doom', 'topic': 'friendship'}
ID: 1dc4da81-cbfc-417b-ad71-120fae505842, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'location': 'Rohan', 'topic': 'war'}
ID: d1ed1836-c0d8-491c-a813-2c5a2688b2d1, Text: Gimli: Certai
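
Each retrieved record is a dictionary with `id`, `document`, and `metadata` keys, so ordinary Python is enough to pick out specific entries once they are in memory. The sketch below selects records by metadata values. `find_by_metadata` is a hypothetical helper, and the sample records use placeholder IDs instead of the generated UUIDs.

```python
def find_by_metadata(docs, **criteria):
    """Return the records whose metadata matches every key/value in criteria."""
    return [
        doc
        for doc in docs
        if all((doc.get("metadata") or {}).get(k) == v for k, v in criteria.items())
    ]


# Stand-in records shaped like the output above (IDs are placeholders).
sample_docs = [
    {
        "id": "doc-1",
        "document": "Boromir: One does not simply walk into Mordor.",
        "metadata": {"topic": "impossible tasks", "location": "Rivendell"},
    },
    {
        "id": "doc-2",
        "document": "Théoden: So it begins.",
        "metadata": {"topic": "battle", "location": "Helm's Deep"},
    },
    {
        "id": "doc-3",
        "document": "Sam: I can't carry it for you, but I can carry you!",
        "metadata": {"topic": "friendship", "location": "Mount Doom"},
    },
]

matches = find_by_metadata(sample_docs, location="Rivendell")
print([doc["id"] for doc in matches])
```

Looking up a record by its metadata is less fragile than relying on its position in the returned list when you need the ID for a later update or delete.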

## Updating Documents

You can update existing documents' text or metadata using their IDs.

In [7]:
# Fetch all documents and pick one to update
retrieved_docs = store.get()
doc_id = retrieved_docs[0]['id']  # Use the first document's ID for this example

# Define updated text and metadata
updated_text = "Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true."
updated_metadata = {"topic": "hope and wisdom", "location": "Fangorn Forest"}

# Update the document's text and metadata in the store
store.update(ids=[doc_id], documents=[updated_text], metadatas=[updated_metadata])

# Verify the update
updated_doc = store.get(ids=[doc_id])
print(f"Updated document: {updated_doc}")

Updated document: [{'id': '82f3b922-c64c-4ad1-a632-ea9f8d13a19a', 'metadata': {'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.'}]


## Deleting Documents

Delete documents by their IDs.

In [8]:
# Delete a document by ID
doc_id_to_delete = retrieved_docs[2]['id']
store.delete(ids=[doc_id_to_delete])

# Verify deletion
print(f"Number of documents after deletion: {store.count()}")

Number of documents after deletion: 9


## Similarity Search

Perform a similarity search using text queries. The embedding function automatically generates embeddings for the input query.

In [9]:
# Search for similar documents based on a query
query = "wise advice"
results = store.search_similar(query_texts=query, k=2)

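# Display results. Each value in `results` holds one inner list per query,
# so with a single query this loop runs once and prints the full list of matches.
print("Similarity search results:")
for doc, metadata in zip(results["documents"], results["metadatas"]):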
    print(f"Text: {doc}")
    print(f"Metadata: {metadata}")

Similarity search results:
Text: ['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.', 'Galadriel: Even the smallest person can change the course of the future.']
Metadata: [{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, {'location': 'Lothlórien', 'topic': 'hope'}]
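
To make the ranking step concrete, here is a minimal sketch of a top-k search over toy vectors using cosine similarity. It is an illustration only: the three-dimensional vectors are made up, real embeddings have far more dimensions, and the distance metric a Chroma collection actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, vectors, k=2):
    """Return the keys of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [key for _, key in scored[:k]]


# Made-up vectors standing in for document embeddings.
toy_vectors = {
    "gandalf": [0.9, 0.1, 0.0],
    "galadriel": [0.7, 0.3, 0.1],
    "legolas": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], toy_vectors, k=2))
```

A vector store performs the same kind of ranking, typically with an approximate index so that it does not have to score every stored vector.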


## Filtering Results

Filter results based on metadata.

In [10]:
# Search for documents with specific metadata filters
filter_conditions = {
    "$and": [
        {"location": {"$eq": "Fangorn Forest"}},
        {"topic": {"$eq": "hope and wisdom"}}
    ]
}

filtered_results = store.query_with_filters(query_texts=["journey"], where=filter_conditions, k=3)
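
The filter above reads as "location equals Fangorn Forest and topic equals hope and wisdom". The sketch below mimics that logic in plain Python for the `$and`, `$or`, `$eq`, and `$ne` operators only. It illustrates the semantics and is not Chroma's implementation; `matches_filter` is a hypothetical helper.

```python
def matches_filter(metadata, where):
    """Return True if a metadata dict satisfies a simplified where-style filter."""
    if "$and" in where:
        return all(matches_filter(metadata, clause) for clause in where["$and"])
    if "$or" in where:
        return any(matches_filter(metadata, clause) for clause in where["$or"])
    for field, condition in where.items():
        # Treat {"field": value} as shorthand for {"field": {"$eq": value}}.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        actual = metadata.get(field)
        for op, expected in condition.items():
            if op == "$eq":
                if actual != expected:
                    return False
            elif op == "$ne":
                if actual == expected:
                    return False
            else:
                raise ValueError(f"Unsupported operator: {op}")
    return True


where_example = {
    "$and": [
        {"location": {"$eq": "Fangorn Forest"}},
        {"topic": {"$eq": "hope and wisdom"}},
    ]
}

print(matches_filter({"location": "Fangorn Forest", "topic": "hope and wisdom"}, where_example))
print(matches_filter({"location": "Rivendell", "topic": "power"}, where_example))
```

Only documents whose metadata passes the filter are considered for the similarity ranking, which is why a query can return fewer than `k` results.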

In [11]:
filtered_results

{'ids': [['82f3b922-c64c-4ad1-a632-ea9f8d13a19a']],
 'embeddings': None,
 'documents': [['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.']],
 'uris': None,
 'data': None,
 'metadatas': [[{'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}]],
 'distances': [[0.21403032541275024]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}
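
The response is column-oriented: `ids`, `documents`, `metadatas`, and `distances` each hold one inner list per query text. Here is a small sketch for turning that shape into one record per hit, assuming the layout shown above. `flatten_results` is a hypothetical helper, not part of `dapr-agents`.

```python
def flatten_results(results, query_index=0):
    """Convert a column-oriented query response into one dict per hit."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    metadatas = results["metadatas"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "document": text, "metadata": meta, "distance": dist}
        for doc_id, text, meta, dist in zip(ids, documents, metadatas, distances)
    ]


# Sample data copied from the response shown above.
sample_results = {
    "ids": [["82f3b922-c64c-4ad1-a632-ea9f8d13a19a"]],
    "documents": [[
        "Gandalf: Even the wisest cannot foresee all ends, "
        "but hope remains while the Company is true."
    ]],
    "metadatas": [[{"location": "Fangorn Forest", "topic": "hope and wisdom"}]],
    "distances": [[0.21403032541275024]],
}

for hit in flatten_results(sample_results):
    print(f"{hit['distance']:.3f}  {hit['metadata']['location']}  {hit['document']}")
```

Only one document is returned even though `k=3` was requested, because the metadata filter leaves a single candidate.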

## Resetting the Database

Reset the database to clear all stored data. Listing the collections before and after the reset confirms that `example_collection` is removed.

In [12]:
store.client.list_collections()

['example_collection']

In [13]:
# Reset the collection
store.reset()

In [14]:
store.client.list_collections()

[]