{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# VectorStore: Postgres and Sentence Transformer (all-MiniLM-L6-v2) with Basic Examples\n",
    "\n",
    "This notebook demonstrates how to use the `PostgresVectorStore` in `dapr-agents` to store, query, and filter documents. It covers:\n",
    "\n",
    "* Initializing the `SentenceTransformerEmbedder` embedding function and the `PostgresVectorStore`.\n",
    "* Adding documents with text and metadata.\n",
    "* Retrieving, updating, and deleting documents by ID.\n",
    "* Performing similarity searches.\n",
    "* Filtering results based on metadata.\n",
    "* Resetting the database."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install Required Libraries\n",
    "Before starting, ensure the required libraries are installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install dapr-agents python-dotenv sentence-transformers \"psycopg[binary,pool]\" pgvector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Environment Variables\n",
    "\n",
    "Load configuration values, such as `POSTGRES_CONNECTION_STRING`, from your `.env` file using `python-dotenv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting Up The Database\n",
    "\n",
    "Before initializing the `PostgresVectorStore`, set up a PostgreSQL instance with pgvector enabled. For a local setup, use Docker:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "d920da4b841a66223431ad1dce49c3b0c215a971a4860ee9e25ea5bf0b4bfcd0\n"
     ]
    }
   ],
   "source": [
    "!docker run --name pgvector-container \\\n",
    "    -e POSTGRES_USER=dapr_agents \\\n",
    "    -e POSTGRES_PASSWORD=dapr_agents \\\n",
    "    -e POSTGRES_DB=dapr_agents \\\n",
    "    -p 5432:5432 \\\n",
    "    -d pgvector/pgvector:pg17"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initializing SentenceTransformer Embedding Function\n",
    "\n",
    "The default embedding function is `SentenceTransformerEmbedder`, but we will initialize it explicitly for clarity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dapr_agents.document.embedder import SentenceTransformerEmbedder\n",
    "\n",
    "embedding_function = SentenceTransformerEmbedder(\n",
    "    model=\"all-MiniLM-L6-v2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initializing the PostgresVectorStore\n",
    "\n",
    "Create a `PostgresVectorStore` instance and pass the `SentenceTransformerEmbedder` created above as its `embedding_function`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from dapr_agents.storage.vectorstores import PostgresVectorStore\n",
    "\n",
    "# Set up connection parameters\n",
    "connection_string = os.getenv(\"POSTGRES_CONNECTION_STRING\", \"postgresql://dapr_agents:dapr_agents@localhost:5432/dapr_agents\")\n",
    "\n",
    "# Initialize PostgresVectorStore with the embedder created in the previous cell\n",
    "store = PostgresVectorStore(\n",
    "    connection_string=connection_string,\n",
    "    table_name=\"dapr_agents\",\n",
    "    embedding_function=embedding_function\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding Documents\n",
    "We will use `Document` objects to add content to the collection. Each document includes text and optional metadata."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dapr_agents.types.document import Document\n",
    "\n",
    "# Example Lord of the Rings-inspired conversations\n",
    "documents = [\n",
    "    Document(\n",
    "        text=\"Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to.\",\n",
    "        metadata={\"topic\": \"wisdom\", \"location\": \"The Shire\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Frodo: I wish the Ring had never come to me. I wish none of this had happened.\",\n",
    "        metadata={\"topic\": \"destiny\", \"location\": \"Moria\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master.\",\n",
    "        metadata={\"topic\": \"power\", \"location\": \"Rivendell\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Sam: I can't carry it for you, but I can carry you!\",\n",
    "        metadata={\"topic\": \"friendship\", \"location\": \"Mount Doom\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Legolas: A red sun rises. Blood has been spilled this night.\",\n",
    "        metadata={\"topic\": \"war\", \"location\": \"Rohan\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Gimli: Certainty of death. Small chance of success. What are we waiting for?\",\n",
    "        metadata={\"topic\": \"bravery\", \"location\": \"Helm's Deep\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Boromir: One does not simply walk into Mordor.\",\n",
    "        metadata={\"topic\": \"impossible tasks\", \"location\": \"Rivendell\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Galadriel: Even the smallest person can change the course of the future.\",\n",
    "        metadata={\"topic\": \"hope\", \"location\": \"Lothlórien\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Théoden: So it begins.\",\n",
    "        metadata={\"topic\": \"battle\", \"location\": \"Helm's Deep\"}\n",
    "    ),\n",
    "    Document(\n",
    "        text=\"Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life.\",\n",
    "        metadata={\"topic\": \"sacrifice\", \"location\": \"Rivendell\"}\n",
    "    )\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding Documents to the Collection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of documents in the collection: 10\n"
     ]
    }
   ],
   "source": [
    "store.add_documents(documents=documents)\n",
    "print(f\"Number of documents in the collection: {store.count()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieving Documents\n",
    "\n",
    "Retrieve all documents or specific ones by ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Retrieved documents:\n",
      "ID: feb3b2c1-d3cf-423b-bd5d-6094e2200bc8, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'topic': 'wisdom', 'location': 'The Shire'}\n",
      "ID: b206833f-4c19-4f3c-91e2-2ccbcc895a63, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'topic': 'destiny', 'location': 'Moria'}\n",
      "ID: 57226af8-d035-4052-86b2-4f68d7c5a8f6, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'topic': 'power', 'location': 'Rivendell'}\n",
      "ID: 5376d46a-4161-408c-850c-4b73cd8d2aa6, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'topic': 'friendship', 'location': 'Mount Doom'}\n",
      "ID: 7d8c78c3-e4c9-4c6a-8bb4-a04f450e6bfd, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'topic': 'war', 'location': 'Rohan'}\n",
      "ID: 749a126e-2ad5-4aa6-b043-a204e50963f3, Text: Gimli: Certainty of death. Small chance of success. What are we waiting for?, Metadata: {'topic': 'bravery', 'location': \"Helm's Deep\"}\n",
      "ID: 4848f783-fbc0-43ec-98d6-43b03fa79809, Text: Boromir: One does not simply walk into Mordor., Metadata: {'topic': 'impossible tasks', 'location': 'Rivendell'}\n",
      "ID: ecc3257d-e542-407e-9db9-21ec3b78249c, Text: Galadriel: Even the smallest person can change the course of the future., Metadata: {'topic': 'hope', 'location': 'Lothlórien'}\n",
      "ID: 6dad5159-724f-4f03-8cc8-aabc4ee308cd, Text: Théoden: So it begins., Metadata: {'topic': 'battle', 'location': \"Helm's Deep\"}\n",
      "ID: 63a09862-438a-41d7-abe7-74ec5510ce82, Text: Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life., Metadata: {'topic': 'sacrifice', 'location': 'Rivendell'}\n"
     ]
    }
   ],
   "source": [
    "# Retrieve all documents\n",
    "retrieved_docs = store.get()\n",
    "print(\"Retrieved documents:\")\n",
    "for doc in retrieved_docs:\n",
    "    print(f\"ID: {doc['id']}, Text: {doc['document']}, Metadata: {doc['metadata']}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Specific document: [{'id': UUID('feb3b2c1-d3cf-423b-bd5d-6094e2200bc8'), 'document': 'Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to.', 'metadata': {'topic': 'wisdom', 'location': 'The Shire'}}]\n"
     ]
    }
   ],
   "source": [
    "# Retrieve a specific document by ID\n",
    "doc_id = retrieved_docs[0]['id']\n",
    "specific_doc = store.get(ids=[doc_id])\n",
    "print(f\"Specific document: {specific_doc}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Specific document Embedding (first 5 values): [-0.0\n"
     ]
    }
   ],
   "source": [
    "# Retrieve a specific document by ID, including its stored embedding\n",
    "doc_id = retrieved_docs[0]['id']\n",
    "specific_doc = store.get(ids=[doc_id], with_embedding=True)\n",
    "embedding = specific_doc[0]['embedding']\n",
    "print(f\"Specific document Embedding (first 5 values): {embedding[:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Updating Documents\n",
    "\n",
    "You can update existing documents' text or metadata using their IDs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Updated document: [{'id': UUID('feb3b2c1-d3cf-423b-bd5d-6094e2200bc8'), 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.', 'metadata': {'topic': 'hope and wisdom', 'location': 'Fangorn Forest'}}]\n"
     ]
    }
   ],
   "source": [
    "# Look up the ID of the document to update\n",
    "retrieved_docs = store.get()  # Get all documents to find the ID\n",
    "doc_id = retrieved_docs[0]['id']  # Select the first document's ID for this example\n",
    "\n",
    "# Define updated text and metadata\n",
    "updated_text = \"Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.\"\n",
    "updated_metadata = {\"topic\": \"hope and wisdom\", \"location\": \"Fangorn Forest\"}\n",
    "\n",
    "# Update the document's text and metadata in the store\n",
    "store.update(ids=[doc_id], documents=[updated_text], metadatas=[updated_metadata])\n",
    "\n",
    "# Verify the update\n",
    "updated_doc = store.get(ids=[doc_id])\n",
    "print(f\"Updated document: {updated_doc}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deleting Documents\n",
    "\n",
    "Delete documents by their IDs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of documents after deletion: 9\n"
     ]
    }
   ],
   "source": [
    "# Delete a document by ID\n",
    "doc_id_to_delete = retrieved_docs[2]['id']\n",
    "store.delete(ids=[doc_id_to_delete])\n",
    "\n",
    "# Verify deletion\n",
    "print(f\"Number of documents after deletion: {store.count()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Similarity Search\n",
    "\n",
    "Perform a similarity search using text queries. The embedding function automatically generates embeddings for the input query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Similarity search results:\n",
      "ID: 749a126e-2ad5-4aa6-b043-a204e50963f3, Document: Gimli: Certainty of death. Small chance of success. What are we waiting for?, Metadata: {'topic': 'bravery', 'location': \"Helm's Deep\"}, Similarity: 0.1567628941818613\n",
      "ID: 4848f783-fbc0-43ec-98d6-43b03fa79809, Document: Boromir: One does not simply walk into Mordor., Metadata: {'topic': 'impossible tasks', 'location': 'Rivendell'}, Similarity: 0.13233356090384096\n"
     ]
    }
   ],
   "source": [
    "# Perform a similarity search using text queries.\n",
    "query = \"wise advice\"\n",
    "results = store.search_similar(query_texts=query, k=2)\n",
    "\n",
    "# Display results\n",
    "print(\"Similarity search results:\")\n",
    "for result in results:\n",
    "    print(f\"ID: {result['id']}, Document: {result['document']}, Metadata: {result['metadata']}, Similarity: {result['similarity']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering Results\n",
    "\n",
    "Restrict a similarity search to documents with matching metadata by passing `metadata_filter` to `search_similar`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Filtered search results:\n",
      "ID: feb3b2c1-d3cf-423b-bd5d-6094e2200bc8, Document: Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true., Metadata: {'topic': 'hope and wisdom', 'location': 'Fangorn Forest'}, Similarity: 0.1670202911216282\n"
     ]
    }
   ],
   "source": [
    "# Search for documents with specific metadata filters\n",
    "query = \"journey\"\n",
    "filter_conditions = {\n",
    "    \"location\": \"Fangorn Forest\",\n",
    "    \"topic\": \"hope and wisdom\"\n",
    "}\n",
    "\n",
    "filtered_results = store.search_similar(query_texts=query, metadata_filter=filter_conditions, k=3)\n",
    "\n",
    "# Display filtered results\n",
    "print(\"Filtered search results:\")\n",
    "for result in filtered_results:\n",
    "    print(f\"ID: {result['id']}, Document: {result['document']}, Metadata: {result['metadata']}, Similarity: {result['similarity']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resetting the Database\n",
    "\n",
    "Reset the database to clear all stored data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Database reset complete. Current documents: []\n"
     ]
    }
   ],
   "source": [
    "# Reset the collection\n",
    "store.reset()\n",
    "print(\"Database reset complete. Current documents:\", store.get())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}