NVIDIA LLM calls with Dapr Agents

This quickstart demonstrates how to use Dapr Agents' LLM capabilities to interact with language models and generate both free-form text and structured data. You'll learn how to make basic calls to LLMs and how to extract structured information in a type-safe manner.

Prerequisites

  • Python 3.10 (recommended)
  • pip package manager
  • NVIDIA API key

Environment Setup

# Create a virtual environment
python3.10 -m venv .venv

# Activate the virtual environment 
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Configuration

Create a .env file in the project root:

NVIDIA_API_KEY=your_api_key_here

Replace your_api_key_here with your actual NVIDIA API key.
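
Before running the examples, you can verify the key is picked up; a quick sketch using python-dotenv, which the examples below rely on as well:

import os

from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

# Fail fast if the key is missing
assert os.getenv("NVIDIA_API_KEY"), "NVIDIA_API_KEY is not set"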

Examples

Text

1. Run the basic text completion example:

python text_completion.py

The script demonstrates basic usage of Dapr Agents' NVIDIAChatClient for text generation:

from dotenv import load_dotenv

from dapr_agents import NVIDIAChatClient
from dapr_agents.types import LLMChatResponse, UserMessage

# Load environment variables from .env
load_dotenv()

# Basic chat completion
llm = NVIDIAChatClient()
response: LLMChatResponse = llm.generate("Name a famous dog!")

if response.get_message() is not None:
    print("Response: ", response.get_message().content)

# Chat completion using a prompty file for context
llm = NVIDIAChatClient.from_prompty("basic.prompty")
response: LLMChatResponse = llm.generate(input_data={"question": "What is your name?"})

if response.get_message() is not None:
    print("Response with prompty: ", response.get_message().content)

# Chat completion with user input
llm = NVIDIAChatClient()
response: LLMChatResponse = llm.generate(messages=[UserMessage("hello")])

if response.get_message() is not None and "hello" in response.get_message().content.lower():
    print("Response with user input: ", response.get_message().content)

2. Expected output: The LLM will respond with the name of a famous dog (e.g., "Lassie", "Hachiko", etc.).

3. Run the structured text completion example:

python structured_completion.py

This example shows how to use Pydantic models to get structured data from LLMs:

import json

from dotenv import load_dotenv
from pydantic import BaseModel

from dapr_agents import NVIDIAChatClient
from dapr_agents.types import UserMessage

# Load environment variables from .env
load_dotenv()


# Define our data model
class Dog(BaseModel):
    name: str
    breed: str
    reason: str


# Initialize the chat client
llm = NVIDIAChatClient(model="meta/llama-3.1-8b-instruct")

# Get structured response
response: Dog = llm.generate(
    messages=[UserMessage("One famous dog in history.")], response_format=Dog
)

4. Expected output: A structured Dog object with name, breed, and reason fields (e.g., Dog(name='Hachiko', breed='Akita', reason='Known for his remarkable loyalty...'))
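
Since Dog is a Pydantic model, you can serialize the structured response for inspection; a minimal sketch using the json import from the example above (model_dump() is Pydantic v2's serialization method):

# Print the structured response as formatted JSON
print(json.dumps(response.model_dump(), indent=2))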

Embeddings

You can use the NVIDIAEmbedder in dapr-agents for generating text embeddings.

1. Embedding a single text:

python embeddings.py
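
The script uses the NVIDIAEmbedder to turn text into a vector. Below is a minimal sketch; the import path, constructor arguments, and embed() method here are assumptions, so check embeddings.py for the exact usage:

from dotenv import load_dotenv

# Import path is an assumption; see embeddings.py for the real one
from dapr_agents.document.embedder import NVIDIAEmbedder

# Load environment variables from .env
load_dotenv()

# Initialize the embedder (default model left to the client)
embedder = NVIDIAEmbedder()

# Generate an embedding for a single text (embed() is an assumed method name)
text = "Dapr Agents is a framework for building LLM-powered agents."
embedding = embedder.embed(text)
print(f"Embedding length: {len(embedding)}")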

Streaming

Our NVIDIA chat client also supports streaming responses, so you can process partial results as they arrive. Below are two examples:

1. Basic Streaming Example

Run the text_completion_stream.py script to see token-by-token output:

python text_completion_stream.py

The script:

from dotenv import load_dotenv

from dapr_agents import NVIDIAChatClient
from dapr_agents.types.message import LLMChatResponseChunk
from typing import Iterator
import logging

logging.basicConfig(level=logging.INFO)

# Load environment variables from .env
load_dotenv()

# Initialize the chat client
llm = NVIDIAChatClient()

response: Iterator[LLMChatResponseChunk] = llm.generate("Name a famous dog!", stream=True)

for chunk in response:
    if chunk.result.content:
        print(chunk.result.content, end="", flush=True)

This will print each partial chunk as it arrives, so you can build up the full answer in real time.
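
If you want the full answer as a single string, you can accumulate the streamed content; a minimal sketch reusing the chunk shape from the example above:

# Collect the streamed content into one string
full_text = "".join(
    chunk.result.content
    for chunk in llm.generate("Name a famous dog!", stream=True)
    if chunk.result.content
)
print(full_text)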

2. Streaming with Tool Calls:

Use text_completion_stream_with_tools.py to combine streaming with function-call "tools":

python text_completion_stream_with_tools.py

The script:

from dotenv import load_dotenv

from dapr_agents import NVIDIAChatClient
from dapr_agents.types.message import LLMChatResponseChunk
from typing import Iterator
import logging

logging.basicConfig(level=logging.INFO)

# Load environment variables from .env
load_dotenv()

# Initialize the chat client
llm = NVIDIAChatClient()

# Define a tool for addition
def add_numbers(a: int, b: int) -> int:
    return a + b

# Define the tool function call schema
add_tool = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two numbers together.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "The first number."},
                "b": {"type": "integer", "description": "The second number."}
            },
            "required": ["a", "b"]
        }
    }
}

# Define messages for the chat
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Add 5 and 7 and 2 and 2."}
]

response: Iterator[LLMChatResponseChunk] = llm.generate(messages=messages, tools=[add_tool], stream=True)

for chunk in response:
    print(chunk.result)

Here, the model can decide to call your add_numbers function mid-stream, and you'll see those tool-call chunks as they arrive.
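
The script above only prints raw chunks. If the tool-call chunks follow the OpenAI-style delta format (an index plus incremental argument fragments), you can stitch the arguments back together and run the tool locally. This is a hedged sketch, assuming chunk.result exposes a tool_calls list whose items carry .index, .function.name, and .function.arguments; check the LLMChatResponseChunk type for the actual shape:

import json

# Accumulate streamed tool-call fragments by index (assumed delta format)
tool_calls = {}
for chunk in llm.generate(messages=messages, tools=[add_tool], stream=True):
    for tc in chunk.result.tool_calls or []:
        entry = tool_calls.setdefault(tc.index, {"name": None, "arguments": ""})
        if tc.function.name:
            entry["name"] = tc.function.name
        entry["arguments"] += tc.function.arguments or ""

# Execute any completed add_numbers calls locally
for call in tool_calls.values():
    if call["name"] == "add_numbers":
        args = json.loads(call["arguments"])
        print(f"add_numbers result: {add_numbers(**args)}")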