
LLM calls with Hugging Face

This quickstart demonstrates how to use Dapr Agents' LLM capabilities to interact with language models hosted on the Hugging Face Hub and generate both free-form text and structured data. You'll learn how to make basic calls to LLMs and how to extract structured information in a type-safe manner.

Prerequisites

  • Python 3.10 (recommended)
  • pip package manager

Environment Setup

# Create a virtual environment
python3.10 -m venv .venv

# Activate the virtual environment 
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
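
Before running the examples, it can help to confirm that the active interpreter is the one from the virtual environment and that it meets the recommended version. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `meets_minimum` are illustrative and are not part of Dapr Agents.

```python
import sys
from typing import Optional, Tuple


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def meets_minimum(
    minimum: Tuple[int, int] = (3, 10),
    current: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True when `current` (default: the running interpreter) >= `minimum`."""
    if current is None:
        current = (sys.version_info.major, sys.version_info.minor)
    return current >= minimum


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.10 or newer:", meets_minimum())
```

If the first line prints `False`, the activation step above was most likely skipped in the current shell.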

Examples

Text

1. Run the basic text completion example:

python text_completion.py

The script demonstrates basic usage of the HFHubChatClient for text generation:

from dotenv import load_dotenv

from dapr_agents.llm import HFHubChatClient
from dapr_agents.types import LLMChatResponse, UserMessage

load_dotenv()

# Basic chat completion
llm = HFHubChatClient(model="HuggingFaceTB/SmolLM3-3B")
response: LLMChatResponse = llm.generate("Name a famous dog!")

if response.get_message() is not None:
    print("Response: ", response.get_message().content)

# Chat completion using a prompty file for context
llm = HFHubChatClient.from_prompty("basic.prompty")
response: LLMChatResponse = llm.generate(input_data={"question": "What is your name?"})

if response.get_message() is not None:
    print("Response with prompty: ", response.get_message().content)

# Chat completion with user input
llm = HFHubChatClient(model="HuggingFaceTB/SmolLM3-3B")
response: LLMChatResponse = llm.generate(messages=[UserMessage("hello")])

if response.get_message() is not None and "hello" in response.get_message().content.lower():
    print("Response with user input: ", response.get_message().content)

2. Expected output: The LLM will respond with the name of a famous dog (e.g., "Lassie" or "Hachiko").
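
The script above checks `response.get_message()` for `None` before reading `.content`. One way to keep that check in a single place is a small helper, sketched below. `FakeMessage` and `FakeResponse` are hypothetical stand-ins that model only the `get_message()` and `content` members used in this quickstart; they are not the real `LLMChatResponse` type, and `message_text` is not a Dapr Agents function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for an assistant message; only `content` is modeled."""
    content: Optional[str]


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a chat response; only `get_message()` is modeled."""
    message: Optional[FakeMessage]

    def get_message(self) -> Optional[FakeMessage]:
        return self.message


def message_text(response, default: str = "") -> str:
    """Return the reply text, or `default` when the message or its content is missing."""
    message = response.get_message()
    if message is None or message.content is None:
        return default
    return message.content


if __name__ == "__main__":
    print("Response:", message_text(FakeResponse(FakeMessage("Lassie"))))
    print("Response:", message_text(FakeResponse(None), default="<no reply>"))
```

Because the helper only relies on `get_message()` and `content`, it should work with any response object that exposes those two members.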

Structured Output

Run the structured text completion example:

python structured_completion.py

This example shows how to use Pydantic models to get structured data from LLMs:

import json

from dotenv import load_dotenv
from pydantic import BaseModel

from dapr_agents import HFHubChatClient
from dapr_agents.types import UserMessage

# Load environment variables from .env
load_dotenv()


# Define our data model
class Dog(BaseModel):
    name: str
    breed: str
    reason: str


# Initialize the chat client
llm = HFHubChatClient(model="HuggingFaceTB/SmolLM3-3B")

# Get structured response
response: Dog = llm.generate(
    messages=[UserMessage("One famous dog in history.")], response_format=Dog
)

print(json.dumps(response.model_dump(), indent=2))

Expected output: A JSON object with name, breed, and reason fields, for example:

{
  "name": "Dog",
  "breed": "Siberian Husky",
  "reason": "Known for its endurance, intelligence, and loyalty, Siberian Huskies have played crucial roles in dog sledding and have been beloved companions for many."
}
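
To illustrate what "structured" means here, the sketch below parses a JSON reply into a typed object and rejects replies with missing or non-string fields. It deliberately uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without extra dependencies; it is not how Dapr Agents or Pydantic perform validation internally, and `parse_dog` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Dog:
    name: str
    breed: str
    reason: str


def parse_dog(raw: str) -> Dog:
    """Parse a JSON string into Dog, rejecting missing or non-string fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for field in fields(Dog):
        value = data.get(field.name)
        if not isinstance(value, str):
            raise ValueError(f"field {field.name!r} must be a string")
        values[field.name] = value
    return Dog(**values)


if __name__ == "__main__":
    raw = '{"name": "Dog", "breed": "Siberian Husky", "reason": "Known for its endurance."}'
    print(parse_dog(raw))
```

With the real quickstart, passing `response_format=Dog` is what requests this shape from the model; the sketch only shows the kind of check that makes the result safe to use as a typed object.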

Streaming

Our Hugging Face chat client also supports streaming responses, where you can process partial results as they arrive. Below are two examples:

1. Basic Streaming Example

Run the text_completion_stream.py script to see token-by-token output:

python text_completion_stream.py

The script:

from dotenv import load_dotenv
from dapr_agents import HFHubChatClient
from dapr_agents.types.message import LLMChatResponseChunk
from typing import Iterator
import logging

logging.basicConfig(level=logging.INFO)
load_dotenv()

llm = HFHubChatClient(model="HuggingFaceTB/SmolLM3-3B")
response: Iterator[LLMChatResponseChunk] = llm.generate("Name a famous dog!", stream=True)

for chunk in response:
    if chunk.result.content:
        print(chunk.result.content, end="", flush=True)

This will print each partial chunk as it arrives, so you can build up the full answer in real time.
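
If the full answer is needed after streaming finishes, the partial chunks can be collected and joined. The sketch below shows that pattern with a fake stream so it runs offline. `FakeChunk`, `FakeResult`, and `fake_stream` are hypothetical stand-ins that mimic only the `chunk.result.content` access used in the script above; they are not Dapr Agents types.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for a chunk payload; only `content` is modeled."""
    content: Optional[str]


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `result` is modeled."""
    result: FakeResult


def fake_stream() -> Iterator[FakeChunk]:
    """Yield a few partial chunks, including one with no content."""
    for piece in ["Las", "sie", None, "!"]:
        yield FakeChunk(FakeResult(piece))


def collect_text(chunks: Iterable) -> str:
    """Join the non-empty content of every chunk into one string."""
    parts = []
    for chunk in chunks:
        content = chunk.result.content
        if content:  # skip chunks whose content is None or empty
            parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    print(collect_text(fake_stream()))
```

In a real run, the same loop could both print each piece and append it to `parts`, so the text is displayed live and still available afterwards.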

2. Streaming with Tool Calls

Use text_completion_stream_with_tools.py to combine streaming with function-call "tools":

python text_completion_stream_with_tools.py

The script:

from dotenv import load_dotenv
from dapr_agents import HFHubChatClient
from dapr_agents.types.message import LLMChatResponseChunk
from typing import Iterator
import logging

logging.basicConfig(level=logging.INFO)
load_dotenv()

# Initialize client
llm = HFHubChatClient(model="HuggingFaceTB/SmolLM3-3B", hf_provider="auto")

# Define a simple addition tool
def add_numbers(a: int, b: int) -> int:
    return a + b

add_tool = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two numbers together.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "The first number."},
                "b": {"type": "integer", "description": "The second number."}
            },
            "required": ["a", "b"]
        }
    }
}

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Add 5 and 7 and 2 and 2."}
]

response: Iterator[LLMChatResponseChunk] = llm.generate(
    messages=messages,
    tools=[add_tool],
    stream=True
)

for chunk in response:
    print(chunk.result)

Here, the model can decide to call your add_numbers function mid-stream, and you'll see those tool-call chunks as they come in. The script prints each chunk as received; it does not execute add_numbers itself.
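
Acting on a streamed tool call usually takes two steps: merging the argument fragments that arrive across chunks, then dispatching the completed call to a local function. The sketch below illustrates both with plain dictionaries. The delta shape (`index`, `name`, `arguments`) is an assumption modeled on common OpenAI-style streaming payloads, not the confirmed layout of `LLMChatResponseChunk`, and `assemble_tool_calls`, `run_tool_call`, and `TOOLS` are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterable, List


def add_numbers(a: int, b: int) -> int:
    return a + b


# Registry mapping tool names to local Python callables.
TOOLS: Dict[str, Callable[..., int]] = {"add_numbers": add_numbers}


def assemble_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Merge streamed fragments into complete calls, grouped by their index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {"name": "", "arguments": ""})
        if delta.get("name"):
            call["name"] = delta["name"]
        # Argument text arrives in pieces; concatenate until the stream ends.
        call["arguments"] += delta.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def run_tool_call(call: dict) -> int:
    """Look up the named tool and invoke it with the decoded JSON arguments."""
    func = TOOLS[call["name"]]
    return func(**json.loads(call["arguments"]))


if __name__ == "__main__":
    # Hypothetical fragments, as a model might stream them for two calls.
    deltas = [
        {"index": 0, "name": "add_numbers", "arguments": ""},
        {"index": 0, "arguments": '{"a": 5, '},
        {"index": 0, "arguments": '"b": 7}'},
        {"index": 1, "name": "add_numbers", "arguments": '{"a": 2, "b": 2}'},
    ]
    for call in assemble_tool_calls(deltas):
        print(call["name"], call["arguments"], "->", run_tool_call(call))
```

Arguments are decoded only after the stream ends because a partial fragment such as `{"a": 5, ` is not valid JSON on its own.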