mirror of https://github.com/dapr/dapr-agents.git
188 lines
4.5 KiB
Plaintext
188 lines
4.5 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# LLM: ElevenLabs Text-To-Speech Endpoint Basic Examples\n",
|
|
"\n",
|
|
"This notebook demonstrates how to use the `ElevenLabsSpeechClient` in dapr-agents for basic tasks with the [ElevenLabs Text-To-Speech Endpoint](https://elevenlabs.io/docs/api-reference/text-to-speech/convert). We will explore:\n",
|
|
"\n",
|
|
"* Initializing the `ElevenLabsSpeechClient`.\n",
|
|
"* Generating speech from text and saving it as an MP3 file.."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Install Required Libraries\n",
|
|
"\n",
|
|
"Ensure you have the required library installed:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"!pip install dapr-agents python-dotenv elevenlabs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Load Environment Variables"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from dotenv import load_dotenv\n",
|
|
"load_dotenv()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Enable Logging"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import logging\n",
|
|
"\n",
|
|
"logging.basicConfig(level=logging.INFO)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize ElevenLabsSpeechClient\n",
|
|
"\n",
|
|
"Initialize the `ElevenLabsSpeechClient`. By default the voice is set to: `voice_id=EXAVITQu4vr4xnSDxMaL\",name=\"Sarah\"`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from dapr_agents import ElevenLabsSpeechClient\n",
|
|
"\n",
|
|
"client = ElevenLabsSpeechClient(\n",
|
|
" model=\"eleven_multilingual_v2\", # Default model\n",
|
|
" voice=\"JBFqnCBsd6RMkjVDRZzb\" # 'name': 'George', 'language': 'en', 'labels': {'accent': 'British', 'description': 'warm', 'age': 'middle aged', 'gender': 'male', 'use_case': 'narration'}\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Generate Speech from Text\n",
|
|
"\n",
|
|
"### Manual File Creation\n",
|
|
"\n",
|
|
"This section demonstrates how to generate speech from a given text input and save it as an MP3 file."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Define the text to convert to speech\n",
|
|
"text = \"Hello Roberto! This is an example of text-to-speech generation.\"\n",
|
|
"\n",
|
|
"# Create speech from text\n",
|
|
"audio_bytes = client.create_speech(\n",
|
|
" text=text,\n",
|
|
" output_format=\"mp3_44100_128\" # default output format, mp3 with 44.1kHz sample rate at 128kbps.\n",
|
|
")\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Save the audio to an MP3 file\n",
|
|
"output_path = \"output_speech.mp3\"\n",
|
|
"with open(output_path, \"wb\") as audio_file:\n",
|
|
" audio_file.write(audio_bytes)\n",
|
|
"\n",
|
|
"print(f\"Audio saved to {output_path}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Automatic File Creation\n",
|
|
"\n",
|
|
"The audio file is saved directly by providing the file_name parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Define the text to convert to speech\n",
|
|
"text = \"Hello Roberto! This is another example of text-to-speech generation.\"\n",
|
|
"\n",
|
|
"# Create speech from text\n",
|
|
"client.create_speech(\n",
|
|
" text=text,\n",
|
|
" output_format=\"mp3_44100_128\", # default output format, mp3 with 44.1kHz sample rate at 128kbps.,\n",
|
|
" file_name='output_speech_auto.mp3'\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": ".venv",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.12.1"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|