# LLM: OpenAI Audio Endpoint Basic Examples

This notebook demonstrates how to use the `OpenAIAudioClient` in `dapr-agents` for basic tasks with the OpenAI Audio API. We will explore:

* Generating speech from text and saving it as an MP3 file.
* Transcribing audio to text.
* Translating audio content to English.

## Install Required Libraries

Ensure the required libraries are installed:

```python
!pip install dapr-agents python-dotenv
```

## Load Environment Variables

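`load_dotenv()` reads a local `.env` file into the process environment. The client is expected to pick up the OpenAI API key (`OPENAI_API_KEY`) from there.

```python
from dotenv import load_dotenv

load_dotenv()
```

```text
True
```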
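Because a missing key usually only surfaces later as an authentication error from the API, it can help to fail early. The sketch below is a hypothetical helper, not part of `dapr-agents`, and it assumes the client reads its key from `OPENAI_API_KEY` (the OpenAI SDK default).

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return an environment variable's value or raise a clear error.

    Hypothetical helper; assumes the audio client reads OPENAI_API_KEY.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file before creating the client"
        )
    return value
```

Calling `require_env()` right after `load_dotenv()` turns a missing key into an immediate, descriptive error.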

## Initialize OpenAIAudioClient

```python
from dapr_agents import OpenAIAudioClient

client = OpenAIAudioClient()
```

## Generate Speech from Text

### Manual File Creation

This section demonstrates how to generate speech from a given text input and save it as an MP3 file.

```python
from dapr_agents.types.llm import AudioSpeechRequest

# Define the text to convert to speech
text_to_speech = "Hello Roberto! This is an example of text-to-speech generation."

# Create a request for TTS
tts_request = AudioSpeechRequest(
    model="tts-1",
    input=text_to_speech,
    voice="fable",
    response_format="mp3",
)

# Generate the audio (returned as raw bytes)
audio_bytes = client.create_speech(request=tts_request)

# Save the audio to an MP3 file
output_path = "output_speech.mp3"
with open(output_path, "wb") as audio_file:
    audio_file.write(audio_bytes)

print(f"Audio saved to {output_path}")
```

```text
Audio saved to output_speech.mp3
```
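Before writing the returned bytes to disk, a cheap sanity check can catch a mismatched `response_format` or an unexpected payload. This is a minimal sketch with hypothetical helper names; the check is a heuristic based on the fact that MP3 data usually begins with an ID3v2 tag (`ID3`) or an MPEG frame sync (11 set bits).

```python
from pathlib import Path


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: MP3 data usually starts with an ID3v2 tag or an MPEG frame sync."""
    if data.startswith(b"ID3"):
        return True
    # An MPEG audio frame header starts with 11 set bits: 0xFF then 0b111xxxxx
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_mp3(data: bytes, path: str) -> Path:
    """Write audio bytes to disk after the basic format check (hypothetical helper)."""
    if not looks_like_mp3(data):
        raise ValueError("bytes do not look like MP3 data; check response_format")
    out = Path(path)
    out.write_bytes(data)
    return out
```

For example, `save_mp3(audio_bytes, "output_speech.mp3")` could stand in for the manual `open`/`write` above.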


### Automatic File Creation

Alternatively, pass the `file_name` parameter to `create_speech` and the client saves the audio file for you.

```python
from dapr_agents.types.llm import AudioSpeechRequest

# Define the text to convert to speech
text_to_speech = "Hola Roberto! Este es otro ejemplo de generacion de voz desde texto."

# Create a request for TTS
tts_request = AudioSpeechRequest(
    model="tts-1",
    input=text_to_speech,
    voice="echo",
    response_format="mp3",
)

# Generate the audio and write it directly to the given file
client.create_speech(request=tts_request, file_name="output_speech_spanish_auto.mp3")
```
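The `file_name` option folds the manual "generate, then write" steps into one call. The sketch below illustrates that convenience pattern generically; it is hypothetical and is not how `dapr-agents` implements it. The notebook does not show what `create_speech` returns when `file_name` is given, so returning `None` in that case is this sketch's own choice.

```python
from pathlib import Path
from typing import Callable, Optional


def speech_to_file(
    generate: Callable[[], bytes], file_name: Optional[str] = None
) -> Optional[bytes]:
    """Hypothetical illustration of the file_name convenience pattern.

    `generate` stands in for a call such as client.create_speech(request=...).
    With file_name, the bytes are written there and None is returned;
    without it, the raw bytes are returned to the caller.
    """
    audio = generate()
    if file_name is None:
        return audio
    Path(file_name).write_bytes(audio)
    return None
```

Usage might look like `speech_to_file(lambda: client.create_speech(request=tts_request), "out.mp3")`.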

## Transcribe Audio to Text

This section demonstrates how to transcribe audio content into text.

### Using a File Path

```python
from dapr_agents.types.llm import AudioTranscriptionRequest

# Specify the audio file to transcribe
audio_file_path = "output_speech.mp3"

# Create a transcription request
transcription_request = AudioTranscriptionRequest(
    model="whisper-1",
    file=audio_file_path,
)

# Generate transcription
transcription_response = client.create_transcription(request=transcription_request)

# Display the transcription result
print("Transcription:", transcription_response.text)
```

```text
Transcription: Hello Roberto, this is an example of text-to-speech generation.
```
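When passing a file path, a quick local check avoids a round trip to the API for a file that would be rejected anyway. This is a minimal sketch with a hypothetical helper; the format list and the 25 MB limit are taken from OpenAI's speech-to-text documentation at the time of writing and may change.

```python
from pathlib import Path

# Formats and size limit per OpenAI's speech-to-text docs at the time of writing (may change)
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_path(path: str) -> Path:
    """Fail early on a missing, unsupported, or oversized file (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(p)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {p.suffix!r}")
    if p.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{p.name} exceeds the 25 MB upload limit")
    return p
```

For example, `check_audio_path("output_speech.mp3")` could run just before building the `AudioTranscriptionRequest`.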


### Using Audio Bytes

```python
# Read the audio file into memory as raw bytes
with open("output_speech_spanish_auto.mp3", "rb") as f:
    audio_bytes = f.read()

transcription_request = AudioTranscriptionRequest(
    model="whisper-1",
    file=audio_bytes,  # File as bytes
    language="es",  # Optional: ISO-639-1 code of the spoken language (this clip is Spanish)
)

# Generate transcription
transcription_response = client.create_transcription(request=transcription_request)

# Display the transcription result
print("Transcription:", transcription_response.text)
```

```text
Transcription: Hola Roberto, este es otro ejemplo de generación de voz desde texto.
```
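Raw bytes carry no filename, and upload APIs often infer the audio format from the filename extension. A common workaround is an in-memory file with a `name` attribute. This is a hypothetical helper; whether `dapr-agents` already attaches a filename to raw bytes, and whether `AudioTranscriptionRequest` accepts an `io.BytesIO`, are assumptions not confirmed by this notebook.

```python
import io


def named_audio_buffer(data: bytes, file_name: str = "audio.mp3") -> io.BytesIO:
    """Wrap raw audio bytes in an in-memory binary file that carries a filename."""
    buffer = io.BytesIO(data)
    # The extension in .name is what format detection typically relies on
    buffer.name = file_name
    return buffer
```

This is mainly useful if you upload audio through a lower-level client that rejects nameless bytes.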


### Using File-Like Objects (e.g., BufferedReader)

You can also pass a binary file-like object, such as the `io.BufferedReader` returned by `open(path, "rb")`, directly for transcription or translation.

```python
from dapr_agents.types.llm import AudioTranscriptionRequest

audio_file_path = "output_speech_spanish_auto.mp3"

# open(..., "rb") already returns an io.BufferedReader, so no extra wrapping is needed
with open(audio_file_path, "rb") as buffered_file:
    # Create a transcription request
    transcription_request = AudioTranscriptionRequest(
        model="whisper-1",
        file=buffered_file,  # File as BufferedReader
        language="es",
    )

    # Generate transcription while the file is still open
    transcription_response = client.create_transcription(request=transcription_request)

# Display the transcription result
print("Transcription:", transcription_response.text)
```

```text
Transcription: ¡Hola, Roberto! Este es otro ejemplo de generación de voz desde texto.
```
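The three transcription cells pass the same audio as a file path, as raw bytes, and as a file-like object. The sketch below shows how such inputs can be normalized to bytes; `read_audio` is a hypothetical helper that illustrates the idea, not `dapr-agents`' internal logic.

```python
from pathlib import Path
from typing import BinaryIO, Union

AudioInput = Union[str, bytes, BinaryIO]


def read_audio(source: AudioInput) -> bytes:
    """Normalize a path, raw bytes, or a binary file-like object into bytes."""
    if isinstance(source, bytes):
        return source
    if isinstance(source, str):
        # Treat strings as file system paths
        return Path(source).read_bytes()
    if hasattr(source, "read"):
        data = source.read()
        if not isinstance(data, bytes):
            raise TypeError("file-like object must be opened in binary mode ('rb')")
        return data
    raise TypeError(f"unsupported audio input type: {type(source).__name__}")
```

The binary-mode check matters because a file opened with `"r"` yields `str`, which no audio endpoint can use.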


## Translate Audio to English

This section demonstrates how to translate audio content into English.

### Using a File Path

```python
from dapr_agents.types.llm import AudioTranslationRequest

# Specify the audio file to translate
audio_file_path = "output_speech_spanish_auto.mp3"

# Create a translation request
translation_request = AudioTranslationRequest(
    model="whisper-1",
    file=audio_file_path,
    # Optional: text that guides the model's style or continues a previous segment
    prompt="The following audio needs to be translated to English.",
)

# Generate translation
translation_response = client.create_translation(request=translation_request)

# Display the translation result
print("Translation:", translation_response.text)
```

```text
Translation: Hola Roberto, este es otro ejemplo de generación de voz desde texto.
```
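Unlike the transcription cells, the translation cells never pass `language`: OpenAI's translation endpoint always targets English and, as far as its API reference shows, has no input-language field. The hypothetical helper below collects request arguments and enforces that difference; it assumes the field names used in the cells above (`model`, `file`, `language`, `prompt`).

```python
from typing import Any, Dict, Optional


def audio_request_kwargs(
    task: str,
    file: Any,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    model: str = "whisper-1",
) -> Dict[str, Any]:
    """Collect keyword arguments for a transcription or translation request."""
    if task not in {"transcription", "translation"}:
        raise ValueError(f"unknown task: {task!r}")
    kwargs: Dict[str, Any] = {"model": model, "file": file}
    if language is not None:
        if task == "translation":
            # Translation output is always English, so there is no language to set
            raise ValueError("language applies to transcription only")
        kwargs["language"] = language
    if prompt is not None:
        kwargs["prompt"] = prompt
    return kwargs
```

Usage might look like `AudioTranslationRequest(**audio_request_kwargs("translation", audio_file_path, prompt="..."))`.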


### Using Audio Bytes

```python
# Read the audio file into memory as raw bytes
with open("output_speech_spanish_auto.mp3", "rb") as f:
    audio_bytes = f.read()

translation_request = AudioTranslationRequest(
    model="whisper-1",
    file=audio_bytes,  # File as bytes
    prompt="The following audio needs to be translated to English.",
)

# Generate translation
translation_response = client.create_translation(request=translation_request)

# Display the translation result
print("Translation:", translation_response.text)
```

```text
Translation: Hola Roberto, este es otro ejemplo de generación de voz desde texto.
```


### Using File-Like Objects (e.g., BufferedReader) for Translation

As with transcription, a binary file-like object such as the `io.BufferedReader` returned by `open(path, "rb")` can be passed directly for translation.

```python
from dapr_agents.types.llm import AudioTranslationRequest

audio_file_path = "output_speech_spanish_auto.mp3"

# open(..., "rb") already returns an io.BufferedReader, so no extra wrapping is needed
with open(audio_file_path, "rb") as buffered_file:
    # Create a translation request
    translation_request = AudioTranslationRequest(
        model="whisper-1",
        file=buffered_file,  # File as BufferedReader
        prompt="The following audio needs to be translated to English.",
    )

    # Generate translation while the file is still open
    translation_response = client.create_translation(request=translation_request)

# Display the translation result
print("Translation:", translation_response.text)
```

```text
Translation: Hola Roberto, este es otro ejemplo de generación de voz desde texto.
```
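If you reuse one open file handle for both transcription and translation, the first call may leave the stream positioned at the end, so the second would read nothing. The sketch below rewinds before each consumer; `run_on_same_stream` is a hypothetical helper, and it assumes the client reads the stream from its current position.

```python
from typing import BinaryIO, Callable, List, TypeVar

T = TypeVar("T")


def run_on_same_stream(stream: BinaryIO, steps: List[Callable[[BinaryIO], T]]) -> List[T]:
    """Run several consumers (e.g. transcribe, then translate) over one binary stream.

    Each consumer may read to the end, so the stream is rewound before every call.
    """
    results: List[T] = []
    for step in steps:
        stream.seek(0)
        results.append(step(stream))
    return results
```

Here each step would be your own function wrapping a client call, for example one that builds an `AudioTranslationRequest` from the stream and calls `client.create_translation`.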
