mirror of https://github.com/dapr/dapr-agents.git
Repository contents: `components/`, `README.md`, `config.json`, `podcast_dialogue.json`, `requirements.txt`, `workflow.py`
Doc2Podcast: Automating Podcast Creation from Research Papers
This workflow is a basic step toward automating the creation of podcast content from research using AI. It demonstrates how to process a single research paper, generate a dialogue-style transcript with LLMs, and convert it into a podcast audio file. While simple, this workflow serves as a foundation for exploring more advanced processes, such as handling multiple documents or optimizing content splitting for better audio output.
Key Features and Workflow
- PDF Processing: Downloads a research paper from a specified URL and extracts its content page by page.
- LLM-Powered Transcripts: Transforms extracted text into a dialogue-style transcript using a large language model, alternating between a host and participants.
- AI-Generated Audio: Converts the transcript into a podcast-like audio file with natural-sounding voices for the host and participants (see the text-to-speech sketch after this list).
- Custom Workflow: Saves the final podcast audio and transcript files locally, offering flexibility for future enhancements like handling multiple files or integrating additional AI tools.
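As a rough illustration of the audio step, the sketch below converts a single line of dialogue to MP3 bytes with the OpenAI text-to-speech API, using the same `tts-1` model and `alloy` voice that appear in the example configuration. The `speak` helper and output file name are illustrative, not the actual `workflow.py` logic.

```python
# Minimal sketch: turn one utterance into speech with OpenAI's TTS API.
# The helper name and output path are illustrative only.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def speak(text: str, voice: str = "alloy", model: str = "tts-1") -> bytes:
    """Return MP3 bytes for a single line of dialogue."""
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    return response.content


if __name__ == "__main__":
    audio = speak("Welcome to AI Explorations!", voice="alloy")
    Path("sample_utterance.mp3").write_bytes(audio)
```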
Prerequisites
- Python 3.8 or higher
- Required Python dependencies (install using `pip install -r requirements.txt`)
- A valid OpenAI API key for generating audio content. Set the `OPENAI_API_KEY` variable with your key value in an `.env` file so the workflow can read it, as shown in the sketch below.
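For example, the key can be pulled from the `.env` file into the process environment before the workflow starts. This is a minimal sketch that assumes `python-dotenv` is available; `workflow.py` may load the environment differently.

```python
# Minimal sketch: make OPENAI_API_KEY from a .env file visible to the process.
# Assumes the python-dotenv package is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
```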
Configuration
To run the workflow, provide a configuration file in JSON format. The `config.json` file in this folder points to the paper "Exploring Applicability of LLM-Powered Autonomous Agents to Solve Real-life Problems". Example configuration:
```json
{
  "pdf_url": "https://example.com/research-paper.pdf",
  "podcast_name": "AI Explorations",
  "host": {
    "name": "John Doe",
    "voice": "alloy"
  },
  "participants": [
    { "name": "Alice Smith" },
    { "name": "Bob Johnson" }
  ],
  "max_rounds": 4,
  "output_transcript_path": "podcast_dialogue.json",
  "output_audio_path": "final_podcast.mp3",
  "audio_model": "tts-1"
}
```
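For reference, this is roughly how such a configuration file could be read and sanity-checked. The field names match the example above, and the argument parsing mirrors the `--config` flag used in the run command below; it is a sketch, not the actual `workflow.py` code.

```python
# Minimal sketch: load the JSON config and print the fields the workflow needs.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config.json")
args = parser.parse_args()

with open(args.config, "r", encoding="utf-8") as f:
    config = json.load(f)

print(f"Podcast: {config['podcast_name']}")
print(f"Host: {config['host']['name']} (voice: {config['host']['voice']})")
print(f"Participants: {[p['name'] for p in config['participants']]}")
print(f"Max rounds: {config['max_rounds']}")
```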
Running the Workflow
- Place the configuration file (e.g., config.json) in the project directory.
- Run the workflow with the following command:
```bash
dapr run --app-id doc2podcast --resources-path components -- python3 workflow.py --config config.json
```
- Output:
  - Transcript: A structured transcript saved as `podcast_dialogue.json` by default. An example can be found in the current directory; a quick way to inspect it is shown after this list.
  - Audio: The final podcast audio saved as `final_podcast.mp3` by default. An example can be found here.
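To take a quick look at the generated transcript, something like the following is enough. It assumes only that the output is valid JSON; the exact schema is defined by `workflow.py` and can be seen in the `podcast_dialogue.json` example in this directory.

```python
# Minimal sketch: pretty-print the generated transcript file.
import json
from pprint import pprint

with open("podcast_dialogue.json", "r", encoding="utf-8") as f:
    transcript = json.load(f)

pprint(transcript)
```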
Next Steps
This workflow is a simple starting point. Future enhancements could include:
- Processing Multiple Files: Extend the workflow to handle batches of PDFs.
- Advanced Text Splitting: Dynamically split text based on content rather than pages.
- Web Search Integration: Pull additional context or related research from the web.
- Multi-Modal Content: Process documents alongside images, slides, or charts.