Doc2Podcast: Automating Podcast Creation from Research Papers

This workflow is a first step toward using AI to turn research papers into podcast content. It shows how to process a single research paper, generate a dialogue-style transcript with an LLM, and convert that transcript into a podcast audio file. Although simple, it provides a foundation for more advanced processes, such as handling multiple documents or splitting content more effectively for better audio output.

Key Features and Workflow

  • PDF Processing: Downloads a research paper from a specified URL and extracts its content page by page.
  • LLM-Powered Transcripts: Transforms extracted text into a dialogue-style transcript using a large language model, alternating between a host and participants.
  • AI-Generated Audio: Converts the transcript into a podcast-like audio file with natural-sounding voices for the host and participants.
  • Custom Workflow: Saves the final podcast audio and transcript files locally, offering flexibility for future enhancements like handling multiple files or integrating additional AI tools.
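
As an illustration of the voice handling mentioned above, the following minimal sketch gives each speaker a distinct voice. It assumes speakers are plain dictionaries with a name and an optional voice. The helper assign_voices and the DEFAULT_VOICES list are hypothetical and are not taken from workflow.py.

```python
from itertools import cycle

# Voice names commonly offered with OpenAI's tts-1 model; treat this list
# as an assumption and adjust it to whatever your provider supports.
DEFAULT_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def assign_voices(host, participants, voices=DEFAULT_VOICES):
    """Return a {speaker name: voice} mapping (hypothetical helper)."""
    host_voice = host.get("voice", voices[0])
    mapping = {host["name"]: host_voice}
    # Cycle through the other voices so that auto-assigned participants
    # do not reuse the host's voice.
    remaining = cycle([v for v in voices if v != host_voice])
    for person in participants:
        # An explicitly configured voice wins over the automatic choice.
        mapping[person["name"]] = person.get("voice") or next(remaining)
    return mapping


voice_map = assign_voices(
    {"name": "John Doe", "voice": "alloy"},
    [{"name": "Alice Smith"}, {"name": "Bob Johnson"}],
)
```

Cycling keeps the sketch usable when there are more participants than voices, at the cost of two participants eventually sharing one.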

Prerequisites

  • Python 3.8 or higher
  • Required Python dependencies (install using pip install -r requirements.txt)
  • A valid OpenAI API key for generating audio content
    • Set the OPENAI_API_KEY variable to your key in a .env file.
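
For a quick check that the key is visible to Python, the sketch below reads a .env file using only the standard library. It is a simplified stand-in for a dotenv loader, not the mechanism workflow.py necessarily uses. The helper names load_env_file and require_api_key are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables already present in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def require_api_key(path=".env"):
    """Return OPENAI_API_KEY, or fail early with a readable error."""
    load_env_file(path)
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing before the workflow starts avoids downloading and processing a PDF only to hit an authentication error at the audio step.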

Configuration

To run the workflow, provide a configuration file in JSON format. The config.json file in this folder points to the paper "Exploring Applicability of LLM-Powered Autonomous Agents to Solve Real-life Problems". Example configuration:

{
    "pdf_url": "https://example.com/research-paper.pdf",
    "podcast_name": "AI Explorations",
    "host": {
        "name": "John Doe",
        "voice": "alloy"
    },
    "participants": [
        { "name": "Alice Smith" },
        { "name": "Bob Johnson" }
    ],
    "max_rounds": 4,
    "output_transcript_path": "podcast_dialogue.json",
    "output_audio_path": "final_podcast.mp3",
    "audio_model": "tts-1"
}
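
A configuration like the one above could be loaded and checked before the workflow starts. The sketch below is one possible approach, not the loader in workflow.py. The Speaker and PodcastConfig classes, the choice of required keys, and the defaults for max_rounds and audio_model are assumptions. The two output path defaults follow the file names given in this README.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Assumption: these four keys are mandatory; the rest fall back to defaults.
REQUIRED_KEYS = ("pdf_url", "podcast_name", "host", "participants")


@dataclass
class Speaker:
    name: str
    voice: Optional[str] = None  # participants in the example omit a voice


@dataclass
class PodcastConfig:
    pdf_url: str
    podcast_name: str
    host: Speaker
    participants: List[Speaker]
    max_rounds: int = 4
    output_transcript_path: str = "podcast_dialogue.json"
    output_audio_path: str = "final_podcast.mp3"
    audio_model: str = "tts-1"


def parse_config(raw: dict) -> PodcastConfig:
    """Validate a decoded JSON object and convert it to PodcastConfig."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"config is missing keys: {', '.join(missing)}")
    fields = dict(raw)
    fields["host"] = Speaker(**raw["host"])
    fields["participants"] = [Speaker(**p) for p in raw["participants"]]
    # Unknown keys raise TypeError here, which surfaces typos early.
    return PodcastConfig(**fields)


def load_config(path: str) -> PodcastConfig:
    """Read the JSON file at path and return a validated PodcastConfig."""
    with open(path, encoding="utf-8") as fh:
        return parse_config(json.load(fh))
```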

Running the Workflow

  • Place the configuration file (e.g., config.json) in the project directory.
  • Run the workflow with the following command:
dapr run --app-id doc2podcast --resources-path components -- python3 workflow.py --config config.json
  • Output:
    • Transcript: A structured transcript saved as podcast_dialogue.json by default. An example can be found in the current directory.
    • Audio: The final podcast audio saved as final_podcast.mp3 by default. An example can be found here.
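
In the command above, everything after the -- separator is the application command that dapr run launches, so --config config.json is received by workflow.py itself. A minimal sketch of how a script might read that flag with argparse follows. The parse_args helper and the config.json fallback are assumptions, not a copy of workflow.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags passed to the script (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Doc2Podcast workflow (sketch)")
    parser.add_argument(
        "--config",
        default="config.json",  # assumed fallback, not confirmed by the source
        help="Path to the JSON configuration file",
    )
    return parser.parse_args(argv)


# Passing an explicit list mirrors the flags used in the command above.
args = parse_args(["--config", "config.json"])
config_path = args.config
```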

Next Steps

This workflow is a simple starting point. Future enhancements could include:

  • Processing Multiple Files: Extend the workflow to handle batches of PDFs.
  • Advanced Text Splitting: Dynamically split text based on content rather than pages.
  • Web Search Integration: Pull additional context or related research from the web.
  • Multi-Modal Content: Process documents alongside images, slides, or charts.
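
As a starting point for the content-based splitting idea above, the sketch below groups paragraphs into chunks under a character budget instead of splitting by page. The function name split_by_paragraphs and the 2000-character default are illustrative assumptions. A real implementation might split on section headings or token counts instead.

```python
import re


def split_by_paragraphs(text, max_chars=2000):
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the budget in that one case.
    """
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to the LLM in place of a page, which keeps related paragraphs together when a section spans a page break.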