mirror of https://github.com/dapr/dapr-agents.git
Repository contents: `components/`, `README.md`, `config.json`, `podcast_dialogue.json`, `requirements.txt`, `workflow.py`
Doc2Podcast: Automating Podcast Creation from Research Papers
This workflow is a basic step toward automating the creation of podcast content from research using AI. It demonstrates how to process a single research paper, generate a dialogue-style transcript with LLMs, and convert it into a podcast audio file. While simple, this workflow serves as a foundation for exploring more advanced processes, such as handling multiple documents or optimizing content splitting for better audio output.
Key Features and Workflow
- PDF Processing: Downloads a research paper from a specified URL and extracts its content page by page.
- LLM-Powered Transcripts: Transforms extracted text into a dialogue-style transcript using a large language model, alternating between a host and participants.
- AI-Generated Audio: Converts the transcript into a podcast-like audio file with natural-sounding voices for the host and participants (see the text-to-speech sketch after this list).
- Custom Workflow: Saves the final podcast audio and transcript files locally, offering flexibility for future enhancements like handling multiple files or integrating additional AI tools.
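As a rough illustration of the audio step, the sketch below converts a single line of dialogue to MP3 bytes with the OpenAI text-to-speech API, using the same `tts-1` model and `alloy` voice that appear in the example configuration. The `speak` helper and output file name are illustrative, not the actual `workflow.py` logic.

```python
# Minimal sketch: turn one utterance into speech with OpenAI's TTS API.
# The helper name and output path are illustrative only.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def speak(text: str, voice: str = "alloy", model: str = "tts-1") -> bytes:
    """Return MP3 bytes for a single line of dialogue."""
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    return response.content


if __name__ == "__main__":
    audio = speak("Welcome to AI Explorations!", voice="alloy")
    Path("sample_utterance.mp3").write_bytes(audio)
```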
Prerequisites
- Python 3.8 or higher
- Required Python dependencies (install using `pip install -r requirements.txt`)
- A valid OpenAI API key for generating audio content. Set the `OPENAI_API_KEY` variable with your key value in an `.env` file so the workflow can read it, as shown in the sketch below.
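For example, the key can be pulled from the `.env` file into the process environment before the workflow starts. This is a minimal sketch that assumes `python-dotenv` is available; `workflow.py` may load the environment differently.

```python
# Minimal sketch: make OPENAI_API_KEY from a .env file visible to the process.
# Assumes the python-dotenv package is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
```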
Configuration
To run the workflow, provide a configuration file in JSON format. The `config.json` file in this folder points to the paper "Exploring Applicability of LLM-Powered Autonomous Agents to Solve Real-life Problems". Example configuration:
```json
{
  "pdf_url": "https://example.com/research-paper.pdf",
  "podcast_name": "AI Explorations",
  "host": {
    "name": "John Doe",
    "voice": "alloy"
  },
  "participants": [
    { "name": "Alice Smith" },
    { "name": "Bob Johnson" }
  ],
  "max_rounds": 4,
  "output_transcript_path": "podcast_dialogue.json",
  "output_audio_path": "final_podcast.mp3",
  "audio_model": "tts-1"
}
```
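For reference, this is roughly how such a configuration file could be read and sanity-checked. The field names match the example above, and the argument parsing mirrors the `--config` flag used in the run command below; it is a sketch, not the actual `workflow.py` code.

```python
# Minimal sketch: load the JSON config and print the fields the workflow needs.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config.json")
args = parser.parse_args()

with open(args.config, "r", encoding="utf-8") as f:
    config = json.load(f)

print(f"Podcast: {config['podcast_name']}")
print(f"Host: {config['host']['name']} (voice: {config['host']['voice']})")
print(f"Participants: {[p['name'] for p in config['participants']]}")
print(f"Max rounds: {config['max_rounds']}")
```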
Running the Workflow
- Place the configuration file (e.g., config.json) in the project directory.
- Run the workflow with the following command:
```bash
dapr run --app-id doc2podcast --resources-path components -- python3 workflow.py --config config.json
```
- Output:
  - Transcript: A structured transcript saved as `podcast_dialogue.json` by default. An example can be found in the current directory; a quick way to inspect it is shown after this list.
  - Audio: The final podcast audio saved as `final_podcast.mp3` by default. An example can be found here.
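To take a quick look at the generated transcript, something like the following is enough. It assumes only that the output is valid JSON; the exact schema is defined by `workflow.py` and can be seen in the `podcast_dialogue.json` example in this directory.

```python
# Minimal sketch: pretty-print the generated transcript file.
import json
from pprint import pprint

with open("podcast_dialogue.json", "r", encoding="utf-8") as f:
    transcript = json.load(f)

pprint(transcript)
```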
Next Steps
This workflow is a simple starting point. Future enhancements could include:
- Processing Multiple Files: Extend the workflow to handle batches of PDFs.
- Advanced Text Splitting: Dynamically split text based on content rather than pages.
- Web Search Integration: Pull additional context or related research from the web.
- Multi-Modal Content: Process documents alongside images, slides, or charts.