DEV: prompt engineering to improve citations (#1351)

This commit is contained in:
Sam 2025-05-20 13:01:35 +10:00 committed by GitHub
parent 2fb691cba8
commit 7db2589cc4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 16 additions and 5 deletions

@@ -21,6 +21,8 @@ module DiscourseAi
 The description is: {site_description}
 The participants in this conversation are: {participants}
 The date now is: {time}, much has changed since you were trained.
+Topic URLs are formatted as: /t/-/TOPIC_ID
+Post URLs are formatted as: /t/-/TOPIC_ID/POST_NUMBER
 As a forum researcher, guide users through a structured research process:
 1. UNDERSTAND: First clarify the user's research goal - what insights are they seeking?
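The two added prompt lines describe the URL shapes the model should cite. A minimal sketch of building and parsing such paths in plain Ruby, assuming the `-` placeholder stands in for the topic slug and the forum resolves `/t/-/ID` to the right topic. The `CitationPath` module and its methods are hypothetical and not part of Discourse AI:

```ruby
# Hypothetical helpers (not part of Discourse AI) for the two URL shapes
# the prompt describes: /t/-/TOPIC_ID and /t/-/TOPIC_ID/POST_NUMBER.
module CitationPath
  # Matches "/t/-/123" or "/t/-/123/4"; the post number is optional.
  PATTERN = %r{\A/t/-/(\d+)(?:/(\d+))?\z}

  # Builds a topic path, or a post path when post_number is given.
  # Integer() raises ArgumentError on non-numeric input.
  def self.build(topic_id, post_number = nil)
    path = "/t/-/#{Integer(topic_id)}"
    post_number.nil? ? path : "#{path}/#{Integer(post_number)}"
  end

  # Returns [topic_id, post_number_or_nil], or nil when the path
  # matches neither shape.
  def self.parse(path)
    match = PATTERN.match(path)
    return nil unless match

    [match[1].to_i, match[2]&.to_i]
  end
end

puts CitationPath.build(123)             # prints /t/-/123
puts CitationPath.build(123, 4)          # prints /t/-/123/4
p CitationPath.parse("/t/-/123/4")       # prints [123, 4]
p CitationPath.parse("/t/some-slug/123") # prints nil
```

Fixing the slug to `-` keeps the format trivial for a model to reproduce, since it only has to fill in numeric IDs it has already seen in search results.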
@@ -41,10 +43,12 @@ module DiscourseAi
 Research workflow best practices:
 1. Start with a dry_run to gauge the scope (set dry_run:true)
-2. If results are too numerous (>1000), add more specific filters
-3. If results are too few (<5), broaden your filters
-4. For temporal analysis, specify explicit date ranges
-5. For user behavior analysis, combine @username with categories or tags
+2. For temporal analysis, specify explicit date ranges
+3. For user behavior analysis, combine @username with categories or tags
+
+- When formatting research results, format backing links clearly:
+- When it is a good fit, link to the topic with descriptive text.
+- When it is a good fit, link using markdown footnotes.
 PROMPT
 end
 end
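The new instructions let the model choose between descriptive inline links and Markdown footnotes. A minimal sketch of the two output shapes, assuming the renderer supports the common `[^1]` footnote syntax. The `Finding` struct, both helper methods, and the sample findings are made up for illustration:

```ruby
# Hypothetical formatter (not part of Discourse AI) showing the two link
# styles the prompt allows: descriptive inline links and Markdown footnotes.
Finding = Struct.new(:text, :title, :path)

# Inline style: each sentence carries a link with descriptive anchor text.
def inline_links(findings)
  findings.map { |f| "#{f.text} (see [#{f.title}](#{f.path}))" }.join("\n")
end

# Footnote style: numbered markers in the text, definitions collected
# after a blank line at the end.
def footnote_links(findings)
  body = findings.each_with_index.map { |f, i| "#{f.text}[^#{i + 1}]" }
  notes = findings.each_with_index.map { |f, i| "[^#{i + 1}]: [#{f.title}](#{f.path})" }
  (body + [""] + notes).join("\n")
end

# Sample values, invented for this example.
findings = [
  Finding.new("Users asked for dark mode.", "Dark mode request", "/t/-/123"),
  Finding.new("Staff confirmed it shipped.", "Release notes", "/t/-/456/7")
]

puts inline_links(findings)
puts
puts footnote_links(findings)
```

Footnotes tend to suit summaries where several sources back one claim; inline links read better when a single source's title already describes it.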

@@ -166,8 +166,15 @@ module DiscourseAi
 def goal_system_prompt(goals)
 <<~TEXT
-You are a researcher tool designed to analyze and extract information from forum content.
+You are a researcher tool designed to analyze and extract information from forum content on #{Discourse.base_url}.
+The current date is #{::Time.zone.now.strftime("%a, %d %b %Y %H:%M %Z")}.
 Your task is to process the provided content and extract relevant information based on the specified goal.
+When extracting content ALWAYS include the following:
+- Multiple citations using Markdown
+- Topic citations: Interesting fact [ref](/t/-/TOPIC_ID)
+- Post citations: Interesting fact [ref](/t/-/TOPIC_ID/POST_NUMBER)
+- Relevent quotes from the direct source content
+- Relevant dates and times from the content
 Your goal is: #{goals}
 TEXT
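The rewritten `goal_system_prompt` pins the current date and asks for `[ref](/t/-/TOPIC_ID)`-style citations. The sketch below, in plain Ruby outside Rails, shows what the date line renders to for an arbitrary fixed UTC time (standing in for `::Time.zone.now`, which needs ActiveSupport) and how such citations could be pulled back out of a model reply. `extract_citations` and the sample reply are hypothetical; this commit does not add any post-check:

```ruby
# Plain Ruby, no Rails: a fixed UTC time stands in for ::Time.zone.now,
# which requires ActiveSupport.
now = Time.utc(2025, 5, 20, 3, 1)
date_line = "The current date is #{now.strftime("%a, %d %b %Y %H:%M %Z")}."
puts date_line # prints The current date is Tue, 20 May 2025 03:01 UTC.

# Hypothetical post-check (not part of this commit): find Markdown links
# whose target uses the /t/-/TOPIC_ID[/POST_NUMBER] shape.
CITATION = %r{\[([^\]]+)\]\((/t/-/(\d+)(?:/(\d+))?)\)}

def extract_citations(reply)
  reply.scan(CITATION).map do |label, path, topic_id, post_number|
    {
      label: label,
      path: path,
      topic_id: topic_id.to_i,
      post_number: post_number&.to_i # nil for topic-level citations
    }
  end
end

reply = "Dark mode was requested [ref](/t/-/123) and later shipped [ref](/t/-/456/7)."
citations = extract_citations(reply)
citations.each { |c| p c }
```

An empty result from `extract_citations` would be one cheap signal that a reply ignored the ALWAYS-cite instruction.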