9 Commits

Author SHA1 Message Date
Rafael dos Santos Silva 4eac377987
DEV: Zero delays on fake endpoint used in tests (#1311) 2025-05-05 17:47:32 -03:00
Roman Rizzi fccd072f44
DEV: Don't use delays for streaming summaries. (#1244)
We started using a callback as a buffer in FoldContent, so the Fake endpoint is attempting
to emulate delays in the streaming. However, we don't care about that in these specs.
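
As a rough illustration only (Python rather than the plugin's Ruby), a fake streaming endpoint with an optional delay might look like the sketch below. The names `fake_stream`, `on_chunk`, and `delay` are hypothetical and do not come from the plugin; the point is that specs can pass a zero delay and still exercise the callback.

```python
import time


def fake_stream(chunks, on_chunk, delay=0.0):
    """Feed canned chunks to a callback, optionally pausing between them.

    Hypothetical stand-in for a fake endpoint: `delay` emulates streaming
    latency, and the default of 0 skips sleeping entirely.
    """
    for chunk in chunks:
        if delay > 0:
            time.sleep(delay)
        on_chunk(chunk)


# Usage: collect streamed chunks into a buffer without any artificial delay.
buffer = []
fake_stream(["This is ", "a summary."], buffer.append)
```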
2025-04-02 13:38:15 -03:00
Roman Rizzi 1b1b44353b
FEATURE: Changes to summaries' outdated logic. (#1108)
Before this change, a summary was only considered outdated when new content appeared or, for topics with "best replies", when the query returned different results. The intent behind this change is to also detect when a summary becomes outdated as a result of an edit.

Additionally, we are changing the backfill candidates query to compare "ai_summary_backfill_topic_max_age_days" against "last_posted_at" instead of "created_at", to catch long-lived, active topics. This was discussed here: https://meta.discourse.org/t/ai-summarization-backfill-is-stuck-keeps-regenerating-the-same-topic/347088/14?u=roman_rizzi
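
A minimal sketch of the age check described above, written in Python for illustration. The setting name and the `last_posted_at` / `created_at` columns come from the commit message; the function name and the exact comparison operator are assumptions, and the real check lives in a database query.

```python
from datetime import datetime, timedelta, timezone


def within_backfill_window(last_posted_at, max_age_days, now=None):
    """Return True if the topic's latest post is newer than the cutoff.

    `max_age_days` stands in for ai_summary_backfill_topic_max_age_days.
    Comparing against last_posted_at (not created_at) keeps old topics
    that are still receiving replies eligible.
    """
    now = now or datetime.now(timezone.utc)
    return last_posted_at > now - timedelta(days=max_age_days)
```

With a 30-day window, a topic created a year ago whose latest post is from yesterday still qualifies, whereas a `created_at` comparison would have excluded it.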
2025-02-04 09:31:11 -03:00
Roman Rizzi 46fcdb6ba5
FIX: Make summaries backfill job more resilient. (#1071)
To quickly select backfill candidates without comparing SHAs, we compare the last summarized post to the topic's highest_post_number. However, hiding or deleting a post and adding a small action will update this column, causing the job to stall and re-generate the same summary repeatedly until someone posts a regular reply. On top of this, this is not always true for topics with `best_replies`, as this last reply isn't necessarily included.

Since this is not evident at first glance and each summarization strategy picks its targets differently, I'm opting to simplify the backfill logic and how we track potential candidates.

The first step is dropping `content_range`, which serves no purpose and is only there because summary caching was originally supposed to work differently. Instead, I'm replacing it with a column called `highest_target_number`, which tracks `highest_post_number` for topics and could track other things, like a channel's `message_count`, in the future.

Now that we have this column, when selecting every potential backfill candidate we'll check if the summary is truly outdated by comparing the SHAs, and if it's not, we just update the column and move on.
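
The check described above could be sketched roughly as follows, in Python for illustration. Only `highest_target_number` and `highest_post_number` are names from the commit message; the `Topic` and `Summary` classes, the `content_sha` field and helper, and `needs_regeneration` are hypothetical, and the actual job operates on database records.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Topic:
    highest_post_number: int
    posts: list  # assumed: raw text of the posts the strategy would summarize


@dataclass
class Summary:
    highest_target_number: int
    content_sha: str  # hypothetical: digest of the content that was summarized


def content_sha(posts):
    """Hash the summarizable content so any change to it changes the digest."""
    return hashlib.sha256("\n".join(posts).encode("utf-8")).hexdigest()


def needs_regeneration(summary, topic):
    """Return True only when the summarized content actually changed."""
    # Cheap pre-filter: the tracked number matches, so this is not a candidate.
    if summary.highest_target_number == topic.highest_post_number:
        return False
    # Candidate: confirm it is truly outdated by comparing SHAs.
    if summary.content_sha != content_sha(topic.posts):
        return True
    # Content is unchanged (for example, a small action bumped the number):
    # record the new number and move on without regenerating.
    summary.highest_target_number = topic.highest_post_number
    return False
```

Updating the tracked number in the unchanged case is what stops the same topic from being selected again on the next run.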
2025-01-16 09:42:53 -03:00
Roman Rizzi 94b85ece80
FIX: Make sure gists are at least five minutes old before updating them (#1029)
* FIX: Make sure gists are at least five minutes old before updating them

* Update app/jobs/regular/fast_track_topic_gist.rb

Co-authored-by: Keegan George <kgeorge13@gmail.com>

---------

Co-authored-by: Keegan George <kgeorge13@gmail.com>
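
A minimal sketch of the five-minute guard, in Python for illustration. Which timestamp the job actually compares is not stated in the commit message, so `gist_created_at` and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

MIN_GIST_AGE = timedelta(minutes=5)  # "at least five minutes old"


def gist_can_be_updated(gist_created_at, now=None):
    """Return True once the existing gist is at least five minutes old."""
    now = now or datetime.now(timezone.utc)
    return now - gist_created_at >= MIN_GIST_AGE
```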
2024-12-13 19:36:34 -03:00
Rafael dos Santos Silva 0ac18d157b
FEATURE: Adjustments to gist summaries (#988)
- makes gists visible to everyone by default
- backfills gists before full summaries
- adds configurable max age setting to backfill job
2024-12-02 15:22:35 -03:00
Rafael dos Santos Silva 23193ee6f2
FEATURE: Calculate gists from non hot topics too (#958)
Also renames some settings to remove 'hot' references.
2024-11-26 13:44:12 -03:00
Roman Rizzi fbc74c7467
FEATURE: Extend summary backfill to also generate gists (#896)
Updates default batch size to 0 and max to 10000
2024-11-07 13:40:18 -03:00
Roman Rizzi 9505a8976c
FEATURE: Automatically backfill regular summaries. (#892)
This change introduces a job to summarize topics and cache the results automatically. We provide settings to control how many topics we'll backfill per hour and the minimum word count a topic needs to qualify.

We'll prioritize topics without a summary over outdated ones.
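
The selection rules above might be sketched like this, in Python for illustration. The dictionary keys, the `summary_state` values, and `pick_backfill_batch` are hypothetical names; the real job selects candidates with a database query.

```python
# Lower number = higher priority. Topics with a current summary are skipped.
PRIORITY = {"missing": 0, "outdated": 1}


def pick_backfill_batch(topics, limit, min_word_count):
    """Pick up to `limit` topics, those without a summary first."""
    eligible = [
        t for t in topics
        if t["summary_state"] in PRIORITY and t["word_count"] >= min_word_count
    ]
    # sorted() is stable, so topics with equal priority keep their input order.
    ordered = sorted(eligible, key=lambda t: PRIORITY[t["summary_state"]])
    return ordered[:limit]


# Usage with made-up data:
topics = [
    {"id": 1, "summary_state": "outdated", "word_count": 900},
    {"id": 2, "summary_state": "missing", "word_count": 50},
    {"id": 3, "summary_state": "missing", "word_count": 400},
    {"id": 4, "summary_state": "current", "word_count": 800},
]
batch = pick_backfill_batch(topics, limit=2, min_word_count=200)
```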
2024-11-04 17:48:11 -03:00