discourse-ai/lib
Roman Rizzi 46fcdb6ba5
FIX: Make summaries backfill job more resilient. (#1071)
To quickly select backfill candidates without comparing SHAs, we compare the last summarized post to the topic's `highest_post_number`. However, hiding or deleting a post and adding a small action will update this column, causing the job to stall and re-generate the same summary repeatedly until someone posts a regular reply. On top of this, the comparison doesn't always hold for topics with `best_replies`, as the last reply isn't necessarily included.

Since this is not evident at first glance and each summarization strategy picks its targets differently, I'm opting to simplify the backfill logic and how we track potential candidates.

The first step is dropping `content_range`, which serves no purpose; it only exists because summary caching was originally supposed to work differently. I'm replacing it with a column called `highest_target_number`, which tracks `highest_post_number` for topics and could track other things, like a channel's `message_count`, in the future.

Now that we have this column, when selecting every potential backfill candidate we'll check whether the summary is truly outdated by comparing the SHAs. If it's not, we just update the column and move on.
2025-01-16 09:42:53 -03:00
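The check described in the commit message above can be illustrated with a minimal Ruby sketch. It is not the plugin's actual code: `TopicStub`, `SummaryStub`, `content_sha`, `backfill_check`, the `original_content_sha` field, and the choice to hash the joined post texts are all hypothetical stand-ins. The only names taken from the commit message are `highest_post_number` and `highest_target_number`. It also assumes a topic becomes a candidate whenever the two numbers differ.

```ruby
require "digest"

# Hypothetical in-memory stand-ins for the real database-backed records.
TopicStub = Struct.new(:id, :highest_post_number, :post_texts)
SummaryStub = Struct.new(:topic_id, :original_content_sha, :highest_target_number)

# Assumption: the SHA covers whatever content the summarization strategy targets.
# Here that is simply every post text joined together.
def content_sha(topic)
  Digest::SHA256.hexdigest(topic.post_texts.join("\n"))
end

# Returns :fresh when the cheap numeric comparison says nothing changed,
# :touched when only the counter moved (the tracking column is updated in place),
# and :regenerate when the summarized content itself changed.
def backfill_check(topic, summary)
  # Cheap pre-filter: no SHA work unless the tracked number is out of sync.
  return :fresh if summary.highest_target_number == topic.highest_post_number

  if summary.original_content_sha == content_sha(topic)
    # Content is unchanged (e.g. only a small action was added),
    # so record the new number and move on.
    summary.highest_target_number = topic.highest_post_number
    :touched
  else
    :regenerate
  end
end
```

In this sketch a topic whose counter moved without any change to the summarized content is updated once and then drops out of the candidate set, rather than being re-summarized on every run.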
ai_bot FIX: properly spin down unused streamer threads (#1035) 2024-12-20 12:09:42 +11:00
ai_helper FEATURE: smart date support for AI helper (#1044) 2024-12-31 08:04:25 +11:00
ai_moderation FIX: only hide posts detected explicitly as spam (#1070) 2025-01-15 16:50:41 +11:00
automation FIX: Triage rule should append selected tags instead of replacing them (#1022) 2024-12-11 11:19:44 -03:00
completions FIX: AWS Bedrock non-streaming calls response log (#1072) 2025-01-15 18:51:25 -03:00
configuration FIX: Prevent LLM enumerator from erroring when spam enabled (#1045) 2024-12-27 09:12:29 +11:00
database DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
discord/bot FEATURE: Discord Bot integration (#831) 2024-10-16 12:41:18 -03:00
embeddings DEV: Embedding tables' model_id has to be a bigint (#1058) 2025-01-14 10:53:06 -03:00
inference FIX: Cloudflare Workers AI embeddings (#1037) 2024-12-20 17:45:27 -03:00
sentiment UX: Make sentiment trends more readable (#1018) 2024-12-11 09:13:18 -08:00
summarization FIX: Make summaries backfill job more resilient. (#1071) 2025-01-16 09:42:53 -03:00
tasks/modules DEV: Add rake task to send topics or posts to spam scanner (#1059) 2025-01-15 11:48:57 +08:00
tokenizer FIX/REFACTOR: FoldContent revamp (#866) 2024-10-25 11:51:17 -03:00
utils FEATURE: allow artifacts to be updated (#980) 2024-12-03 07:23:31 +11:00
automation.rb FIX: AI Automation scripts were broken when using seeded models (#991) 2024-12-02 19:07:05 -03:00
engine.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
guardian_extensions.rb FEATURE: Calculate gists from non hot topics too (#958) 2024-11-26 13:44:12 -03:00
multisite_hash.rb FIX: properly cache user locale (#593) 2024-04-26 09:28:35 -03:00
post_extensions.rb FEATURE: Backfill posts sentiment. (#982) 2024-12-03 10:27:03 -03:00
summarization.rb FEATURE: Generate topic gists for the hot topics list. (#837) 2024-10-18 18:01:39 -03:00
topic_extensions.rb PERF: Preload only gists when including summaries in topic list (#948) 2024-11-25 12:24:02 -03:00