discourse-ai

Commit Graph

Author	SHA1	Message	Date
Rafael dos Santos Silva	4eac377987	DEV: Zero delays on fake endpoint used in tests (#1311)	2025-05-05 17:47:32 -03:00
Roman Rizzi	fccd072f44	DEV: Don't use delays for streaming summaries. (#1244) We started using a callback as a buffer in FoldContent, so the Fake endpoint is attempting to emulate delays in the streaming. However, we don't care about that in these specs.	2025-04-02 13:38:15 -03:00
Roman Rizzi	1b1b44353b	FEATURE: Changes to summaries' outdated logic. (#1108) Before this change, a summary was only outdated when new content appeared, for topics with "best replies", when the query returned different results. The intent behind this change is to detect when a summary is outdated as a result of an edit. Additionally, we are changing the backfill candidates query to compare "ai_summary_backfill_topic_max_age_days" against "last_posted_at" instead of "created_at", to catch long-lived, active topics. This was discussed here: https://meta.discourse.org/t/ai-summarization-backfill-is-stuck-keeps-regenerating-the-same-topic/347088/14?u=roman_rizzi	2025-02-04 09:31:11 -03:00
Roman Rizzi	f5cf1019fb	FEATURE: configurable embeddings (#1049) * Use AR model for embeddings features * endpoints * Embeddings CRUD UI * Add presets. Hide a couple more settings * system specs * Seed embedding definition from old settings * Generate search bit index on the fly. cleanup orphaned data * support for seeded models * Fix run test for new embedding * fix selected model not set correctly	2025-01-21 12:23:19 -03:00
Roman Rizzi	46fcdb6ba5	FIX: Make summaries backfill job more resilient. (#1071) To quickly select backfill candidates without comparing SHAs, we compare the last summarized post to the topic's highest_post_number. However, hiding or deleting a post and adding a small action will update this column, causing the job to stall and re-generate the same summary repeatedly until someone posts a regular reply. On top of this, this is not always true for topics with `best_replies`, as this last reply isn't necessarily included. Since this is not evident at first glance and each summarization strategy picks its targets differently, I'm opting to simplify the backfill logic and how we track potential candidates. The first step is dropping `content_range`, which serves no purpose and it's there because summary caching was supposed to work differently at the beginning. So instead, I'm replacing it with a column called `highest_target_number`, which tracks `highest_post_number` for topics and could track other things like channel's `message_count` in the future. Now that we have this column when selecting every potential backfill candidate, we'll check if the summary is truly outdated by comparing the SHAs, and if it's not, we just update the column and move on	2025-01-16 09:42:53 -03:00
Roman Rizzi	94b85ece80	FIX: Make sure gists are at least five minutes old before updating them (#1029) * FIX: Make sure gists are at least five minutes old before updating them * Update app/jobs/regular/fast_track_topic_gist.rb Co-authored-by: Keegan George <kgeorge13@gmail.com> --------- Co-authored-by: Keegan George <kgeorge13@gmail.com>	2024-12-13 19:36:34 -03:00
Roman Rizzi	1c40a698ca	FIX: get strategy version through vector_rep (#1028)	2024-12-13 18:49:18 -03:00
Roman Rizzi	eae527f99d	REFACTOR: A Simpler way of interacting with embeddings tables. (#1023) * REFACTOR: A Simpler way of interacting with embeddings' tables. This change adds a new abstraction called `Schema`, which acts as a repository that supports the same DB features `VectorRepresentation::Base` has, with the exception that removes the need to have duplicated methods per embeddings table. It is also a bit more flexible when performing a similarity search because you can pass it a block that gives you access to the builder, allowing you to add multiple joins/where conditions.	2024-12-13 10:15:21 -03:00
Roman Rizzi	ce6a2eca21	FEATURE: Backfill posts sentiment. (#982) * FEATURE: Backfill posts sentiment. It adds a scheduled job to backfill posts' sentiment, similar to our existing rake task, but with two settings to control the batch size and posts' max-age. * Make sure model_name order is consistent.	2024-12-03 10:27:03 -03:00
Rafael dos Santos Silva	0ac18d157b	FEATURE: Adjustments to gist summaries (#988) - makes visible to everyone by default - backfills gists before full summaries - adds configurable max age setting to backfill job	2024-12-02 15:22:35 -03:00
Rafael dos Santos Silva	23193ee6f2	FEATURE: Calculate gists from non hot topics too (#958) Also renames some settings to remove 'hot' references.	2024-11-26 13:44:12 -03:00
Roman Rizzi	fbc74c7467	FEATURE: Extend summary backfill to also generate gists (#896) Updates default batch size to 0 and max to 10000	2024-11-07 13:40:18 -03:00
Roman Rizzi	9505a8976c	FEATURE: Automatically backfill regular summaries. (#892) This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify. We'll prioritize topics without summary over outdated ones.	2024-11-04 17:48:11 -03:00
Sam	584753cf60	FIX: we were never reindexing old content (#786) * FIX: we were never reindexing old content Embedding backfill contains logic for searching for old content change and then backfilling. Unfortunately it was excluding all topics that had embedding unconditionally, leading to no backfill ever happening. This change adds a test and ensures we backfill. * over select results, this ensures we will be more likely to find ai results when filtered	2024-08-30 14:37:55 +10:00
Roman Rizzi	392e2e8aef	Revert "UX: Validate embeddings settings (#455)" (#456) This reverts commit `85fca89e01`.	2024-02-01 14:06:51 -03:00
Roman Rizzi	85fca89e01	UX: Validate embeddings settings (#455)	2024-02-01 13:05:38 -03:00
Sam	dcafc8032f	FIX: improve embedding generation (#452) 1. on failure we were queuing a job to generate embeddings, it had the wrong params. This is both fixed and covered in a test. 2. backfill embedding in the order of bumped_at, so newest content is embedded first, cover with a test 3. add a safeguard for hidden site setting that only allows batches of 50k in an embedding job run Previously old embeddings were updated in a random order, this changes it so we update in a consistent order	2024-01-31 10:38:47 -03:00

17 Commits