discourse-ai

Commit Graph

Author	SHA1	Message	Date
Sam	c34fcc8a95	FEATURE: forum researcher persona for deep research (#1313 ) This commit introduces a new Forum Researcher persona specialized in deep forum content analysis along with comprehensive improvements to our AI infrastructure. Key additions: New Forum Researcher persona with advanced filtering and analysis capabilities Robust filtering system supporting tags, categories, dates, users, and keywords LLM formatter to efficiently process and chunk research results Infrastructure improvements: Implemented CancelManager class to centrally manage AI completion cancellations Replaced callback-based cancellation with a more robust pattern Added systematic cancellation monitoring with callbacks Other improvements: Added configurable default_enabled flag to control which personas are enabled by default Updated translation strings for the new researcher functionality Added comprehensive specs for the new components Renames Researcher -> Web Researcher This change makes our AI platform more stable while adding powerful research capabilities that can analyze forum trends and surface relevant content.	2025-05-14 12:36:16 +10:00
Sam	1dde82eb58	FEATURE: allow specifying tool use none in completion prompt This PR adds support for disabling further tool calls by setting tool_choice to :none across all supported LLM providers: - OpenAI: Uses "none" tool_choice parameter - Anthropic: Uses {type: "none"} and adds a prefill message to prevent confusion - Gemini: Sets function_calling_config mode to "NONE" - AWS Bedrock: Doesn't natively support tool disabling, so adds a prefill message We previously used to disable tool calls by simply removing tool definitions, but this would cause errors with some providers. This implementation uses the supported method appropriate for each provider while providing a fallback for Bedrock. Co-authored-by: Natalie Tay <natalie.tay@gmail.com> * remove stray puts * cleaner chain breaker for last tool call (works in thinking) remove unused code * improve test --------- Co-authored-by: Natalie Tay <natalie.tay@gmail.com>	2025-03-25 08:06:43 +11:00
Sam	8f4cd2fcbd	FEATURE: allow disabling of top_p and temp for thinking models (#1184 ) thinking models such as Claude 3.7 Thinking and o1 / o3 do not support top_p or temp. Previously you would have to carefully remove it from everywhere by having it be a provider param we now support blanker removing without forcing people to update automation rules or personas	2025-03-11 16:54:02 +11:00
Sam	f6eedf3e0b	FEATURE: implement thinking token support (#1155 ) adds support for "thinking tokens" - a feature that exposes the model's reasoning process before providing the final response. Key improvements include: - Add a new Thinking class to handle thinking content from LLMs - Modify endpoints (Claude, AWS Bedrock) to handle thinking output - Update AI bot to display thinking in collapsible details section - Fix SEARCH/REPLACE blocks to support empty replacement strings and general improvements to artifact editing - Allow configurable temperature in triage and report automations - Various bug fixes and improvements to diff parsing	2025-03-04 12:22:30 +11:00
Sam	fe19133dd4	FEATURE: full support for Sonnet 3.7 (#1151 ) * FEATURE: full support for Sonnet 3.7 - Adds support for Sonnet 3.7 with reasoning on bedrock and anthropic - Fixes regression where provider params were not populated Note. reasoning tokens are hardcoded to minimum of 100 maximum of 65536 * FIX: open ai non reasoning models need to use deprecate max_tokens	2025-02-25 17:32:12 +11:00
Sam	12f00a62d2	FIX: use max_completion_tokens for open ai models (#1134 ) max_tokens is now deprecated per API	2025-02-19 15:49:15 +11:00
Sam	a7d032fa28	DEV: artifact system update (#1096 ) ### Why This pull request fundamentally restructures how AI bots create and update web artifacts to address critical limitations in the previous approach: 1. Improved Artifact Context for LLMs: Previously, artifact creation and update tools included the entire artifact source code directly in the tool arguments. This overloaded the Language Model (LLM) with raw code, making it difficult for the LLM to maintain a clear understanding of the artifact's current state when applying changes. The LLM would struggle to differentiate between the base artifact and the requested modifications, leading to confusion and less effective updates. 2. Reduced Token Usage and History Bloat: Including the full artifact source code in every tool interaction was extremely token-inefficient. As conversations progressed, this redundant code in the history consumed a significant number of tokens unnecessarily. This not only increased costs but also diluted the context for the LLM with less relevant historical information. 3. Enabling Updates for Large Artifacts: The lack of a practical diff or targeted update mechanism made it nearly impossible to efficiently update larger web artifacts. Sending the entire source code for every minor change was both computationally expensive and prone to errors, effectively blocking the use of AI bots for meaningful modifications of complex artifacts. This pull request addresses these core issues by: * Introducing methods for the AI bot to explicitly read and understand the current state of an artifact. * Implementing efficient update strategies that send targeted changes rather than the entire artifact source code. * Providing options to control the level of artifact context included in LLM prompts, optimizing token usage. ### What The main changes implemented in this PR to resolve the above issues are: 1. `Read Artifact` Tool for Contextual Awareness: - A new `read_artifact` tool is introduced, enabling AI bots to fetch and process the current content of a web artifact from a given URL (local or external). - This provides the LLM with a clear and up-to-date representation of the artifact's HTML, CSS, and JavaScript, improving its understanding of the base to be modified. - By cloning local artifacts, it allows the bot to work with a fresh copy, further enhancing context and control. 2. Refactored `Update Artifact` Tool with Efficient Strategies: - The `update_artifact` tool is redesigned to employ more efficient update strategies, minimizing token usage and improving update precision: - `diff` strategy: Utilizes a search-and-replace diff algorithm to apply only the necessary, targeted changes to the artifact's code. This significantly reduces the amount of code sent to the LLM and focuses its attention on the specific modifications. - `full` strategy: Provides the option to replace the entire content sections (HTML, CSS, JavaScript) when a complete rewrite is required. - Tool options enhance the control over the update process: - `editor_llm`: Allows selection of a specific LLM for artifact updates, potentially optimizing for code editing tasks. - `update_algorithm`: Enables choosing between `diff` and `full` update strategies based on the nature of the required changes. - `do_not_echo_artifact`: Defaults to true, and by not echoing the artifact in prompts, it further reduces token consumption in scenarios where the LLM might not need the full artifact context for every update step (though effectiveness might be slightly reduced in certain update scenarios). 3. System and General Persona Tool Option Visibility and Customization: - Tool options, including those for system personas, are made visible and editable in the admin UI. This allows administrators to fine-tune the behavior of all personas and their tools, including setting specific LLMs or update algorithms. This was previously limited or hidden for system personas. 4. Centralized and Improved Content Security Policy (CSP) Management: - The CSP for AI artifacts is consolidated and made more maintainable through the `ALLOWED_CDN_SOURCES` constant. This improves code organization and future updates to the allowed CDN list, while maintaining the existing security posture. 5. Codebase Improvements: - Refactoring of diff utilities, introduction of strategy classes, enhanced error handling, new locales, and comprehensive testing all contribute to a more robust, efficient, and maintainable artifact management system. By addressing the issues of LLM context confusion, token inefficiency, and the limitations of updating large artifacts, this pull request significantly improves the practicality and effectiveness of AI bots in managing web artifacts within Discourse.	2025-02-04 16:27:27 +11:00
Sam	381a2715c8	FEATURE: o3-mini supports (#1105 ) 1. Adds o3-mini presets 2. Adds support for reasoning effort 3. properly use "developer" messages for reasoning models	2025-02-01 14:08:34 +11:00
Sam	20612fde52	FEATURE: add the ability to disable streaming on an Open AI LLM Disabling streaming is required for models such o1 that do not have streaming enabled yet It is good to carry this feature around in case various apis decide not to support streaming endpoints and Discourse AI can continue to work just as it did before. Also: fixes issue where sharing artifacts would miss viewport leading to tiny artifacts on mobile	2025-01-13 17:01:01 +11:00
Sam	bc0657f478	FEATURE: AI Usage page (#964 ) - Added a new admin interface to track AI usage metrics, including tokens, features, and models. - Introduced a new route `/admin/plugins/discourse-ai/ai-usage` and supporting API endpoint in `AiUsageController`. - Implemented `AiUsageSerializer` for structuring AI usage data. - Integrated CSS stylings for charts and tables under `stylesheets/modules/llms/common/usage.scss`. - Enhanced backend with `AiApiAuditLog` model changes: added `cached_tokens` column (implemented with OpenAI for now) with relevant DB migration and indexing. - Created `Report` module for efficient aggregation and filtering of AI usage metrics. - Updated AI Bot title generation logic to log correctly to user vs bot - Extended test coverage for the new tracking features, ensuring data consistency and access controls.	2024-11-29 06:26:48 +11:00
Sam	0d7f353284	FEATURE: AI artifacts (#898 ) This is a significant PR that introduces AI Artifacts functionality to the discourse-ai plugin along with several other improvements. Here are the key changes: 1. AI Artifacts System: - Adds a new `AiArtifact` model and database migration - Allows creation of web artifacts with HTML, CSS, and JavaScript content - Introduces security settings (`strict`, `lax`, `disabled`) for controlling artifact execution - Implements artifact rendering in iframes with sandbox protection - New `CreateArtifact` tool for AI to generate interactive content 2. Tool System Improvements: - Adds support for partial tool calls, allowing incremental updates during generation - Better handling of tool call states and progress tracking - Improved XML tool processing with CDATA support - Fixes for tool parameter handling and duplicate invocations 3. LLM Provider Updates: - Updates for Anthropic Claude models with correct token limits - Adds support for native/XML tool modes in Gemini integration - Adds new model configurations including Llama 3.1 models - Improvements to streaming response handling 4. UI Enhancements: - New artifact viewer component with expand/collapse functionality - Security controls for artifact execution (click-to-run in strict mode) - Improved dialog and response handling - Better error management for tool execution 5. Security Improvements: - Sandbox controls for artifact execution - Public/private artifact sharing controls - Security settings to control artifact behavior - CSP and frame-options handling for artifacts 6. Technical Improvements: - Better post streaming implementation - Improved error handling in completions - Better memory management for partial tool calls - Enhanced testing coverage 7. Configuration: - New site settings for artifact security - Extended LLM model configurations - Additional tool configuration options This PR significantly enhances the plugin's capabilities for generating and displaying interactive content while maintaining security and providing flexible configuration options for administrators.	2024-11-19 09:22:39 +11:00
Sam	823e8ef490	FEATURE: partial tool call support for OpenAI and Anthropic (#908 ) Implement streaming tool call implementation for Anthropic and Open AI. When calling: llm.generate(..., partial_tool_calls: true) do ... Partials may contain ToolCall instances with partial: true, These tool calls are partially populated with json partially parsed. So for example when performing a search you may get: ToolCall(..., {search: "hello" }) ToolCall(..., {search: "hello world" }) The library used to parse json is: https://github.com/dgraham/json-stream We use a fork cause we need access to the internal buffer. This prepares internals to perform partial tool calls, but does not implement it yet.	2024-11-14 06:58:24 +11:00
Sam	e817b7dc11	FEATURE: improve tool support (#904 ) This re-implements tool support in DiscourseAi::Completions::Llm #generate Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML New implementation has the endpoints return ToolCall objects. Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement decode, decode_chunk (for streaming) It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up. Also (new) Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM Token accounting is fixed for vllm (we were not correctly counting tokens)	2024-11-12 08:14:30 +11:00
Sam	bffe9dfa07	FIX: we must properly encode objects prior to escaping (#891 ) in cases of arrays escapeHTML will not work) *	2024-11-04 16:16:25 +11:00
Sam	c352054d4e	FIX: encode parameters returned from LLMs correctly (#889 ) Fixes encoding of params on LLM function calls. Previously we would improperly return results if a function parameter returned an HTML tag. Additionally adds some missing HTTP verbs to tool calls.	2024-11-04 10:07:17 +11:00
Rafael dos Santos Silva	3022d34613	FEATURE: Support srv records for OpenAI compatible LLMs (#865 )	2024-10-24 15:47:12 -03:00
Sam	059d3b6fd2	FEATURE: better logging for automation reports (#853 ) A new feature_context json column was added to ai_api_audit_logs This allows us to store rich json like context on any LLM request made. This new field now stores automation id and name. Additionally allows llm_triage to specify maximum number of tokens This means that you can limit the cost of llm triage by scanning only first N tokens of a post.	2024-10-23 16:49:56 +11:00
Sam	545500b329	FEATURE: allows forced LLM tool use (#818 ) * FEATURE: allows forced LLM tool use Sometimes we need to force LLMs to use tools, for example in RAG like use cases we may want to force an unconditional search. The new framework allows you backend to force tool usage. Front end commit to follow * UI for forcing tools now works, but it does not react right * fix bugs * fix tests, this is now ready for review	2024-10-05 09:46:57 +10:00
Sam	4b21eb7974	FEATURE: basic support for GPT-o models (#804 ) Caveats - No streaming, by design - No tool support (including no XML tools) - No vision Open AI will revamt the model and more of these features may become available. This solution is a bit hacky for now	2024-09-17 09:41:00 +10:00
Roman Rizzi	bed044448c	DEV: Remove old code now that features rely on LlmModels. (#729 ) * DEV: Remove old code now that features rely on LlmModels. * Hide old settings and migrate persona llm overrides * Remove shadowing special URL + seeding code. Use srv:// prefix instead.	2024-07-30 13:44:57 -03:00
Roman Rizzi	f622e2644f	FEATURE: Store provider-specific parameters. (#686 ) Previously, we stored request parameters like the OpenAI organization and Bedrock's access key and region as site settings. This change stores them in the `llm_models` table instead, letting us drop more settings while also becoming more flexible.	2024-06-25 08:26:30 +10:00
Roman Rizzi	bd1490a536	FIX: include_usage is not available in the Azure API. (#648 ) Follow-up #618	2024-05-28 16:55:43 -03:00
Roman Rizzi	1d786fbaaf	FEATURE: Set endpoint credentials directly from LlmModel. (#625 ) * FEATURE: Set endpoint credentials directly from LlmModel. Drop Llama2Tokenizer since we no longer use it. * Allow http for custom LLMs --------- Co-authored-by: Rafael Silva <xfalcox@gmail.com>	2024-05-16 09:50:22 -03:00
Sam	8eee6893d6	FEATURE: GPT4o support and better auditing (#618 ) - Introduce new support for GPT4o (automation / bot / summary / helper) - Properly account for token counts on OpenAI models - Track feature that was used when generating AI completions - Remove custom llm support for summarization as we need better interfaces to control registration and de-registration	2024-05-14 13:28:46 +10:00
Roman Rizzi	e22194f321	HACK: Llama3 support for summarization/AI helper. (#616 ) There are still some limitations to which models we can support with the `LlmModel` class. This will enable support for Llama3 while we sort those out.	2024-05-13 15:54:42 -03:00
Roman Rizzi	4f1a3effe0	REFACTOR: Migrate Vllm/TGI-served models to the OpenAI format. (#588 ) Both endpoints provide OpenAI-compatible servers. The only difference is that Vllm doesn't support passing tools as a separate parameter. Even if the tool param is supported, it ultimately relies on the model's ability to handle native functions, which is not the case with the models we have today. As a part of this change, we are dropping support for StableBeluga/Llama2 models. They don't have a chat_template, meaning the new API can translate them. These changes let us remove some of our existing dialects and are a first step in our plan to support any LLM by defining them as data-driven concepts. I rewrote the "translate" method to use a template method and extracted the tool support strategies into its classes to simplify the code. Finally, these changes bring support for Ollama when running in dev mode. It only works with Mistral for now, but it will change soon..	2024-05-07 10:02:16 -03:00
Sam	6623928b95	FIX: call after tool calls failing on OpenAI / Gemini (#599 ) A recent change meant that llm instance got cached internally, repeat calls to inference would cache data in Endpoint object leading model to failures. Both Gemini and Open AI expect a clean endpoint object cause they set data. This amends internals to make sure llm.generate will always operate on clean objects	2024-05-01 17:50:58 +10:00
Sam	a223d18f1a	FIX: more robust function call support (#581 ) For quite a few weeks now, some times, when running function calls on Anthropic we would get a "stray" - "calls" line. This has been enormously frustrating! I have been unable to find the source of the bug so instead decoupled the implementation and create a very clear "function call normalizer" This new class is extensively tested and guards against the type of edge cases we saw pre-normalizer. This also simplifies the implementation of "endpoint" which no longer needs to handle all this complex logic.	2024-04-19 06:54:54 +10:00
Sam	6de9c53a71	FEATURE: remove gpt-4-turbo-0125 preview swap with gpt-4-turbo (#568 ) Open AI just released gpt-4-turbo (with vision) This change stops using the old preview model and swaps with the officially released gpt-4-turbo To come is an implementation of vision.	2024-04-10 09:53:20 -03:00
Sam	2ad743d246	FEATURE: Add GitHub Helper AI Bot persona and tools (#513 ) Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes: - Implements the GitHub Helper persona class with its system prompt and available tools - Adds three new AI Bot tools for GitHub interactions: - github_file_content: Retrieves content of files from a GitHub repository - github_pull_request_diff: Retrieves the diff for a GitHub pull request - github_search_code: Searches for code in a GitHub repository - Updates the AI Bot dialects to support the new GitHub tools - Implements multiple function calls for standard tool dialect	2024-03-08 06:37:23 +11:00
Sam	c02794cf2e	FIX: support multiple tool calls (#502 ) * FIX: support multiple tool calls Prior to this change we had a hard limit of 1 tool call per llm round trip. This meant you could not google multiple things at once or perform searches across two tools. Also: - Hint when Google stops working - Log topic_id / post_id when performing completions * Also track id for title	2024-03-02 07:53:21 +11:00
Sam	9fb1430e40	FIX: support spaces within arguments for Open AI (#499 ) Previous to this fix if a tool call ever streamed a SPACE alone, we would eat it and ignore it, breaking params Also fixes some tests to ensure they are actually called :)	2024-02-29 12:47:34 +11:00
Keegan George	a9b2d6a30a	FEATURE: AI image caption (#470 ) This PR adds a new feature where you can generate captions for images in the composer using AI. --------- Co-authored-by: Rafael Silva <xfalcox@gmail.com>	2024-02-19 14:56:28 -03:00
Roman Rizzi	0634b85a81	UX: Validations to LLM-backed features (except AI Bot) (#436 ) * UX: Validations to Llm-backed features (except AI Bot) This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs. Validations are: * You must choose a model before enabling the feature. * You must turn off the feature before setting the model to blank. * You must configure each model settings before being able to select it. * Add provider name to summarization options * vLLM can technically support same models as HF * Check we can talk to the selected model * Check for Bedrock instead of anthropic as a site could have both creds setup	2024-01-29 16:04:25 -03:00
Sam	092da860e2	FEATURE: support gpt-4-0125 which was just released (#443 ) The new model has better performance and is always preferable to the old one which has unicode issues during function calls.	2024-01-26 09:08:02 +11:00
Sam	03fc94684b	FIX: AI helper not working correctly with mixtral (#399 ) * FIX: AI helper not working correctly with mixtral This PR introduces a new function on the generic llm called #generate This will replace the implementation of completion! #generate introduces a new way to pass temperature, max_tokens and stop_sequences Then LLM implementers need to implement #normalize_model_params to ensure the generic names match the LLM specific endpoint This also adds temperature and stop_sequences to completion_prompts this allows for much more robust completion prompts * port everything over to #generate * Fix translation - On anthropic this no longer throws random "This is your translation:" - On mixtral this actually works * fix markdown table generation as well	2024-01-04 09:53:47 -03:00
Roman Rizzi	4182af230a	FIX: Correctly translate and read tools for Claude and Chat GPT. (#393 ) I tested against the live models for the AI bot migration. It ensures Open AI's tool syntax is correct and we can correctly read the replies. :	2024-01-02 11:21:13 -03:00
Sam	529703b5ec	FEATURE: support sending AI report to an email address (#368 ) Support emailing the AI report to any arbitrary email	2023-12-19 17:51:49 +11:00
Sam	d0f54443ae	FEATURE: LLM based peroidical summary report (#357 ) Introduce a Discourse Automation based periodical report. Depends on Discourse Automation. Report works best with very large context language models such as GPT-4-Turbo and Claude 2. - Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well) - Adds GPT-4 turbo support to generic llm interface	2023-12-19 12:04:15 +11:00
Roman Rizzi	e0bf6adb5b	DEV: Tool support for the LLM service. (#366 ) This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response. It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect. Finally, It adds some buffering when reading chunks to handle cases when streaming is extremely slow.:M	2023-12-18 18:06:01 -03:00
Sam	6ddc17fd61	DEV: port directory structure to Zeitwerk (#319 ) Previous to this change we relied on explicit loading for a files in Discourse AI. This had a few downsides: - Busywork whenever you add a file (an extra require relative) - We were not keeping to conventions internally ... some places were OpenAI others are OpenAi - Autoloader did not work which lead to lots of full application broken reloads when developing. This moves all of DiscourseAI into a Zeitwerk compatible structure. It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there) To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.	2023-11-29 15:17:46 +11:00
Roman Rizzi	3064d4c288	REFACTOR: Summarization and HyDE now use an LLM abstraction. (#297 ) * DEV: One LLM abstraction to rule them all * REFACTOR: HyDE search uses new LLM abstraction * REFACTOR: Summarization uses the LLM abstraction * Updated documentation and made small fixes. Remove Bedrock claude-2 restriction	2023-11-23 12:58:54 -03:00

42 Commits