Commit Graph

  • 0eff85e3c2
    models: avoid error response when request is canceled/timed out Dorin Geman 2025-07-22 09:43:39 +0300
  • 505613344c
    Merge pull request #111 from docker/recorder-use-model-ref Jacob Howard 2025-07-18 18:28:59 +0300
  • 3b5b4b58e4
    recorder: use model ref rather than model ID Jacob Howard 2025-07-18 12:28:51 +0300
  • c873021e07
    Merge pull request #112 from docker/context-regulation Jacob Howard 2025-07-18 18:19:30 +0300
  • 843a94cf1b
    Merge pull request #110 from docker/recorder-memory-leak Jacob Howard 2025-07-18 18:18:52 +0300
  • 3fef8f2980
    Use more aggressive server shutdown and resequence termination Jacob Howard 2025-07-18 13:17:47 +0300
  • 2fd8517720
    recorder: fix memory leak due to slice appends Jacob Howard 2025-07-18 12:24:31 +0300
  • a8437d373b
    Use modelID instead of model tag (#98) Ignasi 2025-07-14 15:34:04 +0200
  • 566ec6f11a
    Merge pull request #107 from doringeman/cors Dorin-Andrei Geman 2025-07-14 14:28:10 +0300
  • 3d466a2d80
    runner: Do not allow CORS headers set by the inference engines Dorin Geman 2025-07-14 14:16:54 +0300
  • 8907b3ddf8
    Merge pull request #100 from doringeman/rm-untagged-msg Dorin-Andrei Geman 2025-07-11 16:08:25 +0300
  • 2bd09df3ea
    models: return untagged model information on delete Dorin Geman 2025-07-04 14:27:54 +0300
  • d370bbddbc
    Move passthrough model name to constant and clarify error message. openai-passthrough Jacob Howard 2025-07-11 15:05:37 +0300
  • 74c9839a5b
    Update pkg/inference/scheduling/scheduler.go Jacob Howard 2025-07-11 05:59:59 -0600
  • fb9e1b83ef
    Update pkg/inference/backends/openai/openai.go Jacob Howard 2025-07-11 05:57:24 -0600
  • e9f3b2f9e9
    Include all backends' disk usage in total. Jacob Howard 2025-07-10 17:25:14 +0300
  • c05842c0a5
    Don't attempt to log nil runner configurations. Jacob Howard 2025-07-10 17:16:27 +0300
  • 8a5365132a
    Add OpenAI passthrough backend. Jacob Howard 2025-07-10 17:09:37 +0300
  • 9806a43e79
    Adds support for Multimodal projector file (#104) Ignasi 2025-07-10 14:41:28 +0200
  • c532a6f835
    Fix test ilopezluna 2025-07-10 11:37:40 +0200
  • 17fe98de99
    Includes --proj if the model includes a Multimodal projector file ilopezluna 2025-07-10 11:32:30 +0200
  • 8b5d6aedfb
    When the wf is triggered by the cron we are not setting the default model (#103) Ignasi 2025-07-10 10:05:45 +0200
  • e137b6b4e0
    When the wf is triggered by the cron we are not setting the default model ilopezluna 2025-07-10 09:05:17 +0200
  • 51a93c0aff
    Adds daily check (#102) Ignasi 2025-07-09 13:55:29 +0200
  • d1cb1af584
    Potential fix for code scanning alert no. 52: Workflow does not contain permissions Ignasi 2025-07-09 13:47:45 +0200
  • 3c7613f115
    no need to execute the wf on push anymore ilopezluna 2025-07-09 13:15:10 +0200
  • faedd52810
    changing the JSON payload to use double quotes and proper variable expansion ilopezluna 2025-07-09 13:14:52 +0200
  • 2438078cd4
    Adds apt repo ilopezluna 2025-07-09 13:10:02 +0200
  • 1070c64b35
    Adds daily check ilopezluna 2025-07-09 12:53:15 +0200
  • a23dc5e514
    Include the ID of the model and the actual reference in the logs ilopezluna 2025-07-09 10:42:04 +0200
  • 296d29d7ee
    Merge branch 'main' into use-model-id ilopezluna 2025-07-09 10:21:20 +0200
  • ebdff5c136
    Merge pull request #101 from doringeman/tracking Dorin-Andrei Geman 2025-07-09 10:04:44 +0300
  • e21bd8f5de
    metrics: include current request's User-Agent Dorin Geman 2025-07-08 15:22:38 +0300
  • 66ce06c63b
    Bump model-distribution to include total on pull/push progress messages bump-model-distribution-progress ilopezluna 2025-07-03 15:45:15 +0200
  • dfeb5677b4
    Uses runnerInfo as a value type for runner map, to store runtime info of the runner. Currently the slot and the model reference used. ilopezluna 2025-07-02 16:07:41 +0200
  • d47c0f84bb
    Rename model to modelID to make it clear ilopezluna 2025-07-02 15:50:00 +0200
  • d285a7fcc8
    already sanitized ilopezluna 2025-07-02 15:30:25 +0200
  • 28b1fd04a9
    Potential fix for code scanning alert no. 43: Log entries created from user input Ignasi 2025-07-02 15:27:50 +0200
  • 77b2752e89
    Potential fix for code scanning alert no. 42: Log entries created from user input Ignasi 2025-07-02 15:27:34 +0200
  • 12f80d12da
    Potential fix for code scanning alert no. 40: Log entries created from user input Ignasi 2025-07-02 15:27:06 +0200
  • b6bf9a1463
    Use modelID as key for runners & runnerConfigs map ilopezluna 2025-07-02 12:12:43 +0200
  • 721058167a
    Use modelID as key to record req/resp instead of model tag ilopezluna 2025-07-02 11:56:47 +0200
  • 495cdb02ec
    Add modelID resolver in model manager ilopezluna 2025-07-02 11:56:16 +0200
  • de1ef38e5c
    Merge pull request #93 from docker/context-size Emily Casey 2025-06-27 11:06:14 -0600
  • cbdbb83bf7
    Incorporate review feedback Emily Casey 2025-06-27 10:54:59 -0600
  • aac988e62a
    Apply suggestions from code review Emily Casey 2025-06-27 10:47:18 -0600
  • 4d690797f2
    Bump model-distribution to point at main branch Emily Casey 2025-06-27 10:45:01 -0600
  • 247d9e06a3
    Respect context size from model config Emily Casey 2025-06-25 12:36:25 -0600
  • 26a0a73fbb
    Merge pull request #95 from docker/config-list Jacob Howard 2025-06-27 08:29:17 -0600
  • 24299c17e9
    Merge pull request #96 from doringeman/openai-recorder Dorin-Andrei Geman 2025-06-27 17:27:35 +0300
  • ac055739df
    Implement Flusher interface for responseRecorder Dorin Geman 2025-06-27 17:16:04 +0300
  • fcf45f4271
    Merge pull request #94 from doringeman/misc Dorin-Andrei Geman 2025-06-27 11:58:02 +0300
  • cce6a71cc9
    Allow configuration through argument list (in addition to string) Jacob Howard 2025-06-26 15:19:04 -0600
  • cd8959b37e
    scheduler: move OpenAI recording after runner load/release Dorin Geman 2025-06-26 15:14:26 +0300
  • a6db262a4c
    Merge pull request #91 from doringeman/openai-recorder Dorin-Andrei Geman 2025-06-25 15:51:43 +0300
  • 5d31399dca
    Log Hub response in llama-server auto update failure path Piotr Stankiewicz 2025-06-25 12:38:47 +0200
  • 3a5500500e
    OpenAIRecorder: Capture User-Agent header in records Dorin Geman 2025-06-25 15:17:39 +0300
  • ea9a321253
    OpenAIRecorder: Include BackendConfiguration Dorin Geman 2025-06-25 14:54:23 +0300
  • 3904f2314d
    OpenAIRecorder: Remove records on model eviction/termination Dorin Geman 2025-06-25 14:29:16 +0300
  • 3e38960805
    Log Hub response in llama-server auto update failure path Piotr Stankiewicz 2025-06-25 12:38:47 +0200
  • be8f3e6696
    Add OpenAIRecorder Dorin Geman 2025-06-24 11:59:21 +0300
  • b8561e1bfe
    Only run tests once (#89) Ignasi 2025-06-20 13:14:20 +0200
  • 927b1ce12b
    Adds newline at the end of the file ilopezluna 2025-06-20 13:11:29 +0200
  • 4a31c6442e
    Only run tests once ilopezluna 2025-06-20 13:02:06 +0200
  • e0d5ff8a3e
    Bump model-distribution (#88) Ignasi 2025-06-20 10:07:48 +0200
  • 605a8c427d
    go mod tidy ilopezluna 2025-06-20 10:03:00 +0200
  • ccc2a2bde6
    Merge pull request #87 from docker/cloud-ttl-8hour Jacob Howard 2025-06-18 07:49:17 -0600
  • 24a2a4b030
    Merge pull request #86 from docker/config-refinement Jacob Howard 2025-06-18 07:48:52 -0600
  • ceb2e557d6
    Bump model-distribution ilopezluna 2025-06-18 15:22:09 +0200
  • e57dcc0737
    Update pkg/inference/scheduling/runner.go Piotr 2025-06-17 16:52:35 +0200
  • e249e49421
    Fix stall in case a runner crashes while not in active use Piotr Stankiewicz 2025-06-17 16:14:58 +0200
  • f4a5e4b52d
    Return error in case of runner crash Piotr Stankiewicz 2025-06-05 13:56:16 +0200
  • e0ddfdf277
    Temporarily bump GPU-enabled cloud idle timeout to 8 hours. Jacob Howard 2025-06-17 10:45:12 -0600
  • 9f022114ba
    Try to make backend configuration a little more robust. Jacob Howard 2025-06-17 10:11:58 -0600
  • 5984f8a9a9
    Update pkg/inference/scheduling/runner.go Piotr 2025-06-17 16:52:35 +0200
  • 234d5aa387
    Fix stall in case a runner crashes while not in active use Piotr Stankiewicz 2025-06-17 16:14:58 +0200
  • aaacee4160
    Return error in case of runner crash Piotr Stankiewicz 2025-06-05 13:56:16 +0200
  • 64153a7b4a
    Create home folder (#85) Ignasi 2025-06-16 19:46:18 +0200
  • 1a5e637426
    Merge pull request #83 from docker/model-runner-environment Jacob Howard 2025-06-16 10:01:53 -0600
  • 0cd306e68b
    Detect operating environment and adjust certain behaviors. Jacob Howard 2025-06-13 16:31:48 -0600
  • 5f263055dc
    adds --home-dir to make it more clear ilopezluna 2025-06-16 16:06:04 +0200
  • 5363abf8d9
    Create home folder for modelrunner user ilopezluna 2025-06-16 16:01:50 +0200
  • 9933b7dabf
    Adds metrics endpoint (#78) Ignasi 2025-06-16 10:18:20 +0200
  • 9c4674edcf
    clean up ilopezluna 2025-06-16 09:32:42 +0200
  • 5a6fd96347
    wrap ResponseRecorder to avoid a race during tests bump-model-distribution ilopezluna 2025-06-13 17:29:59 +0200
  • 7e7c7d0131
    bump model-distribution ilopezluna 2025-06-13 15:49:43 +0200
  • bb394fb4e8
    remove unneeded dep ilopezluna 2025-06-13 12:30:38 +0200
  • c304305804
    I missed commiting this ilopezluna 2025-06-13 12:27:39 +0200
  • e92a9f0167
    Merge branch 'main' into add-metrics ilopezluna 2025-06-13 12:25:07 +0200
  • 17047a1413
    acquire/release the loader's lock ilopezluna 2025-06-13 12:23:59 +0200
  • d8e2620afe
    replace custom parser with official Prometheus libraries ilopezluna 2025-06-13 12:13:43 +0200
  • dac689435a
    Remove NewSchedulerMetricsHandler, not used ilopezluna 2025-06-13 10:56:27 +0200
  • 6d3a7aca16
    Merge pull request #77 from doringeman/defunct-runners Dorin-Andrei Geman 2025-06-13 11:52:52 +0300
  • 6cd0e61090
    Parametrise llama-server version in release pipeline Piotr Stankiewicz 2025-06-13 10:43:35 +0200
  • 93c1dc3304
    Parametrise llama-server version in release pipeline Piotr Stankiewicz 2025-06-13 10:43:35 +0200
  • 6b8c3b816f
    Update pkg/inference/scheduling/scheduler.go Piotr 2025-06-13 09:50:01 +0200
  • 099b12231d
    Support runner configuration (temporary solution) Piotr Stankiewicz 2025-06-11 15:20:21 +0200
  • 3dafee34b8
    Update pkg/inference/scheduling/scheduler.go Piotr 2025-06-13 09:50:01 +0200
  • 0e365a15a5
    Support runner configuration (temporary solution) Piotr Stankiewicz 2025-06-11 15:20:21 +0200
  • 0130eb6cf6
    Prepare Docker file ahead of multi-backend builds of llama-server Piotr Stankiewicz 2025-06-12 15:30:21 +0200