Commit Graph

  • 2d039ecb65
    Merge pull request #149 from docker/sandboxing (main) Jacob Howard 2025-09-02 08:26:04 -0600
  • 935aab9b56
    sandbox: rename New to Create Jacob Howard 2025-08-29 18:38:44 -0600
  • 08ba6d8f78
    sandbox: fix up test to be Windows-portable Jacob Howard 2025-08-29 18:06:10 -0600
  • fec154ead1
    deps: used patched go-winjob to support windows/arm64 Jacob Howard 2025-08-29 17:59:10 -0600
  • 5211e724c1
    sandbox: add basic testing Jacob Howard 2025-08-29 17:05:25 -0600
  • 1882c4e64e
    sandbox: adjust macOS sandboxing for Docker Desktop development Jacob Howard 2025-08-29 16:16:11 -0600
  • 4d922ff787
    sandbox: add test for Windows sandbox configuration parsing Jacob Howard 2025-08-29 16:10:44 -0600
  • 9238a83dd3
    sandbox: implement Windows sandboxing and refactor API to accommodate Jacob Howard 2025-08-29 15:45:16 -0600
  • 9741e9a734
    sandbox: enable sandboxing for llama.cpp processes on macOS Jacob Howard 2025-08-29 13:42:41 -0600
  • 3fb54b8295
    Merge e259edb647 into 229e081bc2 Piotr 2025-08-29 15:36:04 +0200
  • e259edb647 inference: Return memory requirement in estimation error (ps-better-estimation-error) Piotr Stankiewicz 2025-08-29 15:29:57 +0200
  • 229e081bc2
    Merge pull request #147 from docker/openairecorder Dorin-Andrei Geman 2025-08-28 17:06:22 +0300
  • 5449fc9dad refactor(OpenAIRecorder): use Unix timestamp instead of time.Time Dorin Geman 2025-08-28 16:22:00 +0300
  • 3d702d7aca
    Merge pull request #146 from docker/avoid-shallow-copy Jacob Howard 2025-08-26 11:21:10 -0600
  • 1e13a3cac5
    metrics: avoid an unnecessary request shallow copy Jacob Howard 2025-08-26 10:45:00 -0600
  • 9a2dcdfc16
    Merge pull request #145 from doringeman/openairecorder Dorin-Andrei Geman 2025-08-26 18:35:30 +0300
  • a13d77c153
    fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests Dorin Geman 2025-08-26 17:49:46 +0300
  • af4bb5194f
    fix: move CORS middleware to top level to handle preflight requests (#144) Alberto García Hierro 2025-08-26 12:34:32 +0200
  • bc74763e92 metrics: Record reasoning_content from streaming responses Piotr Stankiewicz 2025-08-25 13:04:34 +0200
  • a356bde677
    fix: move CORS middleware to top level to handle preflight requests Alberto Garcia Hierro 2025-08-25 13:00:05 +0100
  • 3d02fbba29 metrics: Record reasoning_content from streaming responses Piotr Stankiewicz 2025-08-25 13:04:34 +0200
  • 5341c9fc29
    Merge pull request #133 from docker/shards Emily Casey 2025-08-22 11:37:38 -0600
  • b9164891e7 Merge remote-tracking branch 'origin/main' into shards Emily Casey 2025-08-22 11:29:48 -0600
  • 877ea617d4 update distribution mod to point at main branch Emily Casey 2025-08-22 11:29:11 -0600
  • 0f01f66399
    Merge pull request #137 from docker/fix-blob-url Emily Casey 2025-08-22 11:22:11 -0600
  • f09c4b4c98 Update distribution ref to main branch Emily Casey 2025-08-22 11:19:53 -0600
  • 8584839332
    Update pkg/inference/backends/llamacpp/llamacpp_config.go Emily Casey 2025-08-22 11:09:50 -0600
  • bb7abccf47
    Update pkg/inference/backends/llamacpp/llamacpp.go Emily Casey 2025-08-22 10:38:25 -0600
  • 9f7f778e82 Fix remote memory estimation: Emily Casey 2025-08-22 10:35:47 -0600
  • 8d5f251df7 Merge remote-tracking branch 'origin/main' into shards Emily Casey 2025-08-22 09:27:07 -0600
  • 156686cc6f Run from bundle Emily Casey 2025-08-21 22:10:17 -0600
  • d8ed374455 inference: Use common system memory size getter in the loader Piotr Stankiewicz 2025-08-22 15:05:51 +0200
  • eb12528e3c inference: Use common system memory size getter in the loader Piotr Stankiewicz 2025-08-22 15:05:51 +0200
  • 03f7adc077 inference: Fix ignoring parse errors for unknown models Piotr Stankiewicz 2025-08-22 14:46:19 +0200
  • 6d72f943f6 Make sure I don't commit vendor/ again Piotr Stankiewicz 2025-08-22 12:07:38 +0200
  • 77e0de486f Remove vendor/ Piotr Stankiewicz 2025-08-22 12:00:52 +0200
  • d4e64465ea inference: Fix ignoring parse errors for unknown models Piotr Stankiewicz 2025-08-22 14:46:19 +0200
  • b3944c96a0 Make sure I don't commit vendor/ again Piotr Stankiewicz 2025-08-22 12:07:38 +0200
  • aa30bbd19a Remove vendor/ Piotr Stankiewicz 2025-08-22 12:00:52 +0200
  • 933edd2249 inference: Fix up review comments Piotr Stankiewicz 2025-08-21 11:25:43 +0200
  • 64c85dcd83 inference: Support disabling pre-pull memory checks Piotr Stankiewicz 2025-08-19 16:20:04 +0200
  • 15e31feb30 inference: Block pull if model requires too much memory to run Piotr Stankiewicz 2025-07-30 15:13:00 +0200
  • 880818f741 inference: Support memory estimation for remote models Piotr Stankiewicz 2025-07-30 13:11:57 +0200
  • 59da65a365 Bump docker/model-distribution Piotr Stankiewicz 2025-07-30 13:11:07 +0200
  • 44a1498e5b inference: Fix up review comments Piotr Stankiewicz 2025-08-21 11:25:43 +0200
  • e761a77518 inference: Support disabling pre-pull memory checks Piotr Stankiewicz 2025-08-19 16:20:04 +0200
  • 739146e2d5 inference: Block pull if model requires too much memory to run Piotr Stankiewicz 2025-07-30 15:13:00 +0200
  • 01ea183634 inference: Support memory estimation for remote models Piotr Stankiewicz 2025-07-30 13:11:57 +0200
  • fc70f078c6 Bump docker/model-distribution Piotr Stankiewicz 2025-07-30 13:11:07 +0200
  • 1c13e4fc61 inference: Ignore parse errors when estimating model memory Piotr Stankiewicz 2025-08-06 16:40:10 +0200
  • 33b40c0ce1 inference: Ignore parse errors when estimating model memory Piotr Stankiewicz 2025-08-06 16:40:10 +0200
  • d61ffd5311
    updated the RequestResponsePair struct to differentiate between successful responses and error responses (#128) Ignasi 2025-08-06 16:29:58 +0200
  • 8fbd9df988
    updated the RequestResponsePair struct to differentiate between successful responses and error responses ilopezluna 2025-08-06 13:31:42 +0200
  • 9e639fd253
    Merge pull request #124 from aivantsov/patch-1 Jacob Howard 2025-07-30 11:30:13 +0300
  • 29a306b5af
    Fix the broken link to the Helm chart README. Andrei Ivantsov 2025-07-30 10:14:38 +0200
  • 6b1cfee5a3
    Merge pull request #123 from docker/nicks/chart Jacob Howard 2025-07-30 10:55:53 +0300
  • b42f3a0cb5
    charts: add Kubernetes examples Nick Santos 2025-07-29 12:47:16 -0400
  • ecfa5e7e68 gpuinfo: Make CGO optional on darwin Piotr Stankiewicz 2025-07-24 14:15:48 +0200
  • 1afdd96e3b gpuinfo: Make CGO optional on darwin Piotr Stankiewicz 2025-07-24 14:15:48 +0200
  • 7777c22890
    Merge pull request #113 from docker/model-load Dorin-Andrei Geman 2025-07-24 14:52:22 +0300
  • e2a0473732 Bump model-distribution to a11d745e58 Dorin Geman 2025-07-24 14:45:48 +0300
  • e748a3c4de chore: group and sort imports Dorin Geman 2025-07-24 14:44:08 +0300
  • 602f657781
    Revert "models/load: ensure request body is closed" Dorin-Andrei Geman 2025-07-24 14:39:11 +0300
  • 43b96fc9a8 gpuinfo: Make building without cgo possible on Linux Piotr Stankiewicz 2025-07-24 11:03:35 +0200
  • db19d8318f
    models/load: ensure request body is closed Dorin Geman 2025-07-24 12:27:25 +0300
  • 31ddc69496 gpuinfo: Make building without cgo possible on Linux Piotr Stankiewicz 2025-07-24 11:03:35 +0200
  • 4215c129be add model/load endpoint Emily Casey 2025-07-18 07:37:18 -0600
  • fc9b2a7171 inference: Fix typo in log Piotr 2025-07-22 14:07:08 +0200
  • 47517fdefa inference: Fallback behaviour if reading RAM/VRAM size fails Piotr Stankiewicz 2025-07-22 12:00:21 +0200
  • 2810fc21bd inference: Always return 1 as VRAM size on win/arm64 Piotr Stankiewicz 2025-07-22 11:36:20 +0200
  • 5f7d3a22a9 gpuinfo: Use go:build instead of obsolete +build Piotr Stankiewicz 2025-07-22 11:29:53 +0200
  • 263e4c7732 inference: Fix nv-gpu-info path and wrap errors Piotr Stankiewicz 2025-07-17 13:38:35 +0200
  • ecc3f8dde4 inference: Fix failing llama_config unit tests Piotr Stankiewicz 2025-07-15 16:39:43 +0200
  • 3548e5f3e6 inference: Keep track of RAM allocated by runners Piotr Stankiewicz 2025-07-15 14:45:13 +0200
  • cc9656e64c inference, gpuinfo: Limit allowed models to 1 on windows/arm64 for now Piotr Stankiewicz 2025-07-14 15:41:19 +0200
  • 00e3d60de5 gpuinfo: Release Metal device handle in VRAM size getter Piotr Stankiewicz 2025-07-14 15:15:41 +0200
  • ea3bb71830 Use nv-gpu-info on Windows to get VRAM size Piotr Stankiewicz 2025-07-14 15:12:05 +0200
  • 6e096b2caa Move VRAM size getters to a separate package Piotr Stankiewicz 2025-07-14 14:22:18 +0200
  • 96ecef4eed VRAM size getter for windows Piotr Stankiewicz 2025-07-11 14:49:26 +0200
  • c458b232a8 VRAM size getter for linux Piotr Stankiewicz 2025-07-11 14:44:23 +0200
  • a4dc5834d1 Implement basic memory estimation in scheduler Piotr Stankiewicz 2025-07-11 13:24:05 +0200
  • 606aead0e5
    Merge pull request #117 from docker/config-delete Jacob Howard 2025-07-23 14:05:29 +0300
  • 77f24abb8b
    Unload configs based on model ID and for both modes. Jacob Howard 2025-07-23 13:57:32 +0300
  • 6a695dc026
    Merge pull request #116 from doringeman/lock Dorin-Andrei Geman 2025-07-22 15:32:43 +0300
  • 5fa2bee652
    fix: switch to a RWMutex for synchronizing the router rebuild Dorin Geman 2025-07-22 15:29:30 +0300
  • 2e872f9dd2
    inference: Fix typo in log Piotr 2025-07-22 14:07:08 +0200
  • 0c1a6b7bec
    Merge pull request #115 from doringeman/misc Dorin-Andrei Geman 2025-07-22 13:42:38 +0300
  • b6d86e5606 inference: Fallback behaviour if reading RAM/VRAM size fails Piotr Stankiewicz 2025-07-22 12:00:21 +0200
  • cd5f08d043 inference: Always return 1 as VRAM size on win/arm64 Piotr Stankiewicz 2025-07-22 11:36:20 +0200
  • 1d066f2137 gpuinfo: Use go:build instead of obsolete +build Piotr Stankiewicz 2025-07-22 11:29:53 +0200
  • ac9da883d3 inference: Fix nv-gpu-info path and wrap errors Piotr Stankiewicz 2025-07-17 13:38:35 +0200
  • 7d39c7624c inference: Fix failing llama_config unit tests Piotr Stankiewicz 2025-07-15 16:39:43 +0200
  • ca187f9908 inference: Keep track of RAM allocated by runners Piotr Stankiewicz 2025-07-15 14:45:13 +0200
  • a3b83a8afe inference, gpuinfo: Limit allowed models to 1 on windows/arm64 for now Piotr Stankiewicz 2025-07-14 15:41:19 +0200
  • f99d4a2ee3 gpuinfo: Release Metal device handle in VRAM size getter Piotr Stankiewicz 2025-07-14 15:15:41 +0200
  • 517484be03 Use nv-gpu-info on Windows to get VRAM size Piotr Stankiewicz 2025-07-14 15:12:05 +0200
  • 3b42bc26d0 Move VRAM size getters to a separate package Piotr Stankiewicz 2025-07-14 14:22:18 +0200
  • 9e6e41f3ac VRAM size getter for windows Piotr Stankiewicz 2025-07-11 14:49:26 +0200
  • d559e1b755 VRAM size getter for linux Piotr Stankiewicz 2025-07-11 14:44:23 +0200
  • f90e4703f5 Implement basic memory estimation in scheduler Piotr Stankiewicz 2025-07-11 13:24:05 +0200