Commit Graph

322 Commits

Author SHA1 Message Date
Jacob Howard 2d039ecb65
Merge pull request #149 from docker/sandboxing
Implement llama.cpp sandboxing for macOS and Windows
2025-09-02 08:26:04 -06:00
Jacob Howard 935aab9b56
sandbox: rename New to Create
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 18:39:32 -06:00
Jacob Howard 08ba6d8f78
sandbox: fix up test to be Windows-portable
We have to be a little careful what we choose on each platform since
we're using a fairly strict sandboxing profile.

Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 18:06:10 -06:00
Jacob Howard fec154ead1
deps: use patched go-winjob to support windows/arm64
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:59:10 -06:00
Jacob Howard 5211e724c1
sandbox: add basic testing
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard 1882c4e64e
sandbox: adjust macOS sandboxing for Docker Desktop development
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard 4d922ff787
sandbox: add test for Windows sandbox configuration parsing
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard 9238a83dd3
sandbox: implement Windows sandboxing and refactor API to accommodate
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 15:45:16 -06:00
Jacob Howard 9741e9a734
sandbox: enable sandboxing for llama.cpp processes on macOS
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 13:42:41 -06:00
Dorin-Andrei Geman 229e081bc2
Merge pull request #147 from docker/openairecorder
refactor(OpenAIRecorder): use Unix timestamp instead of time.Time
2025-08-28 17:06:22 +03:00
Dorin Geman 5449fc9dad refactor(OpenAIRecorder): use Unix timestamp instead of time.Time
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-08-28 16:22:15 +03:00
Jacob Howard 3d702d7aca
Merge pull request #146 from docker/avoid-shallow-copy
metrics: avoid an unnecessary request shallow copy
2025-08-26 11:21:10 -06:00
Jacob Howard 1e13a3cac5
metrics: avoid an unnecessary request shallow copy
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-26 10:47:26 -06:00
Dorin-Andrei Geman 9a2dcdfc16
Merge pull request #145 from doringeman/openairecorder
fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests
2025-08-26 18:35:30 +03:00
Dorin Geman a13d77c153
fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-08-26 17:49:46 +03:00
Alberto García Hierro af4bb5194f
fix: move CORS middleware to top level to handle preflight requests (#144)
Previously, CORS preflight requests were not working because the CORS
middleware was nested within the inference package. Moving it to a
dedicated middleware package at the top level ensures preflight
requests are properly handled before reaching route handlers.

- Move CORS implementation from pkg/inference to pkg/middleware
- Add comprehensive test coverage for CORS middleware
- Add unit tests for inference models manager and scheduler

Signed-off-by: Alberto Garcia Hierro <damaso.hierro@docker.com>
2025-08-26 11:34:32 +01:00
Piotr Stankiewicz bc74763e92 metrics: Record reasoning_content from streaming responses
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-26 09:52:36 +02:00
Emily Casey 5341c9fc29
Merge pull request #133 from docker/shards
Support GGUF shards
2025-08-22 11:37:38 -06:00
Emily Casey b9164891e7
Merge remote-tracking branch 'origin/main' into shards
2025-08-22 11:29:48 -06:00
Emily Casey 877ea617d4 update distribution mod to point at main branch
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 11:29:11 -06:00
Emily Casey 0f01f66399
Merge pull request #137 from docker/fix-blob-url
Fix remote memory estimation
2025-08-22 11:22:11 -06:00
Emily Casey f09c4b4c98 Update distribution ref to main branch
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 11:20:02 -06:00
Emily Casey 8584839332
Update pkg/inference/backends/llamacpp/llamacpp_config.go
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-08-22 11:09:50 -06:00
Emily Casey bb7abccf47
Update pkg/inference/backends/llamacpp/llamacpp.go
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-08-22 10:38:25 -06:00
Emily Casey 9f7f778e82 Fix remote memory estimation:
* pull in blob URL fix - https://github.com/docker/model-distribution/pull/123
* don't attempt to estimate sharded models

Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 10:35:47 -06:00
Emily Casey 8d5f251df7
Merge remote-tracking branch 'origin/main' into shards
2025-08-22 09:27:07 -06:00
Emily Casey 156686cc6f Run from bundle
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 09:20:00 -06:00
Piotr Stankiewicz d8ed374455 inference: Use common system memory size getter in the loader
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:11:18 +02:00
Piotr Stankiewicz 03f7adc077 inference: Fix ignoring parse errors for unknown models
We ignore parse errors for models that gguf-parser-go can't parse yet,
for now. This regressed in the pre-pull memory estimation PR.

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:06:23 +02:00
Piotr Stankiewicz 6d72f943f6 Make sure I don't commit vendor/ again
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:05:56 +02:00
Piotr Stankiewicz 77e0de486f Remove vendor/
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:05:56 +02:00
Piotr Stankiewicz 933edd2249 inference: Fix up review comments
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz 64c85dcd83 inference: Support disabling pre-pull memory checks
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz 15e31feb30 inference: Block pull if model requires too much memory to run
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz 880818f741 inference: Support memory estimation for remote models
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz 59da65a365 Bump docker/model-distribution
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz 1c13e4fc61 inference: Ignore parse errors when estimating model memory
We will run into cases where our model runner is ahead of
gguf-parser-go. In such cases we may want to load a model that will
cause the model parse to fail. So, for now, in such cases ignore model
parsing errors, and assume it takes no resources. In the future we
should come up with a cleaner way of dealing with this (e.g. ship a
model memory estimator along with the llama-server).

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-06 16:52:42 +02:00
Ignasi d61ffd5311
updated the RequestResponsePair struct to differentiate between successful responses and error responses (#128)
2025-08-06 16:29:58 +02:00
Jacob Howard 9e639fd253
Merge pull request #124 from aivantsov/patch-1
Fix the broken link to the Helm chart README.
2025-07-30 11:30:13 +03:00
Andrei Ivantsov 29a306b5af
Fix the broken link to the Helm chart README.
2025-07-30 10:14:38 +02:00
Jacob Howard 6b1cfee5a3
Merge pull request #123 from docker/nicks/chart
charts: add Kubernetes examples
2025-07-30 10:55:53 +03:00
Nick Santos b42f3a0cb5
charts: add Kubernetes examples
- a helm chart
- static Kubernetes configs for a few common setups

I put these under ./charts so we can expose
this as a Helm chart repo later if we want,
but for now we'll just tell people to install it
from source.

Signed-off-by: Nick Santos <nick.santos@docker.com>
2025-07-29 12:53:05 -04:00
Piotr Stankiewicz ecfa5e7e68 gpuinfo: Make CGO optional on darwin
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-07-24 14:24:32 +02:00
Dorin-Andrei Geman 7777c22890
Merge pull request #113 from docker/model-load
Adds model load endpoint to models API
2025-07-24 14:52:22 +03:00
Dorin Geman e2a0473732 Bump model-distribution to a11d745e58
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 14:48:53 +03:00
Dorin Geman e748a3c4de chore: group and sort imports
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 14:48:53 +03:00
Dorin-Andrei Geman 602f657781
Revert "models/load: ensure request body is closed"
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-07-24 14:39:11 +03:00
Piotr Stankiewicz 43b96fc9a8 gpuinfo: Make building without cgo possible on Linux
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-07-24 11:58:55 +02:00
Dorin Geman db19d8318f
models/load: ensure request body is closed
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 12:27:25 +03:00
Emily Casey 4215c129be add model/load endpoint
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-07-23 22:01:20 -06:00