Jacob Howard
2d039ecb65
Merge pull request #149 from docker/sandboxing
Implement llama.cpp sandboxing for macOS and Windows
2025-09-02 08:26:04 -06:00
Jacob Howard
935aab9b56
sandbox: rename New to Create
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 18:39:32 -06:00
Jacob Howard
08ba6d8f78
sandbox: fix up test to be Windows-portable
We have to be a little careful what we choose on each platform since
we're using a fairly strict sandboxing profile.
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 18:06:10 -06:00
Jacob Howard
fec154ead1
deps: use patched go-winjob to support windows/arm64
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:59:10 -06:00
Jacob Howard
5211e724c1
sandbox: add basic testing
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard
1882c4e64e
sandbox: adjust macOS sandboxing for Docker Desktop development
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard
4d922ff787
sandbox: add test for Windows sandbox configuration parsing
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 17:33:58 -06:00
Jacob Howard
9238a83dd3
sandbox: implement Windows sandboxing and refactor API to accommodate
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 15:45:16 -06:00
Jacob Howard
9741e9a734
sandbox: enable sandboxing for llama.cpp processes on macOS
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-29 13:42:41 -06:00
Dorin-Andrei Geman
229e081bc2
Merge pull request #147 from docker/openairecorder
refactor(OpenAIRecorder): use Unix timestamp instead of time.Time
2025-08-28 17:06:22 +03:00
Dorin Geman
5449fc9dad
refactor(OpenAIRecorder): use Unix timestamp instead of time.Time
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-08-28 16:22:15 +03:00
Jacob Howard
3d702d7aca
Merge pull request #146 from docker/avoid-shallow-copy
metrics: avoid an unnecessary request shallow copy
2025-08-26 11:21:10 -06:00
Jacob Howard
1e13a3cac5
metrics: avoid an unnecessary request shallow copy
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-08-26 10:47:26 -06:00
Dorin-Andrei Geman
9a2dcdfc16
Merge pull request #145 from doringeman/openairecorder
fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests
2025-08-26 18:35:30 +03:00
Dorin Geman
a13d77c153
fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-08-26 17:49:46 +03:00
Alberto García Hierro
af4bb5194f
fix: move CORS middleware to top level to handle preflight requests (#144)
Previously, CORS preflight requests were not working because the CORS
middleware was nested within the inference package. Moving it to a
dedicated middleware package at the top level ensures preflight
requests are properly handled before reaching route handlers.
- Move CORS implementation from pkg/inference to pkg/middleware
- Add comprehensive test coverage for CORS middleware
- Add unit tests for inference models manager and scheduler
Signed-off-by: Alberto Garcia Hierro <damaso.hierro@docker.com>
2025-08-26 11:34:32 +01:00
Piotr Stankiewicz
bc74763e92
metrics: Record reasoning_content from streaming responses
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-26 09:52:36 +02:00
Emily Casey
5341c9fc29
Merge pull request #133 from docker/shards
Support GGUF shards
2025-08-22 11:37:38 -06:00
Emily Casey
b9164891e7
Merge remote-tracking branch 'origin/main' into shards
2025-08-22 11:29:48 -06:00
Emily Casey
877ea617d4
update distribution mod to point at main branch
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 11:29:11 -06:00
Emily Casey
0f01f66399
Merge pull request #137 from docker/fix-blob-url
Fix remote memory estimation
2025-08-22 11:22:11 -06:00
Emily Casey
f09c4b4c98
Update distribution ref to main branch
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 11:20:02 -06:00
Emily Casey
8584839332
Update pkg/inference/backends/llamacpp/llamacpp_config.go
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-08-22 11:09:50 -06:00
Emily Casey
bb7abccf47
Update pkg/inference/backends/llamacpp/llamacpp.go
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-08-22 10:38:25 -06:00
Emily Casey
9f7f778e82
Fix remote memory estimation:
* pull in blob URL fix - https://github.com/docker/model-distribution/pull/123
* don't attempt to estimate sharded models
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 10:35:47 -06:00
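One way to honour "don't attempt to estimate sharded models" is to detect shards by file name. This sketch assumes llama.cpp's `gguf-split` naming convention (`name-00001-of-00003.gguf`) and hypothetical helpers `isShardedGGUF` and `shouldEstimate`; the actual change may identify shards from model metadata instead.

```go
package main

import "regexp"

// shardPattern matches the llama.cpp gguf-split naming convention,
// e.g. "model-00001-of-00003.gguf".
var shardPattern = regexp.MustCompile(`-\d{5}-of-\d{5}\.gguf$`)

// isShardedGGUF reports whether a file name looks like one shard of a
// split GGUF model.
func isShardedGGUF(name string) bool {
	return shardPattern.MatchString(name)
}

// shouldEstimate is a hypothetical guard: remote memory estimation is
// skipped entirely if any file belongs to a sharded model.
func shouldEstimate(files []string) bool {
	for _, f := range files {
		if isShardedGGUF(f) {
			return false
		}
	}
	return true
}
```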
Emily Casey
8d5f251df7
Merge remote-tracking branch 'origin/main' into shards
2025-08-22 09:27:07 -06:00
Emily Casey
156686cc6f
Run from bundle
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-08-22 09:20:00 -06:00
Piotr Stankiewicz
d8ed374455
inference: Use common system memory size getter in the loader
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:11:18 +02:00
Piotr Stankiewicz
03f7adc077
inference: Fix ignoring parse errors for unknown models
We ignore parse errors for models that gguf-parser-go can't parse yet,
for now. This regressed in the pre-pull memory estimation PR.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:06:23 +02:00
Piotr Stankiewicz
6d72f943f6
Make sure I don't commit vendor/ again
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:05:56 +02:00
Piotr Stankiewicz
77e0de486f
Remove vendor/
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 17:05:56 +02:00
Piotr Stankiewicz
933edd2249
inference: Fix up review comments
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz
64c85dcd83
inference: Support disabling pre-pull memory checks
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz
15e31feb30
inference: Block pull if model requires too much memory to run
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
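The pre-pull gate and its opt-out can be sketched as a single check. This assumes the required and available byte counts are already known (from a memory estimate and the system memory size) and uses a hypothetical `checksDisabled` switch standing in for whatever configuration actually disables the check.

```go
package main

import "fmt"

// checkFits is a hypothetical pre-pull gate. It returns an error when the
// estimated requirement exceeds total system memory, unless the check has
// been explicitly disabled.
func checkFits(required, total uint64, checksDisabled bool) error {
	if checksDisabled {
		// Pre-pull memory checks turned off: always allow the pull.
		return nil
	}
	if required > total {
		return fmt.Errorf("model requires %d bytes of memory, but the system has %d", required, total)
	}
	return nil
}
```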
Piotr Stankiewicz
880818f741
inference: Support memory estimation for remote models
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz
59da65a365
Bump docker/model-distribution
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-22 10:15:03 +02:00
Piotr Stankiewicz
1c13e4fc61
inference: Ignore parse errors when estimating model memory
We will run into cases where our model runner is ahead of
gguf-parser-go. In such cases we may want to load a model that
gguf-parser-go fails to parse. So, for now, ignore model parsing
errors in such cases, and assume the model takes no resources. In the future we
should come up with a cleaner way of dealing with this (e.g. ship a
model memory estimator along with the llama-server).
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-08-06 16:52:42 +02:00
Ignasi
d61ffd5311
updated the RequestResponsePair struct to differentiate between successful responses and error responses (#128)
2025-08-06 16:29:58 +02:00
Jacob Howard
9e639fd253
Merge pull request #124 from aivantsov/patch-1
Fix the broken link to the Helm chart README.
2025-07-30 11:30:13 +03:00
Andrei Ivantsov
29a306b5af
Fix the broken link to the Helm chart README.
2025-07-30 10:14:38 +02:00
Jacob Howard
6b1cfee5a3
Merge pull request #123 from docker/nicks/chart
charts: add Kubernetes examples
2025-07-30 10:55:53 +03:00
Nick Santos
b42f3a0cb5
charts: add Kubernetes examples
- a Helm chart
- static Kubernetes configs for a few common setups
I put these under ./charts so we can expose
this as a Helm chart repo later if we want,
but for now we'll just tell people to install it
from source.
Signed-off-by: Nick Santos <nick.santos@docker.com>
2025-07-29 12:53:05 -04:00
Piotr Stankiewicz
ecfa5e7e68
gpuinfo: Make CGO optional on darwin
...
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-07-24 14:24:32 +02:00
Dorin-Andrei Geman
7777c22890
Merge pull request #113 from docker/model-load
Adds model load endpoint to models API
2025-07-24 14:52:22 +03:00
Dorin Geman
e2a0473732
Bump model-distribution to a11d745e58
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 14:48:53 +03:00
Dorin Geman
e748a3c4de
chore: group and sort imports
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 14:48:53 +03:00
Dorin-Andrei Geman
602f657781
Revert "models/load: ensure request body is closed"
Co-authored-by: Jacob Howard <jacob.howard@docker.com>
2025-07-24 14:39:11 +03:00
Piotr Stankiewicz
43b96fc9a8
gpuinfo: Make building without cgo possible on Linux
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-07-24 11:58:55 +02:00
Dorin Geman
db19d8318f
models/load: ensure request body is closed
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-07-24 12:27:25 +03:00
Emily Casey
4215c129be
add model/load endpoint
Signed-off-by: Emily Casey <emily.casey@docker.com>
2025-07-23 22:01:20 -06:00