model-runner

Commit Graph

Author	SHA1	Message	Date
Jacob Howard	2d039ecb65	Merge pull request #149 from docker/sandboxing Implement llama.cpp sandboxing for macOS and Windows	2025-09-02 08:26:04 -06:00
Jacob Howard	935aab9b56	sandbox: rename New to Create Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 18:39:32 -06:00
Jacob Howard	08ba6d8f78	sandbox: fix up test to be Windows-portable We have to be a little careful what we choose on each platform since we're using a fairly strict sandboxing profile. Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 18:06:10 -06:00
Jacob Howard	fec154ead1	deps: used patched go-winjob to support windows/arm64 Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 17:59:10 -06:00
Jacob Howard	5211e724c1	sandbox: add basic testing Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 17:33:58 -06:00
Jacob Howard	1882c4e64e	sandbox: adjust macOS sandboxing for Docker Desktop development Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 17:33:58 -06:00
Jacob Howard	4d922ff787	sandbox: add test for Windows sandbox configuration parsing Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 17:33:58 -06:00
Jacob Howard	9238a83dd3	sandbox: implement Windows sandboxing and refactor API to accommodate Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 15:45:16 -06:00
Jacob Howard	9741e9a734	sandbox: enable sandboxing for llama.cpp processes on macOS Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-29 13:42:41 -06:00
Dorin-Andrei Geman	229e081bc2	Merge pull request #147 from docker/openairecorder refactor(OpenAIRecorder): use Unix timestamp instead of time.Time	2025-08-28 17:06:22 +03:00
Dorin Geman	5449fc9dad	refactor(OpenAIRecorder): use Unix timestamp instead of time.Time Signed-off-by: Dorin Geman <dorin.geman@docker.com>	2025-08-28 16:22:15 +03:00
Jacob Howard	3d702d7aca	Merge pull request #146 from docker/avoid-shallow-copy metrics: avoid an unnecessary request shallow copy	2025-08-26 11:21:10 -06:00
Jacob Howard	1e13a3cac5	metrics: avoid an unnecessary request shallow copy Signed-off-by: Jacob Howard <jacob.howard@docker.com>	2025-08-26 10:47:26 -06:00
Dorin-Andrei Geman	9a2dcdfc16	Merge pull request #145 from doringeman/openairecorder fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests	2025-08-26 18:35:30 +03:00
Dorin Geman	a13d77c153	fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests Signed-off-by: Dorin Geman <dorin.geman@docker.com>	2025-08-26 17:49:46 +03:00
Alberto García Hierro	af4bb5194f	fix: move CORS middleware to top level to handle preflight requests (#144) Previously, CORS preflight requests were not working because the CORS middleware was nested within the inference package. Moving it to a dedicated middleware package at the top level ensures preflight requests are properly handled before reaching route handlers. - Move CORS implementation from pkg/inference to pkg/middleware - Add comprehensive test coverage for CORS middleware - Add unit tests for inference models manager and scheduler Signed-off-by: Alberto Garcia Hierro <damaso.hierro@docker.com>	2025-08-26 11:34:32 +01:00
Piotr Stankiewicz	bc74763e92	metrics: Record reasoning_content from streaming responses Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-26 09:52:36 +02:00
Emily Casey	5341c9fc29	Merge pull request #133 from docker/shards Support GGUF shards	2025-08-22 11:37:38 -06:00
Emily Casey	b9164891e7	Merge remote-tracking branch 'origin/main' into shards	2025-08-22 11:29:48 -06:00
Emily Casey	877ea617d4	update distribution mod to point at main branch Signed-off-by: Emily Casey <emily.casey@docker.com>	2025-08-22 11:29:11 -06:00
Emily Casey	0f01f66399	Merge pull request #137 from docker/fix-blob-url Fix remote memory estimation	2025-08-22 11:22:11 -06:00
Emily Casey	f09c4b4c98	Update distribution ref to main branch Signed-off-by: Emily Casey <emily.casey@docker.com>	2025-08-22 11:20:02 -06:00
Emily Casey	8584839332	Update pkg/inference/backends/llamacpp/llamacpp_config.go Co-authored-by: Jacob Howard <jacob.howard@docker.com>	2025-08-22 11:09:50 -06:00
Emily Casey	bb7abccf47	Update pkg/inference/backends/llamacpp/llamacpp.go Co-authored-by: Jacob Howard <jacob.howard@docker.com>	2025-08-22 10:38:25 -06:00
Emily Casey	9f7f778e82	Fix remote memory estimation: * pull in blob URL fix - https://github.com/docker/model-distribution/pull/123 * don't attempt to estimate sharded models Signed-off-by: Emily Casey <emily.casey@docker.com>	2025-08-22 10:35:47 -06:00
Emily Casey	8d5f251df7	Merge remote-tracking branch 'origin/main' into shards	2025-08-22 09:27:07 -06:00
Emily Casey	156686cc6f	Run from bundle Signed-off-by: Emily Casey <emily.casey@docker.com>	2025-08-22 09:20:00 -06:00
Piotr Stankiewicz	d8ed374455	inference: Use common system memory size getter in the loader Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 17:11:18 +02:00
Piotr Stankiewicz	03f7adc077	inference: Fix ignoring parse errors for unknown models We ignore parse errors for models that gguf-parser-go can't parse yet, for now. This regressed in the pre-pull memory estimation PR. Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 17:06:23 +02:00
Piotr Stankiewicz	6d72f943f6	Make sure I don't commit vendor/ again Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 17:05:56 +02:00
Piotr Stankiewicz	77e0de486f	Remove vendor/ Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 17:05:56 +02:00
Piotr Stankiewicz	933edd2249	inference: Fix up review comments Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 10:15:03 +02:00
Piotr Stankiewicz	64c85dcd83	inference: Support disabling pre-pull memory checks Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 10:15:03 +02:00
Piotr Stankiewicz	15e31feb30	inference: Block pull if model requires too much memory to run Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 10:15:03 +02:00
Piotr Stankiewicz	880818f741	inference: Support memory estimation for remote models Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 10:15:03 +02:00
Piotr Stankiewicz	59da65a365	Bump docker/model-distribution Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-22 10:15:03 +02:00
Piotr Stankiewicz	1c13e4fc61	inference: Ignore parse errors when estimating model memory We will run into cases where our model runner is ahead of gguf-parser-go. In such cases we may want to load a model that will cause the model parse to fail. So, for now, in such cases ignore model parsing errors, and assume it takes no resources. In the future we should come up with a cleaner way of dealing with this (e.g. ship a model memory estimator along with the llama-server). Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-08-06 16:52:42 +02:00
Ignasi	d61ffd5311	updated the RequestResponsePair struct to differentiate between successful responses and error responses (#128)	2025-08-06 16:29:58 +02:00
Jacob Howard	9e639fd253	Merge pull request #124 from aivantsov/patch-1 Fix the broken link to the Helm chart README.	2025-07-30 11:30:13 +03:00
Andrei Ivantsov	29a306b5af	Fix the broken link to the Helm chart README.	2025-07-30 10:14:38 +02:00
Jacob Howard	6b1cfee5a3	Merge pull request #123 from docker/nicks/chart charts: add Kubernetes examples	2025-07-30 10:55:53 +03:00
Nick Santos	b42f3a0cb5	charts: add Kubernetes examples - a helm chart - static Kubernetes configs for a few common setups I put these under ./charts so we can expose this as a Helm chart repo later if we want, but for now we'll just tell people to install it from source. Signed-off-by: Nick Santos <nick.santos@docker.com>	2025-07-29 12:53:05 -04:00
Piotr Stankiewicz	ecfa5e7e68	gpuinfo: Make CGO optional on darwin Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-07-24 14:24:32 +02:00
Dorin-Andrei Geman	7777c22890	Merge pull request #113 from docker/model-load Adds model load endpoint to models API	2025-07-24 14:52:22 +03:00
Dorin Geman	e2a0473732	Bump model-distribution to `a11d745e58` Signed-off-by: Dorin Geman <dorin.geman@docker.com>	2025-07-24 14:48:53 +03:00
Dorin Geman	e748a3c4de	chore: group and sort imports Signed-off-by: Dorin Geman <dorin.geman@docker.com>	2025-07-24 14:48:53 +03:00
Dorin-Andrei Geman	602f657781	Revert "models/load: ensure request body is closed" Co-authored-by: Jacob Howard <jacob.howard@docker.com>	2025-07-24 14:39:11 +03:00
Piotr Stankiewicz	43b96fc9a8	gpuinfo: Make building without cgo possible on Linux Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-07-24 11:58:55 +02:00
Dorin Geman	db19d8318f	models/load: ensure request body is closed Signed-off-by: Dorin Geman <dorin.geman@docker.com>	2025-07-24 12:27:25 +03:00
Emily Casey	4215c129be	add model/load endpoint Signed-off-by: Emily Casey <emily.casey@docker.com>	2025-07-23 22:01:20 -06:00

322 Commits

All Branches