First-pass implementation of memory estimation logic in the model scheduler.
This change relies heavily on gguf-parser-go to calculate the estimated
peak memory required to run inference with a given model. It adds
GetRequiredMemoryForModel() to the Backend interface so that each
backend can take its own configuration into account when calculating
required memory.
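As a rough sketch of what the interface extension could look like: the
method name comes from this change, while the parameter and return
types below are assumptions for illustration only. Each implementation
would combine its own configuration with model metadata parsed by
gguf-parser-go to produce the estimate.

```go
package scheduling

import "context"

// RequiredMemory is a hypothetical container for the estimate; the real
// change may break the figure down differently (e.g. host RAM vs. VRAM
// per GPU).
type RequiredMemory struct {
	RAM  uint64 // estimated host memory, in bytes
	VRAM uint64 // estimated GPU memory, in bytes
}

// Backend is shown here with only the new method; existing methods are
// omitted.
type Backend interface {
	GetRequiredMemoryForModel(ctx context.Context, model string) (*RequiredMemory, error)
}
```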
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
We need to allow users to configure the model runtime, whether to
control inference settings or low-level llama.cpp-specific settings.
In the interest of unblocking users quickly, this patch adds a very
simple mechanism for configuring runtime settings: a `_configure`
endpoint is added per engine, which accepts POST requests to set the
context size and raw runtime CLI flags. Those settings are applied to
every run of a given model until unload is called for that model or
model-runner is terminated.
This is a temporary solution and therefore subject to change once a
design for specifying runtime settings is finalised.
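As a rough illustration only, a client call to the new endpoint might
look like the sketch below. The host, port, URL path, JSON field names,
model name, and flag values are all placeholder assumptions; only the
`_configure` endpoint name, the POST method, and the two settings
(context size and raw runtime CLI flags) come from this change.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// configureRequest mirrors the settings described above; the exact
// field names used by model-runner may differ.
type configureRequest struct {
	Model        string   `json:"model"`
	ContextSize  int      `json:"context-size"`
	RuntimeFlags []string `json:"runtime-flags"`
}

func main() {
	body, err := json.Marshal(configureRequest{
		Model:        "ai/example-model",
		ContextSize:  8192,
		RuntimeFlags: []string{"--threads", "8"},
	})
	if err != nil {
		panic(err)
	}
	// The host, port, and "llama.cpp" engine path segment are placeholders.
	resp, err := http.Post(
		"http://localhost:12434/engines/llama.cpp/v1/_configure",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```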
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Representing byte sizes as float64 values can be confusing and
potentially inefficient, so use an integer type to represent byte sizes
instead.
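For illustration, an integer representation of byte sizes might look
like the following; the type and constant names here are illustrative,
not necessarily the ones used in the change.

```go
package main

import "fmt"

// ByteSize counts bytes exactly. A float64 has only 53 bits of
// mantissa, so very large sizes would lose precision, and fractional
// byte counts are never meaningful.
type ByteSize uint64

const (
	KiB ByteSize = 1 << (10 * (iota + 1))
	MiB
	GiB
)

func main() {
	required := 7*GiB + 512*MiB
	fmt.Printf("estimated requirement: %d bytes\n", required)
}
```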
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
This new design will (eventually) allow for concurrent runner operation
on systems that support it.
Signed-off-by: Jacob Howard <jacob.howard@docker.com>