Jacob Howard
fe715cd9e6
inference: add routes for a default inference backend
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:08 -06:00
Jacob Howard
dba8db4f8f
[AIE-41] inference: disable automatic model pulls on inference calls
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:08 -06:00
Jacob Howard
4403a2a9f9
inference: hide and disable inference services on unsupported platforms
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00
Jacob Howard
910f9350f9
inference: disable pulls on Windows pending docker/model-distribution
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00
Piotr Stankiewicz
9abc853ec3
inference: Bump llama.cpp runtime to 0.0.0-experimental2
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-03-28 17:53:07 -06:00
Dorin Geman
3201fb5049
inference: Update telemetry
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:07 -06:00
Jacob Howard
7a93a6e3db
inference: disable llama.cpp installs on unsupported platforms
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00
Jacob Howard
7c351f6aa0
inference: add minor optimization to loader
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00
Dorin Geman
5c7f902bfe
inference/llamacpp: Remove socket before starting
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00
Dorin Geman
ae9d65a364
inference: Installer only runs once
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00
Dorin Geman
0f6f2f863c
inference: Tiny loader fix
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00
Dorin Geman
7f7aa129fa
inference/llamacpp: Remove debug log
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00
Jacob Howard
69971fb598
inference: two small fixes to the scheduler
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00
Jacob Howard
348f46991c
inference: wire up model deletion endpoint
This endpoint's implementation will wait until we have our official
local model store.
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00
Jacob Howard
d6b1191a01
inference: refactor scheduler to a more modular design
This new design will allow for concurrent runner operation (eventually)
on systems that support it.
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00
Dorin Geman
a14517d6bf
inference: Add stub for llama.cpp backend
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
Dorin Geman
8dd1f8dbce
inference/scheduler: Cancel backend context to avoid leaks
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
Dorin Geman
c8a97ae68d
inference: Require "model" field for completion or embedding
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
Dorin Geman
842ce2ddbb
inference: Handle /v1/completions
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
Dorin Geman
450b828845
inference: Register /models/{namespace}/{name}
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
Jacob Howard
f8cdbc4d81
inference: refactor service and implement scheduling mechanism
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:05 -06:00
Jacob Howard
21e10c378a
inference: move to modular backend structure and implement stubs
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:00 -06:00