Commit Graph

322 Commits

Jacob Howard fe715cd9e6
inference: add routes for a default inference backend
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:08 -06:00

Jacob Howard dba8db4f8f
[AIE-41] inference: disable automatic model pulls on inference calls
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:08 -06:00

Jacob Howard 4403a2a9f9
inference: hide and disable inference services on unsupported platforms
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00

Jacob Howard 910f9350f9
inference: disable pulls on Windows pending docker/model-distribution
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00

Piotr Stankiewicz 9abc853ec3
inference: Bump llama.cpp runtime to 0.0.0-experimental2
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-03-28 17:53:07 -06:00

Dorin Geman 3201fb5049
inference: Update telemetry
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:07 -06:00

Jacob Howard 7a93a6e3db
inference: disable llama.cpp installs on unsupported platforms
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00

Jacob Howard 7c351f6aa0
inference: add minor optimization to loader
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:07 -06:00

Dorin Geman 5c7f902bfe
inference/llamacpp: Remove socket before starting
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00

Dorin Geman ae9d65a364
inference: Installer only runs once
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00

Dorin Geman 0f6f2f863c
inference: Tiny loader fix
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00

Dorin Geman 7f7aa129fa
inference/llamacpp: Remove debug log
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:06 -06:00

Jacob Howard 69971fb598
inference: two small fixes to the scheduler
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00

Jacob Howard 348f46991c
inference: wire up model deletion endpoint
This endpoint's implementation will wait until we have our official
local model store.

Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00

Jacob Howard d6b1191a01
inference: refactor scheduler to a more modular design
This new design will allow for concurrent runner operation (eventually)
on systems that support it.

Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:06 -06:00

Dorin Geman a14517d6bf
inference: Add stub for llama.cpp backend
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00

Dorin Geman 8dd1f8dbce
inference/scheduler: Cancel backend context to avoid leaks
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00

Dorin Geman c8a97ae68d
inference: Require "model" field for completion or embedding
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00

Dorin Geman 842ce2ddbb
inference: Handle /v1/completions
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
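The two commits above (c8a97ae68d and 842ce2ddbb) concern the OpenAI-compatible /v1/completions route and make the "model" field mandatory, presumably so the service knows which model a request targets. The sketch below is a hypothetical illustration of that validation in Go, not the repository's actual handler; only the "model" and "prompt" field names follow the public OpenAI completions API convention.

```go
// Hypothetical sketch (not the repository's code): an OpenAI-compatible
// /v1/completions handler that rejects requests missing the "model" field.
package main

import (
	"encoding/json"
	"net/http"
)

type completionRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

func handleCompletions(w http.ResponseWriter, r *http.Request) {
	var req completionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	if req.Model == "" {
		// Without a model name the request cannot be routed to a backend.
		http.Error(w, `missing required "model" field`, http.StatusBadRequest)
		return
	}
	// ... forward the request to whichever backend serves req.Model ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/v1/completions", handleCompletions)
	_ = http.ListenAndServe("localhost:8080", nil)
}
```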
Dorin Geman 450b828845
inference: Register /models/{namespace}/{name}
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
2025-03-28 17:53:05 -06:00
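Commit 450b828845 above registers a /models/{namespace}/{name} route. As a rough illustration only (not the repository's code, and assuming the pattern syntax of Go 1.22's net/http ServeMux), such a registration could look like:

```go
// Hypothetical sketch: serving GET /models/{namespace}/{name} with path wildcards.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /models/{namespace}/{name}", func(w http.ResponseWriter, r *http.Request) {
		// PathValue extracts the wildcard segments registered in the pattern.
		namespace := r.PathValue("namespace")
		name := r.PathValue("name")
		fmt.Fprintf(w, "model %s/%s\n", namespace, name)
	})
	_ = http.ListenAndServe("localhost:8080", mux)
}
```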
Jacob Howard f8cdbc4d81
inference: refactor service and implement scheduling mechanism
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:05 -06:00

Jacob Howard 21e10c378a
inference: move to modular backend structure and implement stubs
Signed-off-by: Jacob Howard <jacob.howard@docker.com>
2025-03-28 17:53:00 -06:00