model-runner

Commit Graph

2d039ecb65 Merge pull request #149 from docker/sandboxing main Jacob Howard 2025-09-02 08:26:04 -0600
935aab9b56 sandbox: rename New to Create Jacob Howard 2025-08-29 18:38:44 -0600
08ba6d8f78 sandbox: fix up test to be Windows-portable Jacob Howard 2025-08-29 18:06:10 -0600
fec154ead1 deps: used patched go-winjob to support windows/arm64 Jacob Howard 2025-08-29 17:59:10 -0600
5211e724c1 sandbox: add basic testing Jacob Howard 2025-08-29 17:05:25 -0600
1882c4e64e sandbox: adjust macOS sandboxing for Docker Desktop development Jacob Howard 2025-08-29 16:16:11 -0600
4d922ff787 sandbox: add test for Windows sandbox configuration parsing Jacob Howard 2025-08-29 16:10:44 -0600
9238a83dd3 sandbox: implement Windows sandboxing and refactor API to accommodate Jacob Howard 2025-08-29 15:45:16 -0600
9741e9a734 sandbox: enable sandboxing for llama.cpp processes on macOS Jacob Howard 2025-08-29 13:42:41 -0600
3fb54b8295 Merge e259edb647 into 229e081bc2 Piotr 2025-08-29 15:36:04 +0200
e259edb647 inference: Return memory requirement in estimation error ps-better-estimation-error Piotr Stankiewicz 2025-08-29 15:29:57 +0200
229e081bc2 Merge pull request #147 from docker/openairecorder Dorin-Andrei Geman 2025-08-28 17:06:22 +0300
5449fc9dad refactor(OpenAIRecorder): use Unix timestamp instead of time.Time Dorin Geman 2025-08-28 16:22:00 +0300
3d702d7aca Merge pull request #146 from docker/avoid-shallow-copy Jacob Howard 2025-08-26 11:21:10 -0600
1e13a3cac5 metrics: avoid an unnecessary request shallow copy Jacob Howard 2025-08-26 10:45:00 -0600
9a2dcdfc16 Merge pull request #145 from doringeman/openairecorder Dorin-Andrei Geman 2025-08-26 18:35:30 +0300
a13d77c153 fix(OpenAIRecorder): set default status code for in progress or canceled HTTP requests Dorin Geman 2025-08-26 17:49:46 +0300
af4bb5194f fix: move CORS middleware to top level to handle preflight requests (#144) Alberto García Hierro 2025-08-26 12:34:32 +0200
bc74763e92 metrics: Record reasoning_content from streaming responses Piotr Stankiewicz 2025-08-25 13:04:34 +0200
a356bde677 fix: move CORS middleware to top level to handle preflight requests Alberto Garcia Hierro 2025-08-25 13:00:05 +0100
3d02fbba29 metrics: Record reasoning_content from streaming responses Piotr Stankiewicz 2025-08-25 13:04:34 +0200
5341c9fc29 Merge pull request #133 from docker/shards Emily Casey 2025-08-22 11:37:38 -0600
b9164891e7 Merge remote-tracking branch 'origin/main' into shards Emily Casey 2025-08-22 11:29:48 -0600
877ea617d4 update distribution mod to point at main branch Emily Casey 2025-08-22 11:29:11 -0600
0f01f66399 Merge pull request #137 from docker/fix-blob-url Emily Casey 2025-08-22 11:22:11 -0600
f09c4b4c98 Update distribution ref to main branch Emily Casey 2025-08-22 11:19:53 -0600
8584839332 Update pkg/inference/backends/llamacpp/llamacpp_config.go Emily Casey 2025-08-22 11:09:50 -0600
bb7abccf47 Update pkg/inference/backends/llamacpp/llamacpp.go Emily Casey 2025-08-22 10:38:25 -0600
9f7f778e82 Fix remote memory estimation: Emily Casey 2025-08-22 10:35:47 -0600
8d5f251df7 Merge remote-tracking branch 'origin/main' into shards Emily Casey 2025-08-22 09:27:07 -0600
156686cc6f Run from bundle Emily Casey 2025-08-21 22:10:17 -0600
d8ed374455 inference: Use common system memory size getter in the loader Piotr Stankiewicz 2025-08-22 15:05:51 +0200
eb12528e3c inference: Use common system memory size getter in the loader Piotr Stankiewicz 2025-08-22 15:05:51 +0200
03f7adc077 inference: Fix ignoring parse errors for unknown models Piotr Stankiewicz 2025-08-22 14:46:19 +0200
6d72f943f6 Make sure I don't commit vendor/ again Piotr Stankiewicz 2025-08-22 12:07:38 +0200
77e0de486f Remove vendor/ Piotr Stankiewicz 2025-08-22 12:00:52 +0200
d4e64465ea inference: Fix ignoring parse errors for unknown models Piotr Stankiewicz 2025-08-22 14:46:19 +0200
b3944c96a0 Make sure I don't commit vendor/ again Piotr Stankiewicz 2025-08-22 12:07:38 +0200
aa30bbd19a Remove vendor/ Piotr Stankiewicz 2025-08-22 12:00:52 +0200
933edd2249 inference: Fix up review comments Piotr Stankiewicz 2025-08-21 11:25:43 +0200
64c85dcd83 inference: Support disabling pre-pull memory checks Piotr Stankiewicz 2025-08-19 16:20:04 +0200
15e31feb30 inference: Block pull if model requires too much memory to run Piotr Stankiewicz 2025-07-30 15:13:00 +0200
880818f741 inference: Support memory estimation for remote models Piotr Stankiewicz 2025-07-30 13:11:57 +0200
59da65a365 Bump docker/model-distribution Piotr Stankiewicz 2025-07-30 13:11:07 +0200
44a1498e5b inference: Fix up review comments Piotr Stankiewicz 2025-08-21 11:25:43 +0200
e761a77518 inference: Support disabling pre-pull memory checks Piotr Stankiewicz 2025-08-19 16:20:04 +0200
739146e2d5 inference: Block pull if model requires too much memory to run Piotr Stankiewicz 2025-07-30 15:13:00 +0200
01ea183634 inference: Support memory estimation for remote models Piotr Stankiewicz 2025-07-30 13:11:57 +0200
fc70f078c6 Bump docker/model-distribution Piotr Stankiewicz 2025-07-30 13:11:07 +0200
1c13e4fc61 inference: Ignore parse errors when estimating model memory Piotr Stankiewicz 2025-08-06 16:40:10 +0200
33b40c0ce1 inference: Ignore parse errors when estimating model memory Piotr Stankiewicz 2025-08-06 16:40:10 +0200
d61ffd5311 updated the RequestResponsePair struct to differentiate between successful responses and error responses (#128) Ignasi 2025-08-06 16:29:58 +0200
8fbd9df988 updated the RequestResponsePair struct to differentiate between successful responses and error responses ilopezluna 2025-08-06 13:31:42 +0200
9e639fd253 Merge pull request #124 from aivantsov/patch-1 Jacob Howard 2025-07-30 11:30:13 +0300
29a306b5af Fix the broken link to the Helm chart README. Andrei Ivantsov 2025-07-30 10:14:38 +0200
6b1cfee5a3 Merge pull request #123 from docker/nicks/chart Jacob Howard 2025-07-30 10:55:53 +0300
b42f3a0cb5 charts: add Kubernetes examples Nick Santos 2025-07-29 12:47:16 -0400
ecfa5e7e68 gpuinfo: Make CGO optional on darwin Piotr Stankiewicz 2025-07-24 14:15:48 +0200
1afdd96e3b gpuinfo: Make CGO optional on darwin Piotr Stankiewicz 2025-07-24 14:15:48 +0200
7777c22890 Merge pull request #113 from docker/model-load Dorin-Andrei Geman 2025-07-24 14:52:22 +0300
e2a0473732 Bump model-distribution to a11d745e58 Dorin Geman 2025-07-24 14:45:48 +0300
e748a3c4de chore: group and sort imports Dorin Geman 2025-07-24 14:44:08 +0300
602f657781 Revert "models/load: ensure request body is closed" Dorin-Andrei Geman 2025-07-24 14:39:11 +0300
43b96fc9a8 gpuinfo: Make building without cgo possible on Linux Piotr Stankiewicz 2025-07-24 11:03:35 +0200
db19d8318f models/load: ensure request body is closed Dorin Geman 2025-07-24 12:27:25 +0300
31ddc69496 gpuinfo: Make building without cgo possible on Linux Piotr Stankiewicz 2025-07-24 11:03:35 +0200
4215c129be add model/load endpoint Emily Casey 2025-07-18 07:37:18 -0600
fc9b2a7171 inference: Fix typo in log Piotr 2025-07-22 14:07:08 +0200
47517fdefa inference: Fallback behaviour if reading RAM/VRAM size fails Piotr Stankiewicz 2025-07-22 12:00:21 +0200
2810fc21bd inference: Always return 1 as VRAM size on win/arm64 Piotr Stankiewicz 2025-07-22 11:36:20 +0200
5f7d3a22a9 gpuinfo: Use go:build instead of obsolete +build Piotr Stankiewicz 2025-07-22 11:29:53 +0200
263e4c7732 inference: Fix nv-gpu-info path and wrap errors Piotr Stankiewicz 2025-07-17 13:38:35 +0200
ecc3f8dde4 inference: Fix failing llama_config unit tests Piotr Stankiewicz 2025-07-15 16:39:43 +0200
3548e5f3e6 inference: Keep track of RAM allocated by runners Piotr Stankiewicz 2025-07-15 14:45:13 +0200
cc9656e64c inference, gpuinfo: Limit allowed models to 1 on windows/arm64 for now Piotr Stankiewicz 2025-07-14 15:41:19 +0200
00e3d60de5 gpuinfo: Release Metal device handle in VRAM size getter Piotr Stankiewicz 2025-07-14 15:15:41 +0200
ea3bb71830 Use nv-gpu-info on Windows to get VRAM size Piotr Stankiewicz 2025-07-14 15:12:05 +0200
6e096b2caa Move VRAM size getters to a separate package Piotr Stankiewicz 2025-07-14 14:22:18 +0200
96ecef4eed VRAM size getter for windows Piotr Stankiewicz 2025-07-11 14:49:26 +0200
c458b232a8 VRAM size getter for linux Piotr Stankiewicz 2025-07-11 14:44:23 +0200
a4dc5834d1 Implement basic memory estimation in scheduler Piotr Stankiewicz 2025-07-11 13:24:05 +0200
606aead0e5 Merge pull request #117 from docker/config-delete Jacob Howard 2025-07-23 14:05:29 +0300
77f24abb8b Unload configs based on model ID and for both modes. Jacob Howard 2025-07-23 13:57:32 +0300
6a695dc026 Merge pull request #116 from doringeman/lock Dorin-Andrei Geman 2025-07-22 15:32:43 +0300
5fa2bee652 fix: switch to a RWMutex for synchronizing the router rebuild Dorin Geman 2025-07-22 15:29:30 +0300
2e872f9dd2 inference: Fix typo in log Piotr 2025-07-22 14:07:08 +0200
0c1a6b7bec Merge pull request #115 from doringeman/misc Dorin-Andrei Geman 2025-07-22 13:42:38 +0300
b6d86e5606 inference: Fallback behaviour if reading RAM/VRAM size fails Piotr Stankiewicz 2025-07-22 12:00:21 +0200
cd5f08d043 inference: Always return 1 as VRAM size on win/arm64 Piotr Stankiewicz 2025-07-22 11:36:20 +0200
1d066f2137 gpuinfo: Use go:build instead of obsolete +build Piotr Stankiewicz 2025-07-22 11:29:53 +0200
ac9da883d3 inference: Fix nv-gpu-info path and wrap errors Piotr Stankiewicz 2025-07-17 13:38:35 +0200
7d39c7624c inference: Fix failing llama_config unit tests Piotr Stankiewicz 2025-07-15 16:39:43 +0200
ca187f9908 inference: Keep track of RAM allocated by runners Piotr Stankiewicz 2025-07-15 14:45:13 +0200
a3b83a8afe inference, gpuinfo: Limit allowed models to 1 on windows/arm64 for now Piotr Stankiewicz 2025-07-14 15:41:19 +0200
f99d4a2ee3 gpuinfo: Release Metal device handle in VRAM size getter Piotr Stankiewicz 2025-07-14 15:15:41 +0200
517484be03 Use nv-gpu-info on Windows to get VRAM size Piotr Stankiewicz 2025-07-14 15:12:05 +0200
3b42bc26d0 Move VRAM size getters to a separate package Piotr Stankiewicz 2025-07-14 14:22:18 +0200
9e6e41f3ac VRAM size getter for windows Piotr Stankiewicz 2025-07-11 14:49:26 +0200
d559e1b755 VRAM size getter for linux Piotr Stankiewicz 2025-07-11 14:44:23 +0200
f90e4703f5 Implement basic memory estimation in scheduler Piotr Stankiewicz 2025-07-11 13:24:05 +0200