Commit Graph

  • 0eff85e3c2
    models: avoid error response when request is canceled/timed out Dorin Geman 2025-07-22 09:43:39 +0300
  • 505613344c
    Merge pull request #111 from docker/recorder-use-model-ref Jacob Howard 2025-07-18 18:28:59 +0300
  • 3b5b4b58e4
    recorder: use model ref rather than model ID Jacob Howard 2025-07-18 12:28:51 +0300
  • c873021e07
    Merge pull request #112 from docker/context-regulation Jacob Howard 2025-07-18 18:19:30 +0300
  • 843a94cf1b
    Merge pull request #110 from docker/recorder-memory-leak Jacob Howard 2025-07-18 18:18:52 +0300
  • 3fef8f2980
    Use more aggressive server shutdown and resequence termination Jacob Howard 2025-07-18 13:17:47 +0300
  • 2fd8517720
    recorder: fix memory leak due to slice appends Jacob Howard 2025-07-18 12:24:31 +0300
  • a8437d373b
    Use modelID instead of model tag (#98) Ignasi 2025-07-14 15:34:04 +0200
  • 566ec6f11a
    Merge pull request #107 from doringeman/cors Dorin-Andrei Geman 2025-07-14 14:28:10 +0300
  • 3d466a2d80
    runner: Do not allow CORS headers set by the inference engines Dorin Geman 2025-07-14 14:16:54 +0300
  • 8907b3ddf8
    Merge pull request #100 from doringeman/rm-untagged-msg Dorin-Andrei Geman 2025-07-11 16:08:25 +0300
  • 2bd09df3ea
    models: return untagged model information on delete Dorin Geman 2025-07-04 14:27:54 +0300
  • d370bbddbc
    Move passthrough model name to constant and clarify error message. openai-passthrough Jacob Howard 2025-07-11 15:05:37 +0300
  • 74c9839a5b
    Update pkg/inference/scheduling/scheduler.go Jacob Howard 2025-07-11 05:59:59 -0600
  • fb9e1b83ef
    Update pkg/inference/backends/openai/openai.go Jacob Howard 2025-07-11 05:57:24 -0600
  • e9f3b2f9e9
    Include all backends' disk usage in total. Jacob Howard 2025-07-10 17:25:14 +0300
  • c05842c0a5
    Don't attempt to log nil runner configurations. Jacob Howard 2025-07-10 17:16:27 +0300
  • 8a5365132a
    Add OpenAI passthrough backend. Jacob Howard 2025-07-10 17:09:37 +0300
  • 9806a43e79
    Adds support for Multimodal projector file (#104) Ignasi 2025-07-10 14:41:28 +0200
  • c532a6f835
    Fix test ilopezluna 2025-07-10 11:37:40 +0200
  • 17fe98de99
    Includes --proj if the model includes a Multimodal projector file ilopezluna 2025-07-10 11:32:30 +0200
  • 8b5d6aedfb
    When the wf is triggered by the cron we are not setting the default model (#103) Ignasi 2025-07-10 10:05:45 +0200
  • e137b6b4e0
    When the wf is triggered by the cron we are not setting the default model ilopezluna 2025-07-10 09:05:17 +0200
  • 51a93c0aff
    Adds daily check (#102) Ignasi 2025-07-09 13:55:29 +0200
  • d1cb1af584
    Potential fix for code scanning alert no. 52: Workflow does not contain permissions Ignasi 2025-07-09 13:47:45 +0200
  • 3c7613f115
    no need to execute the wf on push anymore ilopezluna 2025-07-09 13:15:10 +0200
  • faedd52810
    changing the JSON payload to use double quotes and proper variable expansion ilopezluna 2025-07-09 13:14:52 +0200
  • 2438078cd4
    Adds apt repo ilopezluna 2025-07-09 13:10:02 +0200
  • 1070c64b35
    Adds daily check ilopezluna 2025-07-09 12:53:15 +0200
  • a23dc5e514
    Include the ID of the model and the actual reference in the logs ilopezluna 2025-07-09 10:42:04 +0200
  • 296d29d7ee
    Merge branch 'main' into use-model-id ilopezluna 2025-07-09 10:21:20 +0200
  • ebdff5c136
    Merge pull request #101 from doringeman/tracking Dorin-Andrei Geman 2025-07-09 10:04:44 +0300
  • e21bd8f5de
    metrics: include current request's User-Agent Dorin Geman 2025-07-08 15:22:38 +0300
  • 66ce06c63b
    Bump model-distribution to include total on pull/push progress messages bump-model-distribution-progress ilopezluna 2025-07-03 15:45:15 +0200
  • dfeb5677b4
    Uses runnerInfo as a value type for runner map, to store runtime info of the runner. Currently the slot and the model reference used. ilopezluna 2025-07-02 16:07:41 +0200
  • d47c0f84bb
    Rename model to modelID to make it clear ilopezluna 2025-07-02 15:50:00 +0200
  • d285a7fcc8
    already sanitized ilopezluna 2025-07-02 15:30:25 +0200
  • 28b1fd04a9
    Potential fix for code scanning alert no. 43: Log entries created from user input Ignasi 2025-07-02 15:27:50 +0200
  • 77b2752e89
    Potential fix for code scanning alert no. 42: Log entries created from user input Ignasi 2025-07-02 15:27:34 +0200
  • 12f80d12da
    Potential fix for code scanning alert no. 40: Log entries created from user input Ignasi 2025-07-02 15:27:06 +0200
  • b6bf9a1463
    Use modelID as key for runners & runnerConfigs map ilopezluna 2025-07-02 12:12:43 +0200
  • 721058167a
    Use modelID as key to record req/resp instead of model tag ilopezluna 2025-07-02 11:56:47 +0200
  • 495cdb02ec
    Add modelID resolver in model manager ilopezluna 2025-07-02 11:56:16 +0200
  • de1ef38e5c
    Merge pull request #93 from docker/context-size Emily Casey 2025-06-27 11:06:14 -0600
  • cbdbb83bf7
    Incorporate review feedback Emily Casey 2025-06-27 10:54:59 -0600
  • aac988e62a
    Apply suggestions from code review Emily Casey 2025-06-27 10:47:18 -0600
  • 4d690797f2
    Bump model-distribution to point at main branch Emily Casey 2025-06-27 10:45:01 -0600
  • 247d9e06a3
    Respect context size from model config Emily Casey 2025-06-25 12:36:25 -0600
  • 26a0a73fbb
    Merge pull request #95 from docker/config-list Jacob Howard 2025-06-27 08:29:17 -0600
  • 24299c17e9
    Merge pull request #96 from doringeman/openai-recorder Dorin-Andrei Geman 2025-06-27 17:27:35 +0300
  • ac055739df
    Implement Flusher interface for responseRecorder Dorin Geman 2025-06-27 17:16:04 +0300
  • fcf45f4271
    Merge pull request #94 from doringeman/misc Dorin-Andrei Geman 2025-06-27 11:58:02 +0300
  • cce6a71cc9
    Allow configuration through argument list (in addition to string) Jacob Howard 2025-06-26 15:19:04 -0600
  • cd8959b37e
    scheduler: move OpenAI recording after runner load/release Dorin Geman 2025-06-26 15:14:26 +0300
  • a6db262a4c
    Merge pull request #91 from doringeman/openai-recorder Dorin-Andrei Geman 2025-06-25 15:51:43 +0300
  • 5d31399dca
    Log Hub response in llama-server auto update failure path Piotr Stankiewicz 2025-06-25 12:38:47 +0200
  • 3a5500500e
    OpenAIRecorder: Capture User-Agent header in records Dorin Geman 2025-06-25 15:17:39 +0300
  • ea9a321253
    OpenAIRecorder: Include BackendConfiguration Dorin Geman 2025-06-25 14:54:23 +0300
  • 3904f2314d
    OpenAIRecorder: Remove records on model eviction/termination Dorin Geman 2025-06-25 14:29:16 +0300
  • 3e38960805
    Log Hub response in llama-server auto update failure path Piotr Stankiewicz 2025-06-25 12:38:47 +0200
  • be8f3e6696
    Add OpenAIRecorder Dorin Geman 2025-06-24 11:59:21 +0300
  • b8561e1bfe
    Only run tests once (#89) Ignasi 2025-06-20 13:14:20 +0200
  • 927b1ce12b
    Adds newline at the end of the file ilopezluna 2025-06-20 13:11:29 +0200
  • 4a31c6442e
    Only run tests once ilopezluna 2025-06-20 13:02:06 +0200
  • e0d5ff8a3e
    Bump model-distribution (#88) Ignasi 2025-06-20 10:07:48 +0200
  • 605a8c427d
    go mod tidy ilopezluna 2025-06-20 10:03:00 +0200
  • ccc2a2bde6
    Merge pull request #87 from docker/cloud-ttl-8hour Jacob Howard 2025-06-18 07:49:17 -0600
  • 24a2a4b030
    Merge pull request #86 from docker/config-refinement Jacob Howard 2025-06-18 07:48:52 -0600
  • ceb2e557d6
    Bump model-distribution ilopezluna 2025-06-18 15:22:09 +0200
  • e57dcc0737
    Update pkg/inference/scheduling/runner.go Piotr 2025-06-17 16:52:35 +0200
  • e249e49421
    Fix stall in case a runner crashes while not in active use Piotr Stankiewicz 2025-06-17 16:14:58 +0200
  • f4a5e4b52d
    Return error in case of runner crash Piotr Stankiewicz 2025-06-05 13:56:16 +0200
  • e0ddfdf277
    Temporarily bump GPU-enabled cloud idle timeout to 8 hours. Jacob Howard 2025-06-17 10:45:12 -0600
  • 9f022114ba
    Try to make backend configuration a little more robust. Jacob Howard 2025-06-17 10:11:58 -0600
  • 5984f8a9a9
    Update pkg/inference/scheduling/runner.go Piotr 2025-06-17 16:52:35 +0200
  • 234d5aa387
    Fix stall in case a runner crashes while not in active use Piotr Stankiewicz 2025-06-17 16:14:58 +0200
  • aaacee4160
    Return error in case of runner crash Piotr Stankiewicz 2025-06-05 13:56:16 +0200
  • 64153a7b4a
    Create home folder (#85) Ignasi 2025-06-16 19:46:18 +0200
  • 1a5e637426
    Merge pull request #83 from docker/model-runner-environment Jacob Howard 2025-06-16 10:01:53 -0600
  • 0cd306e68b
    Detect operating environment and adjust certain behaviors. Jacob Howard 2025-06-13 16:31:48 -0600
  • 5f263055dc
    adds --home-dir to make it more clear ilopezluna 2025-06-16 16:06:04 +0200
  • 5363abf8d9
    Create home folder for modelrunner user ilopezluna 2025-06-16 16:01:50 +0200
  • 9933b7dabf
    Adds metrics endpoint (#78) Ignasi 2025-06-16 10:18:20 +0200
  • 9c4674edcf
    clean up ilopezluna 2025-06-16 09:32:42 +0200
  • 5a6fd96347
    wrap ResponseRecorder to avoid a race during tests bump-model-distribution ilopezluna 2025-06-13 17:29:59 +0200
  • 7e7c7d0131
    bump model-distribution ilopezluna 2025-06-13 15:49:43 +0200
  • bb394fb4e8
    remove unneeded dep ilopezluna 2025-06-13 12:30:38 +0200
  • c304305804
    I missed commiting this ilopezluna 2025-06-13 12:27:39 +0200
  • e92a9f0167
    Merge branch 'main' into add-metrics ilopezluna 2025-06-13 12:25:07 +0200
  • 17047a1413
    acquire/release the loader's lock ilopezluna 2025-06-13 12:23:59 +0200
  • d8e2620afe
    replace custom parser with official Prometheus libraries ilopezluna 2025-06-13 12:13:43 +0200
  • dac689435a
    Remove NewSchedulerMetricsHandler, not used ilopezluna 2025-06-13 10:56:27 +0200
  • 6d3a7aca16
    Merge pull request #77 from doringeman/defunct-runners Dorin-Andrei Geman 2025-06-13 11:52:52 +0300
  • 6cd0e61090
    Parametrise llama-server version in release pipeline Piotr Stankiewicz 2025-06-13 10:43:35 +0200
  • 93c1dc3304
    Parametrise llama-server version in release pipeline Piotr Stankiewicz 2025-06-13 10:43:35 +0200
  • 6b8c3b816f
    Update pkg/inference/scheduling/scheduler.go Piotr 2025-06-13 09:50:01 +0200
  • 099b12231d
    Support runner configuration (temporary solution) Piotr Stankiewicz 2025-06-11 15:20:21 +0200
  • 3dafee34b8
    Update pkg/inference/scheduling/scheduler.go Piotr 2025-06-13 09:50:01 +0200
  • 0e365a15a5
    Support runner configuration (temporary solution) Piotr Stankiewicz 2025-06-11 15:20:21 +0200
  • 0130eb6cf6
    Prepare Docker file ahead of multi-backend builds of llama-server Piotr Stankiewicz 2025-06-12 15:30:21 +0200