vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Michael Goin	0ddf88e16e	[CI] Enable test_initialization to run on V1 (#16736 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 15:09:44 -07:00
Huy Do	1645b60196	Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (#18537 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-05-23 21:17:16 +00:00
Jiayi Yao	2628a69e35	[V1] Support Deepseek MTP (#18435 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-23 10:26:28 -07:00
Cyrus Leung	371f7e4ca2	[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 10:22:40 -07:00
Cyrus Leung	15b45ffb9a	[Doc] Avoid documenting dynamic / internal modules (#18626 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 09:58:02 -07:00
Cyrus Leung	273cb3b4d9	[Doc] Fix top-level API links/docs (#18621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 09:46:56 -07:00
David Xia	8ddd1cf26a	[Doc] fix list formatting (#18624 ) Signed-off-by: David Xia <david@davidxia.com>	2025-05-23 09:41:17 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Michael Goin	9520a989df	[Docs] Change mkdocs to not use directory urls (#18622 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 09:33:21 -07:00
Harry Mellor	3d28ad343f	Fix figures in design doc (#18612 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 09:09:54 -07:00
youkaichao	6a7988c55b	Refactor pplx init logic to make it modular (prepare for deepep) (#18200 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-23 23:43:43 +08:00
Cyrus Leung	022d8abe29	[Doc] Use a different color for the announcement (#18616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 08:25:03 -07:00
Hyogeun Oh (오효근)	5221815a00	[Doc] Fix markdown list indentation for MkDocs rendering (#18620 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-23 08:23:21 -07:00
Simon Mo	1068556b2c	[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (#18579 )	2025-05-23 07:43:58 -07:00
Reid	2cd1fa4556	[Misc] add Haystack integration (#18601 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-23 06:21:19 -07:00
Harry Mellor	d4c2919760	Include private attributes in API documentation (#18614 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 06:18:31 -07:00
Tristan Leclercq	6220f3c6b0	[Bugfix] Fix transformers model impl ignored for mixtral quant (#18602 ) Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>	2025-05-23 05:54:13 -07:00
Harry Mellor	52fb23f47e	Fix examples with code blocks in docs (#18609 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 05:53:44 -07:00
Cyrus Leung	6dd51c7ef1	[CI/Build] Fix V1 flag being set in entrypoints tests (#18598 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 05:51:53 -07:00
Harry Mellor	2edb533af2	Replace `{func}` with mkdocs style links (#18610 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 05:51:38 -07:00
Hyogeun Oh (오효근)	38a95cb4a8	[Doc] Fix indent of contributing to vllm (#18611 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-23 05:50:07 -07:00
Ning Xie	cd821ea5d2	[CI] fix kv_cache_type argument (#18594 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-23 04:49:18 -07:00
Kay Yan	7ab056c273	[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt` (#18542 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-05-23 04:38:42 -07:00
Harry Mellor	6526e05111	Add myself as docs code owner (#18605 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 04:08:31 -07:00
Madeesh Kannan	e493e48524	[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731 ) Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-23 03:38:23 -07:00
Mengqing Cao	4ce64e2df4	[Bugfix][Model] Fix baichuan model loader for tp (#18597 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-23 02:39:05 -07:00
Cyrus Leung	fbb13a2c15	Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034 )" (#18600 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 02:18:22 -07:00
Harry Mellor	a1fe24d961	Migrate docs from Sphinx to MkDocs (#18145 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 02:09:53 -07:00
Yuqi Zhang	d0bc2f810b	[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430 ) Signed-off-by: Yuqi Zhang <yuqizhang@google.com> Co-authored-by: Yuqi Zhang <yuqizhang@google.com>	2025-05-23 01:41:37 -07:00
Chauncey	b046cf792d	[Feature][V1]: suupports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
Michael Goin	54af915949	[Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (#18505 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 08:36:37 +00:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
RonaldBXu	4c611348a7	[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034 ) Signed-off-by: Ronald Xu <ronaldxu@amazon.com>	2025-05-23 00:37:18 -07:00
Ning Xie	60cad94b86	[Hardware] correct method signatures for HPU,ROCm,XPU (#18551 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-22 22:31:59 -07:00
Shanshan Shen	9c1baa5bc6	[Misc] Replace `cuda` hard code with `current_platform` (#16983 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-05-23 04:38:50 +00:00
Teruaki Ishizaki	4be2255c81	[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-23 12:30:47 +08:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
Benjamin Chislett	583507d130	[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-22 20:17:39 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Nick Hill	93ecb8139c	[BugFix] Increase TP execute_model timeout (#18558 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-23 10:22:11 +08:00
CYJiang	fae453f8ce	[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482 ) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-05-23 10:15:32 +08:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Ekagra Ranjan	a04720bc36	[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290 )	2025-05-22 15:17:33 -07:00
lkchen	7b9d832c80	[Tool] Add NIXL installation script (#18172 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 14:33:16 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00

6796 Commits

All Branches