vLLM/vllm - vllm

Commit Graph

Author	SHA1	Message	Date
Chengji Yao	b61dc5f972	[TPU] update torch_xla pin (#19231 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-06-06 04:27:38 +00:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Jerry Zhang	c8134bea15	Fix AOPerModuleConfig name changes (#18869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-06-05 18:51:32 -07:00
Luis Vega	cb6d572e85	[Model] NemotronH support (#18863 ) Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>	2025-06-05 21:29:28 +00:00
Dipika Sikka	aa49f14832	[Quantization] Skip Fp4 Test for `compressed-tensors` (#19217 )	2025-06-05 18:21:53 +00:00
Povilas Kanapickas	85e2b7bb13	[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226 ) Signed-off-by: Povilas Kanapickas <povilas@radix.lt>	2025-06-05 16:53:08 +00:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Guillaume Calmettes	9bc8bb07cf	[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-06-05 12:59:28 +00:00
Chauncey	8fc57501d3	[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-05 06:24:24 +00:00
Robert Shaw	c56ed8bb0e	[Bugfix][Nixl] Fix full prefix cache hit bug (#18632 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-05 02:07:32 +00:00
Nicolò Lucchesi	b2fac67130	[P/D] Heterogeneous TP (#18833 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-06-04 23:25:34 +00:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Siyuan Liu	7ee2590478	[TPU] Update dynamo dump file name in compilation test (#19108 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 16:13:43 -04:00
jmswen	c8dcc15921	Allow AsyncLLMEngine.generate to target a specific DP rank (#19102 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-04 08:26:47 -07:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092 )	2025-06-04 11:48:57 +00:00
Seiji Eicher	2669a0d7b5	Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-04 17:10:45 +08:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
Woosuk Kwon	b124e1085b	[Bugfix] Fix FA3 full cuda graph correctness (#19106 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-03 23:10:15 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925 )	2025-06-03 17:14:20 -07:00
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Chauncey	4de790fcad	[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-03 23:27:24 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00
Chen Zhang	6cac54f4d1	[v1] Re-init input batch for multiple kv cache groups (#18654 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 21:41:36 +00:00
Harry Mellor	6865fe0074	Fix interaction between `Optional` and `Annotated` in CLI typing (#19093 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikun@apache.org>	2025-06-03 21:07:19 +00:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Harry Mellor	476844d44c	Fix underscores in dict keys passed via CLI (#19030 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-06-03 14:39:24 +00:00
Jee Jee Li	4e68ae5e59	[CI/Build] Remove V0 LoRA test (#19066 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-03 14:30:18 +00:00
Chen Zhang	f32fcd9444	[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 08:01:48 +00:00
汪志鹏	1282bd812e	Add tarsier model support (#18985 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-03 13:13:13 +08:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Cyrus Leung	6aa8f9a4e7	[Core] Rework dtype resolution (#18751 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-01 11:04:23 +08:00
Reid	20079c6e36	[Misc] add return token strs for tokenize (#18941 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-31 18:00:11 +00:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Pooya Davoodi	dff80b0e42	[Frontend] Add rerank support to run_batch endpoint (#16278 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-05-31 07:40:01 +00:00
Will Eaton	1dab4d5718	Tool parser regex timeout handling (#18960 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-30 21:02:54 +00:00
Isotr0py	5a8641638a	[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-30 17:11:44 +00:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546 )	2025-05-30 08:17:00 -07:00
Shawn Huang	e1fadf1197	[Feature] minicpm eagle support (#18943 ) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-30 06:45:56 -07:00
Carol Zheng	fba02e3bd1	[Bugfix][TPU] Fix tpu model runner testcase failure (#18810 ) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-30 18:04:03 +08:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Chengji Yao	a1cc9f33a3	[TPU] remove transpose ops in moe kernel (#18923 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-29 23:00:11 +00:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Nicolò Lucchesi	32ce3cf7c9	[V1] Allocate kv_cache with stride order for V1 (#18775 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-05-29 17:54:16 +00:00
Cyrus Leung	c29034037d	[Deprecation] Disallow pos-args other than `model` when initializing `LLM` (#18802 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-29 09:36:58 -07:00
Isotr0py	c9479b2920	[Bugfix] Fix the failing gte embedding test (#18720 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-29 07:39:25 -07:00
Satyajith Chilappagari	972eddf7c9	[Neuron] Add multi-LoRA support for Neuron. (#18284 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-29 16:41:22 +08:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Akshat Tripathi	643622ba46	[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai>	2025-05-28 19:59:09 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
Alex Brooks	321331b8ae	[Core] Add Lora Support to Beam Search (#18346 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-28 08:58:24 -07:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Rabi Mishra	b78f844a67	[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-28 05:42:54 +00:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
Satyajith Chilappagari	e0cbad4e30	[Neuron] Support quantization on neuron (#18283 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-27 22:10:33 +00:00
Michael Goin	5873877241	[Bugfix] Mistral tool calling when content is list (#18729 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-27 09:05:37 -07:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Ning Xie	5a2c76cbe1	[CI] fix dump_input for str type (#18697 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 18:23:35 +08:00
Cyrus Leung	38b13dfe78	[CI/Build] Replace `math.isclose` with `pytest.approx` (#18703 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 02:05:17 -07:00
Ning Xie	4ea62c0ea0	[CI] add missing argument (#18694 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 00:22:04 -07:00
Cyrus Leung	fba0642704	[CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (#18683 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 20:27:50 -07:00
Cyrus Leung	57fd13a707	[Bugfix] Fix profiling dummy data for Pixtral (#18677 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-25 14:05:30 +00:00
Michael Goin	63934543a0	Speed up the `kernels/quantization/` tests (#18669 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-25 05:02:59 +00:00
Isotr0py	75f81750f3	[VLM] Initialize video input support for InternVL models (#18499 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-25 04:51:25 +00:00
Mengqing Cao	6ab681bcbe	[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655 ) Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 04:51:21 +00:00
qizixi	c1e4a4052d	[V1][Spec Decode] Support multi-layer eagle draft model (#18030 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 09:45:34 +00:00
Yuanhao WU	a859320575	[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647 )	2025-05-24 09:15:36 +00:00
qizixi	d55e446d13	[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 06:51:22 +00:00
Robert Shaw	2b10ba7491	[Bugfix][Nixl] Fix Preemption Bug (#18631 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-23 23:30:16 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Michael Goin	0ddf88e16e	[CI] Enable test_initialization to run on V1 (#16736 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 15:09:44 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Ning Xie	cd821ea5d2	[CI] fix kv_cache_type argument (#18594 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-23 04:49:18 -07:00
Chauncey	b046cf792d	[Feature][V1]: suupports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
David Xia	1f3a1200e4	[Bugfix] make `test_openai_schema.py` pass (#18224 ) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 18:34:06 +00:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00
lkchen	a35a494745	[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 05:24:43 -07:00
aws-elaineyz	fa72f9a812	Order sequence ids + config update to support specifying custom quantization layers (#18279 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:20:36 -07:00
Jee Jee Li	db5a29ba19	[Bugfix] Fix LoRA test (#18518 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-21 21:48:53 -07:00
Russell Bryant	6e0fd34d3c	[CI] Fix race condition with StatelessProcessGroup.barrier (#18506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-21 20:19:13 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 ) (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
bnellnm	c6c10ca920	[Bugfix] Reduce moe_sum test size to avoid OOM (#18484 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-21 06:46:39 -07:00
Dhia Eddine Rhaiem	eca18691d2	[MODEL] FalconH1 (#18406 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 04:59:06 -07:00
Rabi Mishra	61acfc45bc	[Bugfix][Failing Test] Fix test_events.py (#18460 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-21 04:57:28 -07:00
bnellnm	92247c522e	[Bug] Fix moe_sum signature (#18440 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-20 22:37:08 -07:00
Michael Goin	f4a8a37465	[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-20 09:08:37 -07:00
wang.yuqi	86847700d7	[CI] Add mteb testing to test the accuracy of the embedding model (#17175 )	2025-05-20 06:51:12 -07:00
Jee Jee Li	6b35cb10a0	[Misc] Add LoRA code owner (#18387 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-20 03:27:30 -07:00
Nan Qin	9609327fa4	[Core] [Bugfix]: tensor parallel with prompt embeds (#18171 ) Signed-off-by: Nan2018 <nan@protopia.ai> Co-authored-by: Andrew Sansom <andrew@protopia.ai>	2025-05-19 20:21:27 -07:00
Isotr0py	f07a673eb2	[Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (#18358 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-19 20:20:12 -07:00
Satyajith Chilappagari	dc1440cf9f	Neuron up mistral (#18222 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-19 09:54:47 -07:00
Wenhua Cheng	e2ee1e8e9e	[Feature]Add support for models quantized with AutoRound (#17850 ) Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>	2025-05-19 09:38:53 -07:00
Jee Jee Li	6781af5608	[Quantization] Pool model support bitsandbytes (#18087 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-19 09:03:43 -07:00
Nan Qin	221cfc2fea	Feature/vllm/input embedding completion api (#17590 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Nan2018 <nan@protopia.ai> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Andrew Sansom <andrew@protopia.ai> Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-18 20:18:05 -07:00
wwl2755	9da1095daf	[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-18 19:49:46 -07:00
cascade	9ab2c02ff8	Support sequence parallelism combined with pipeline parallelism (#18243 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-17 22:47:25 +00:00
Jinzhen Lin	e73b7dfd69	[Bugfix] fix `an illegal memory access was encountered` of marlin kernel + act_order (#18245 )	2025-05-16 16:02:44 -07:00
Bowen Wang	7fdfa01530	[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-16 15:14:03 -07:00
Isotr0py	390ec88905	[Misc] Consolidate Audio tests into multimodal common generation tests (#18214 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-16 09:18:08 +00:00
Seiji Eicher	541817670c	[Misc] Add Ray Prometheus logger to V1 (#17925 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-05-16 01:02:42 -07:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Will Eaton	6b31c84aff	Throw better error for when running into k8s service discovery issue (#18209 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-15 21:07:28 -07:00
Harry Mellor	b18201fe06	Allow users to pass arbitrary JSON keys from CLI (#18208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 21:05:34 -07:00
Lucas Wilkinson	4e1c6a0264	[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-16 01:32:45 +00:00
Lucia Fang	8795eb9975	[Bugfix] Fix test_eagle test (#18223 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 15:59:42 -07:00
Alexei-V-Ivanov-AMD	566ec04c3d	Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-15 08:49:23 -07:00
hustxiayang	451da4bcbd	add tools into TokenizeChatRequest (#18187 ) Signed-off-by: yangxia <yangxiast@gmail.com>	2025-05-15 04:01:49 -07:00
omahs	a9944aabfa	fix: typos (#18151 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Russell Bryant	a8f5aec20a	[V1] Update zmq socket creation in nixl connector (#18148 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 23:17:57 -07:00
David Xia	de71fec81b	[CI] don't skip fixed `test_kv_cache_events()` (#18183 ) Signed-off-by: David Xia <david@davidxia.com>	2025-05-14 23:17:16 -07:00
Ning Xie	420caf7557	[UT] Add ut for none hash (#17892 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-15 13:28:11 +08:00
Chenheli Hua	4f07a64075	Support custom implementations of VideoLoader backends. (#18091 )	2025-05-15 13:26:49 +08:00
Thomas Parnell	e6b8e65d2d	[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 13:26:34 +08:00
Mark McLoughlin	65334ef3b9	[V1][Metrics] Remove unused code (#18158 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-14 20:13:17 -07:00
Chen Zhang	e60f550b38	[v1] Support multiple KV cache groups in GPU model runner (#17945 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 18:54:54 -07:00
Michael Goin	2142035b51	[V1] Support multiple kv connectors (#17564 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-14 16:28:02 -07:00
Russell Bryant	78aa341d12	[CI] Fix race condition in test_kv_cache_events test (#18169 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 16:27:48 -07:00
Jerry Zhang	7974736740	Add support for loading torchao models with `AOPerModuleConfig` (#17826 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-14 16:24:59 -07:00
Aaron Pham	2fc9075b82	[V1] Structured Outputs + Thinking compatibility (#16577 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 15:45:24 -07:00
Lucas Wilkinson	d93c976a0d	[Kernel] Have rotary embeddings support tensors (#18046 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-14 15:43:55 -07:00
Robert Shaw	856865008e	[CI] Disable Failing Tests (#18165 )	2025-05-14 13:49:56 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956 )	2025-05-14 13:11:54 -07:00
Nick Hill	59dd311cf5	[KVConnector] Keep KVTransferParams as a dict (#18033 )	2025-05-14 08:05:57 -07:00
Cyrus Leung	d066e52013	[Bugfix] Fix chat utils tests (#18139 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 05:38:21 -07:00
Cyrus Leung	d62a076e84	[Model] GritLM supports other attention backends (#18109 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 03:33:19 -07:00
Jee Jee Li	259127f8b8	[Bugfix] Fix LoRA test (#18123 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-14 10:25:47 +00:00
TJian	612c2edb4f	[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-14 03:03:11 -07:00
rongfu.leng	82e7f9bb03	[Misc] replace does not exist model (#18119 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-14 02:13:47 -07:00
Cyrus Leung	8f5dc41481	[Bugfix] Fix entrypoints audio test failure (#18111 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 09:08:07 +00:00
wang.yuqi	63ad622233	[New Model]: support GTE NewModel (#17986 )	2025-05-14 01:31:31 -07:00
lkchen	6685890d11	[Fix] Move "model_config" as keyword args in chat_utils.py (#18098 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-13 23:27:26 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
vllmellm	2d912fb66f	[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 22:03:47 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Aaron Pham	cb528d0585	[Fix] check to make sure processor has chat templates (#18047 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-13 03:04:10 -07:00
Michael Goin	ea6ae8cb45	[Bugfix] Fix marlin moe fallback logic for llama4 (#18042 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 07:53:28 +00:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
hissu-hyvarinen	f6518b2b48	[ROCm] Skip tests for quantizations incompatible with ROCm (#17905 ) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2025-05-12 18:39:28 -06:00
bwshen-mi	acee8f48aa	[Model] Support MiMo-7B inference with MTP (#17433 ) Signed-off-by: wp-alpha <wangpeng66@xiaomi.com> Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>	2025-05-12 23:25:33 +00:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Russell Bryant	ebab1ac37c	[CI] Make JSON output tests less likely to fail (#17859 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 22:31:54 +00:00
Jonathan Berkhahn	98ea35601c	[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855 ) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-05-12 10:39:10 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Maximilien de Bayser	05a4324f8e	Initialize the delta tool call fields explicitly (#17340 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: igmainc <igmainc@icloud.com>	2025-05-12 13:28:58 +00:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623 ) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
TJian	a810b5b088	[BugFix] [ROCm]: Bugfix and handle addition case of input for `rocm_aiter_rms_norm` (#17857 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 04:17:11 -07:00
wang.yuqi	e4b8713380	[New Model]: nomic-embed-text-v2-moe (#17785 )	2025-05-11 00:59:43 -07:00
Dipika Sikka	cd3edfc908	[Misc] Add compressed-tensors NVFP4A16 emulation support (#17914 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-05-11 15:58:38 +08:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032 ) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Ben Browning	8132365b74	[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-11 00:53:58 -07:00
Jinzhen Lin	d74e5f37bc	[Kernel] fp4 marlin kernel (#17687 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-10 19:58:49 -07:00
Chen Zhang	ca66a1674c	[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:14:12 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Ximo Guanter	fc4441a4ee	Add missing content type headers to /ping and /health (#17036 ) (#17786 ) Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 07:13:32 +01:00
tracelogfb	246e3e0a36	fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873 ) Co-authored-by: Stephen Chen <tracelog@meta.com>	2025-05-10 10:46:54 +08:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362 )	2025-05-09 16:24:41 -07:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Cyrus Leung	6e5595ca39	[CI/Build] Automatically retry flaky tests (#17856 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-09 09:55:17 -06:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Harry Mellor	c6798baa9c	Change `top_k` to be disabled with `0` (still accept `-1` for now) (#17773 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 10:01:49 +00:00
Ning Xie	d310e6de98	[BUGFIX]: return fast when request requires prompt logprobs (#17251 )	2025-05-08 21:25:41 -07:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383 ) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00
Russell Bryant	ec54d73c31	[CI] Fix test_collective_rpc (#17858 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-08 16:47:12 +00:00
fxmarty-amd	bb239a730f	[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612 ) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com>	2025-05-08 02:53:53 -07:00
Jevin Jiang	a463555dee	[TPU] Fix the test_sampler (#17820 )	2025-05-08 05:51:33 -04:00
Cyrus Leung	96722aa81d	[Frontend] Chat template fallbacks for multimodal models (#17805 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 23:05:54 -07:00
Hashem Hashemi	5a499e70d5	[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: charlifu <charlifu@amd.com>	2025-05-07 22:34:49 -07:00
Russell Bryant	6930a41116	[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-05-08 13:34:02 +08:00
Chanh Nguyen	7ea2adb802	[Core] Support full cuda graph in v1 (#16072 ) Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-05-07 22:30:15 -07:00
Wallas Henrique	d43f914d42	[Core][Feature] Input metadata dump on crash (#13407 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2025-05-07 22:15:09 +00:00

2233 Commits