vllm/offline_inference at 55f1a468d97fbf9387e577e901b3f290ed8aa15b - vllm

Lucia Fang 3d2779c29a [Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
basic	Fix and simplify `deprecated=True` CLI `kwarg` (#17781)	2025-05-07 16:51:06 +01:00
disaggregated-prefill-v1	Improve examples rendering in docs and GitHub (#18203)	2025-05-15 15:57:49 +00:00
openai_batch	Improve examples rendering in docs and GitHub (#18203)	2025-05-15 15:57:49 +00:00
profiling_tpu	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
qwen2_5_omni	[Misc] Rename assets for testing (#17575)	2025-05-02 03:29:25 -07:00
audio_language.py	[Model] Add Granite Speech Support (#16246)	2025-04-28 10:05:00 +00:00
batch_llm_inference.py	[Ray] Improve documentation on batch inference (#16609)	2025-04-16 22:19:26 -07:00
chat_with_tools.py	fix: typos (#18151)	2025-05-15 02:16:15 -07:00
data_parallel.py	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
disaggregated_prefill.py	Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994)	2025-05-12 11:25:33 -07:00
eagle.py	[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)	2025-05-14 12:31:46 -07:00
embed_jina_embeddings_v3.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
embed_matryoshka_fy.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
encoder_decoder.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
encoder_decoder_multimodal.py	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)	2025-04-17 04:17:39 +00:00
llm_engine_example.py	[Misc] refactor examples series (#16708)	2025-04-16 10:16:36 +00:00
load_sharded_state.py	[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)	2025-04-03 07:32:10 +00:00
lora_with_quantization_inference.py	[Misc] Remove qlora_adapter_name_or_path (#17699)	2025-05-06 23:10:37 -07:00
mistral-small.py	[VLM] Clean up models (#16873)	2025-04-19 12:13:06 +00:00
mlpspeculator.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
multilora_inference.py	[Misc] format and refactor some examples (#16252)	2025-04-08 10:42:32 +00:00
neuron.py	[Misc] format and refactor some examples (#16252)	2025-04-08 10:42:32 +00:00
neuron_eagle.py	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)	2025-05-07 00:07:30 -07:00
neuron_int8_quantization.py	[Misc] format and refactor some examples (#16252)	2025-04-08 10:42:32 +00:00
neuron_speculation.py	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)	2025-05-07 00:07:30 -07:00
prefix_caching.py	[Misc] format and refactor some examples (#16252)	2025-04-08 10:42:32 +00:00
prithvi_geospatial_mae.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
profiling.py	Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)	2025-05-13 23:19:14 -07:00
qwen_1m.py	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)	2025-05-12 19:52:47 -07:00
reproducibility.py	[Doc] Fix a typo in the file name (#17836)	2025-05-08 18:04:18 +08:00
rlhf.py	[Misc] format and refactor some examples (#16252)	2025-04-08 10:42:32 +00:00
rlhf_colocate.py	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)	2025-03-07 00:32:46 +08:00
rlhf_utils.py	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)	2025-03-07 00:32:46 +08:00
save_sharded_state.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
simple_profiling.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
structured_outputs.py	[Misc] refactor Structured Outputs example (#16322)	2025-04-09 23:32:42 +00:00
torchrun_example.py	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)	2025-05-15 22:28:27 -07:00
tpu.py	[TPU] Increase block size and reset block shapes (#16458)	2025-05-06 13:55:04 -04:00
vision_language.py	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)	2025-05-11 17:56:30 -07:00
vision_language_embedding.py	[Misc] refactor argument parsing in examples (#16635)	2025-04-15 08:05:30 +00:00
vision_language_multi_image.py	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)	2025-05-11 17:56:30 -07:00