vllm.utils
Modules:
| Name | Description |
|---|---|
| argparse_utils | Argument parsing utilities for vLLM. |
| async_utils | Contains helpers related to asynchronous code. |
| cache | |
| collection_utils | Contains helpers that are applied to collections. |
| counter | |
| deep_gemm | Compatibility wrapper for DeepGEMM API changes. |
| flashinfer | Compatibility wrapper for FlashInfer API changes. |
| func_utils | Contains helpers that are applied to functions. |
| gc_utils | |
| hashing | |
| import_utils | Contains helpers related to importing modules. |
| jsontree | Helper functions to work with nested JSON structures. |
| math_utils | Math utility functions for vLLM. |
| mem_constants | |
| mem_utils | |
| nccl | |
| network_utils | |
| platform_utils | |
| profiling | |
| registry | |
| serial_utils | |
| system_utils | |
| tensor_schema | |
| torch_utils | |
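_DEPRECATED_MAPPINGS module-attribute
Maps each deprecated name that was previously accessed directly on vllm.utils to the vllm.utils submodule that now provides it.
```python
_DEPRECATED_MAPPINGS = {
    "cprofile": "profiling",
    "cprofile_context": "profiling",
    "get_open_port": "network_utils",
}
```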
__dir__
__getattr__
Module-level getattr to handle deprecated utilities.
Source code in vllm/utils/__init__.py
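Module-level `__getattr__` and `__dir__` are the hooks defined by PEP 562: Python calls a module's `__getattr__` only after normal attribute lookup fails, and `dir()` on a module calls its `__dir__` if one is defined. The following is a minimal, self-contained sketch of how a deprecation shim driven by a mapping shaped like `_DEPRECATED_MAPPINGS` can work; it is not vLLM's source. The in-memory `utils`, `profiling`, and `network_utils` modules, the placeholder function bodies, `existing_helper`, the warning text, and the choice to list deprecated names in `__dir__` are all hypothetical, chosen so the example runs without vLLM installed.

```python
import types
import warnings

# Hypothetical in-memory stand-ins for two vllm.utils submodules.
# The function bodies are placeholders, not vLLM's implementations.
profiling = types.ModuleType("utils.profiling")
profiling.cprofile = lambda func: func  # placeholder no-op decorator

network_utils = types.ModuleType("utils.network_utils")
network_utils.get_open_port = lambda: 12345  # placeholder fixed port

_SUBMODULES = {"profiling": profiling, "network_utils": network_utils}

# Same shape as _DEPRECATED_MAPPINGS: old top-level name -> submodule name.
_DEPRECATED_MAPPINGS = {
    "cprofile": "profiling",
    "get_open_port": "network_utils",
}


def _module_getattr(name: str):
    """Called by Python only when normal attribute lookup on the module fails."""
    submodule_name = _DEPRECATED_MAPPINGS.get(name)
    if submodule_name is None:
        raise AttributeError(f"module 'utils' has no attribute {name!r}")
    warnings.warn(
        f"utils.{name} is deprecated; use utils.{submodule_name}.{name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Forward the lookup to the attribute's new home.
    return getattr(_SUBMODULES[submodule_name], name)


utils = types.ModuleType("utils")
utils.existing_helper = lambda: "still here"  # hypothetical non-deprecated attribute
# Assigning these stores them in the module's __dict__, where PEP 562 looks.
utils.__getattr__ = _module_getattr
# Assumed design choice: advertise deprecated names so tab completion still finds them.
utils.__dir__ = lambda: sorted(
    [n for n in vars(utils) if not n.startswith("__")] + list(_DEPRECATED_MAPPINGS)
)
```

Because the hook runs only on a failed lookup, ordinary attributes such as `utils.existing_helper` are resolved normally with no warning; only the legacy names such as `utils.get_open_port` pass through the warning-and-forward path.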
length_from_prompt_token_ids_or_embeds
```python
length_from_prompt_token_ids_or_embeds(
    prompt_token_ids: list[int] | None,
    prompt_embeds: Tensor | None,
) -> int
```
Calculate the request length (in number of tokens) given either prompt_token_ids or prompt_embeds.
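A minimal sketch of how such a length calculation can work, assuming `prompt_embeds` is laid out as `(num_tokens, hidden_size)` so that `len()` returns its first dimension, as it does for a 2-D `torch.Tensor`. Raising `ValueError` when neither input is given, or when both are given with disagreeing lengths, is an assumption of this sketch rather than documented behavior. The annotation uses `Sized` instead of `Tensor` so the example runs without PyTorch.

```python
from __future__ import annotations

from collections.abc import Sized


def length_from_prompt_token_ids_or_embeds(
    prompt_token_ids: list[int] | None,
    prompt_embeds: Sized | None,
) -> int:
    """Return the request length in tokens from whichever input is present."""
    token_len = None if prompt_token_ids is None else len(prompt_token_ids)
    # For an embeddings tensor of shape (num_tokens, hidden_size), len() is num_tokens.
    embeds_len = None if prompt_embeds is None else len(prompt_embeds)

    # Assumed error handling: no input at all, or two inputs that disagree.
    if token_len is None and embeds_len is None:
        raise ValueError("Neither prompt_token_ids nor prompt_embeds was given.")
    if token_len is not None and embeds_len is not None and token_len != embeds_len:
        raise ValueError(
            f"Length mismatch: {token_len} token ids vs {embeds_len} embeddings."
        )
    # At least one length is set, and if both are set they agree.
    return token_len if token_len is not None else embeds_len
```

For example, three token IDs give a length of 3, while an embeddings input with two rows (standing in for a `(2, hidden_size)` tensor) gives a length of 2.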