CVE-2026-44223
Published: May 12, 2026
Modified: May 15, 2026
CVSS v3.1
6.5
Description
vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
| Vendor | Product | Versions |
|---|---|---|
vllm-project | vllm | affected >= 0.18.0, < 0.20.0 |
CVSS v3.1 Details
CVSS v3.1 Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Attack Vector
Attack Complexity
Privileges Required
User Interaction
Scope
Confidentiality
Integrity
Availability
References
Security Training
Train your team to recognize and prevent security threats with our comprehensive security awareness program.
Start TrainingVulnerability Scanning
Discover vulnerabilities in your applications and infrastructure before attackers do.
Scan Now