vllm.attention.layers.encoder_only_attention ¶
EncoderOnlyAttention ¶
Bases: Attention
Encoder-only attention is a special case that does not need a KV cache.
Source code in vllm/attention/layers/encoder_only_attention.py
__init__ ¶
__init__(
    num_heads: int,
    head_size: int,
    scale: float,
    cache_config: CacheConfig | None = None,
    attn_type: str | None = None,
    **kwargs,
)
Source code in vllm/attention/layers/encoder_only_attention.py
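A minimal construction sketch (assuming vLLM is installed and the layer is built inside a model definition; the 8-head/64-dim shapes and the 1/sqrt(head_size) scale are illustrative choices, not requirements):

from vllm.attention.layers.encoder_only_attention import EncoderOnlyAttention

# Hypothetical encoder layer: 8 heads of size 64.
# scale is the conventional 1/sqrt(head_size) softmax scaling.
# cache_config is omitted because encoder-only attention allocates no KV cache.
attn = EncoderOnlyAttention(
    num_heads=8,
    head_size=64,
    scale=64 ** -0.5,
)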
get_kv_cache_spec ¶
get_kv_cache_spec(vllm_config: VllmConfig) -> KVCacheSpec
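Consistent with the class docstring, this override is what lets the engine's KV-cache planner skip the layer entirely. A plausible sketch of the override, assuming a layer opts out of cache allocation by returning None (the return annotation is widened accordingly; the actual body is in the source file linked above):

from vllm.config import VllmConfig
from vllm.v1.kv_cache_interface import KVCacheSpec

def get_kv_cache_spec(self, vllm_config: VllmConfig) -> KVCacheSpec | None:
    # Encoder-only attention keeps no past keys/values, so it
    # contributes nothing to the KV cache (assumption: None signals
    # "no cache needed" to the allocator).
    return None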
create_encoder_only_attention_backend cached ¶
create_encoder_only_attention_backend(
    underlying_attn_backend: AttentionBackend,
) -> type[AttentionBackend]
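The cached label marks this as a memoized factory: the first call derives an encoder-only variant of the given backend, and later calls with the same backend reuse the generated class. A sketch of that pattern, assuming only what the signature shows; the EncoderOnlyBackend subclass name and its empty body are placeholders, and the real specialization lives in the linked source file:

from functools import cache

from vllm.attention.backends.abstract import AttentionBackend

@cache
def create_encoder_only_attention_backend(
    underlying_attn_backend: AttentionBackend,
) -> type[AttentionBackend]:
    # The argument is in practice a backend class, so we can derive a
    # thin subclass that the engine treats as a distinct backend while
    # reusing the underlying kernels.
    class EncoderOnlyBackend(underlying_attn_backend):  # type: ignore[misc, valid-type]
        pass

    return EncoderOnlyBackend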