[2025/05] 🔥 We explore the Scaling Law for Quantization-Aware Training, which offers insights and guidance for LLM QAT.
[2025/01] Support learnable activation clipping for dynamic quantization.
[2024/11/12] Support for sageattn_varlen is now available. For SageAttention V1 in Triton (slower than SageAttention V2/V2++/V3), refer to the SageAttention-1 branch and install via pip: pip install ...