🦨 Alpha's Tech Garden


GPTQ

Nov 15, 2025 · 1 min read

links
papers
llm
gpt

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

https://arxiv.org/abs/2210.17323

GPTQ is a new one-shot weight quantization method, based on approximate second-order information, that is both highly accurate and highly efficient. Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth to 3 or 4 bits per weight with negligible accuracy degradation relative to the uncompressed baseline.
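To make "approximate second-order information" concrete, here is a minimal NumPy sketch of the column-by-column idea for a single linear layer. It quantizes one weight column at a time and uses the inverse of the Hessian H = 2XXᵀ (X being the layer's calibration inputs) to push each column's rounding error onto the columns not yet quantized. Assumptions: a simple per-row min/max grid, one layer, and no lazy batching or grouping. The helper names (`make_grid`, `rtn_quantize`, `gptq_quantize`) and the random data are hypothetical illustrations, not the paper's reference code.

```python
import numpy as np


def make_grid(W, bits):
    """Per-output-row asymmetric min/max grid (an assumed, simple quantizer)."""
    maxq = 2 ** bits - 1
    wmin = np.minimum(W.min(axis=1), 0.0)
    wmax = np.maximum(W.max(axis=1), 0.0)
    scale = (wmax - wmin) / maxq
    scale[scale == 0] = 1.0  # guard against all-zero rows
    zero = np.round(-wmin / scale)
    return scale, zero, maxq


def quantize(w, scale, zero, maxq):
    """Round onto the integer grid, clip to [0, maxq], then dequantize."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def rtn_quantize(W, bits=4):
    """Baseline: round every weight to nearest, independently."""
    scale, zero, maxq = make_grid(W, bits)
    return quantize(W, scale[:, None], zero[:, None], maxq)


def gptq_quantize(W, X, bits=4, damp=0.01):
    """Quantize columns in order, compensating error via the inverse Hessian.

    W: (d_row, d_col) layer weights; X: (d_col, n) calibration inputs.
    """
    W = W.astype(np.float64).copy()
    d_col = W.shape[1]
    H = 2.0 * X @ X.T                                # Hessian of ||WX - QX||^2 for each row
    H += damp * np.mean(np.diag(H)) * np.eye(d_col)  # dampening for numerical stability
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T    # upper Cholesky factor U, U^T U = H^-1
    scale, zero, maxq = make_grid(W, bits)           # grid fixed from the original weights
    Q = np.zeros_like(W)
    for i in range(d_col):
        Q[:, i] = quantize(W[:, i], scale, zero, maxq)
        err = (W[:, i] - Q[:, i]) / Hinv[i, i]
        # spread this column's rounding error onto the not-yet-quantized columns
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])
    return Q


# Usage on toy data with correlated inputs (real activations are typically correlated).
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
X = rng.standard_normal((32, 32)) @ rng.standard_normal((32, 256))
for name, Q in [("RTN", rtn_quantize(W, 3)), ("GPTQ", gptq_quantize(W, X, 3))]:
    print(name, "output error:", np.linalg.norm((W - Q) @ X))
```

Both methods land on the same 3-bit grid; the difference is that GPTQ adjusts the remaining full-precision weights so the layer's output, not each individual weight, stays close to the original. For scale, 175 billion weights take about 350 GB at 16 bits, about 87.5 GB at 4 bits, and about 66 GB at 3 bits (ignoring the small overhead of per-row scales and zero points).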


Backlinks

LLM Speed Performance
