GLM-5.2 can run locally with Unsloth Dynamic GGUFs
GLM-5.2, Z.ai’s new open model, is available for local deployment using Unsloth Dynamic GGUFs. The model is built for long-horizon coding, reasoning, and agentic tasks, with 744B total parameters (40B active) and a 1M-token context window. Unsloth says it performs on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on Artificial Analysis and other benchmarks.
Dynamic quantization is central to making the model runnable on local systems. The Dynamic 1-bit quant reaches ~76.2% top-1 accuracy at 86% smaller size, and the Dynamic 2-bit quant reaches ~82% accuracy at 84% smaller size. The 2-bit dynamic quant, UD-IQ2_M, takes 239GB of disk space and can fit on a Mac with 256GB of unified memory, or run on a single 24GB GPU paired with 256GB of RAM using MoE offloading.
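The size figures can be sanity-checked with some quick arithmetic. The sketch below is only an illustration: it assumes the "smaller" percentages are measured against a 16-bit checkpoint (2 bytes per parameter), which the article does not state, and the helper names `quant_size_gb` and `fits` are made up for this example.

```python
# Back-of-the-envelope size check for the quantized model.
# ASSUMPTION (not stated in the article): the "X% smaller" figures are
# relative to a 16-bit checkpoint, i.e. 2 bytes per parameter.

TOTAL_PARAMS = 744e9        # total parameters, from the article
BYTES_PER_PARAM_16BIT = 2   # assumed baseline precision


def quant_size_gb(percent_smaller: float) -> float:
    """Estimated size in GB (10^9 bytes) after shrinking the 16-bit baseline."""
    baseline_gb = TOTAL_PARAMS * BYTES_PER_PARAM_16BIT / 1e9
    return baseline_gb * (1 - percent_smaller / 100)


def fits(size_gb: float, memory_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and OS overhead."""
    return size_gb <= memory_gb


if __name__ == "__main__":
    print(f"2-bit estimate: {quant_size_gb(84):.0f} GB")
    print(f"1-bit estimate: {quant_size_gb(86):.0f} GB")
    print("239 GB fits in 256 GB of unified memory:", fits(239, 256))
    print("239 GB fits in a 24 GB GPU alone:", fits(239, 24))
```

Under that assumption the 2-bit estimate lands near 238 GB, close to the 239GB quoted for UD-IQ2_M. The last check shows why the single-GPU setup depends on offloading: the weights are roughly ten times larger than 24GB of VRAM, so most of them have to sit in system RAM.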
GLM-5.2 supports a non-thinking mode and two reasoning modes, High and Max, with Max Thinking recommended for complicated tasks. Unsloth Studio provides a web UI for local AI on macOS, Windows, and Linux. llama.cpp support enables local inference with downloadable GGUF variants, and KV cache quantization helps with longer contexts.
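To see why KV cache quantization matters at long context, here is a rough estimate of cache size for different cache element types. This is a sketch under stated assumptions: the layer count, head count, and head dimension are hypothetical placeholders, not GLM-5.2's actual configuration, and the formula assumes standard multi-head or grouped-query attention, which may not match the model's attention scheme. The bits-per-element values follow the block layouts of llama.cpp's `f16`, `q8_0`, and `q4_0` types.

```python
# Rough KV cache size for different cache element types.
# All model dimensions used below are HYPOTHETICAL placeholders, not
# GLM-5.2's real configuration. The formula assumes standard multi-head or
# grouped-query attention (one K and one V vector per KV head per layer).

# Effective bits per stored element, including per-block scale factors:
# q8_0 = 34 bytes per 32 elements, q4_0 = 18 bytes per 32 elements.
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, cache_type: str) -> float:
    """Estimated KV cache size in GB (10^9 bytes) for one sequence."""
    # Factor 2 covers the key tensor and the value tensor.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return elements * BITS_PER_ELEMENT[cache_type] / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to illustrate the scaling.
    dims = dict(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=131072)
    for cache_type in BITS_PER_ELEMENT:
        size = kv_cache_gb(cache_type=cache_type, **dims)
        print(f"{cache_type}: {size:.1f} GB")
```

The cache grows linearly with context length, so moving from `f16` to `q8_0` roughly halves the memory per token, and `q4_0` shrinks it further, at some cost in accuracy.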