GLM-5.2 can run locally with Unsloth Dynamic GGUFs

26 June 2026, 07:17 · 1 min read

GLM-5.2, Z.ai’s new open model, is available for local deployment using Unsloth Dynamic GGUFs. The model is built for long-horizon coding, reasoning, and agentic tasks, with 744B parameters, 40B active parameters, and a 1M context window. Unsloth says it performs on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and other benchmarks.

Dynamic quantization is central to making the model runnable on local systems. The Dynamic 1-bit quant reaches ~76.2% top-1 accuracy while being 86% smaller, and the Dynamic 2-bit quant reaches ~82% accuracy while being 84% smaller. The 2-bit dynamic quant, UD-IQ2_M, takes 239GB of disk space and can fit on a Mac with 256GB of unified memory, or run on a single 24GB GPU with 256GB of RAM using MoE offloading.
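
The size figures can be sanity-checked with some quick arithmetic. The sketch below assumes the "smaller" percentages are measured against 16-bit weights (2 bytes per parameter), which the article does not state, and treats memory headroom as a plain subtraction in decimal gigabytes; the function names are illustrative only.

```python
# Back-of-the-envelope check of the quoted sizes.
# Assumption (not stated in the article): the "smaller" percentages are
# relative to 16-bit weights, i.e. 2 bytes per parameter.

TOTAL_PARAMS = 744e9        # total parameters, from the article
BYTES_PER_PARAM_16BIT = 2   # assumed baseline precision
GB = 1e9                    # decimal gigabytes


def baseline_size_gb(params: float = TOTAL_PARAMS) -> float:
    """Size of the unquantized weights under the 16-bit assumption."""
    return params * BYTES_PER_PARAM_16BIT / GB


def reduction_percent(quant_gb: float, base_gb: float) -> float:
    """How much smaller the quantized file is, as a percentage."""
    return (1 - quant_gb / base_gb) * 100


def headroom_gb(quant_gb: float, memory_gb: float) -> float:
    """Memory left over for the KV cache, OS, and runtime buffers."""
    return memory_gb - quant_gb


base = baseline_size_gb()
print(f"16-bit baseline: {base:.0f} GB")
print(f"UD-IQ2_M reduction: {reduction_percent(239, base):.1f}%")
print(f"Headroom on 256 GB: {headroom_gb(239, 256):.0f} GB")
```

Under that assumption the baseline comes to about 1,488GB, so a 239GB file is roughly 84% smaller, in line with the quoted figure. The remaining 17GB or so on a 256GB machine would have to cover the KV cache and everything else running on the system.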

GLM-5.2 supports a non-thinking mode and two reasoning modes, labeled High and Max, with Max Thinking recommended for complicated tasks. Unsloth Studio provides a web UI for local AI on macOS, Windows, and Linux, while llama.cpp support enables local inference with downloadable GGUF variants and KV cache quantization for longer context use.
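
As a rough illustration of what a llama.cpp run with MoE offloading and a quantized KV cache might look like, the sketch below assembles a `llama-cli` command line. The model filename, context size, layer count, and tensor-override pattern are assumptions chosen for illustration, not values taken from the article; check them against `llama-cli --help` and the source instructions for your build.

```python
import shlex


def build_llama_command(model_path: str, ctx_size: int = 16384) -> list[str]:
    """Assemble an illustrative llama-cli invocation (values are assumptions)."""
    return [
        "llama-cli",
        "--model", model_path,            # path to a downloaded GGUF file
        "--ctx-size", str(ctx_size),      # context window to allocate
        "--n-gpu-layers", "99",           # try to place all layers on the GPU...
        "--override-tensor", ".ffn_.*_exps.=CPU",  # ...but keep MoE experts in system RAM
        "--cache-type-k", "q8_0",         # store the K cache in 8-bit to save memory
    ]


# Hypothetical filename; use the name of the file you actually downloaded.
cmd = build_llama_command("GLM-5.2-UD-IQ2_M.gguf")
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and its value together and avoids shell-quoting mistakes around the regex in the tensor override; the same list can be passed directly to `subprocess.run`.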

Originally reported by unsloth.ai
