

GLM-5.2 can run locally with Unsloth Dynamic GGUFs


GLM-5.2, Z.ai’s new open model, is available for local deployment using Unsloth Dynamic GGUFs. The model is built for long-horizon coding, reasoning, and agentic tasks, with 744B parameters, 40B active parameters, and a 1M context window. Unsloth says it performs on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and other benchmarks.

Dynamic quantization is what makes the model runnable on local hardware. The Dynamic 1-bit quant reaches ~76.2% top-1 accuracy at 86% smaller size, and the Dynamic 2-bit quant reaches ~82% accuracy at 84% smaller size. The 2-bit dynamic quant UD-IQ2_M takes 239GB of disk space and can fit on a Mac with 256GB of unified memory, or run on a single 24GB GPU with 256GB of RAM using MoE offloading.
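The size figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the unquantized baseline stores each of the 744B parameters in 16 bits, which the report does not state, and the function names and the `headroom_gb` parameter are hypothetical.

```python
def implied_original_gb(quant_gb: float, reduction: float) -> float:
    """Size before quantization, given the quant is `reduction` (0-1) smaller."""
    return quant_gb / (1.0 - reduction)


def fits(quant_gb: float, memory_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the weights plus a chosen headroom fit in the available memory."""
    return quant_gb + headroom_gb <= memory_gb


UD_IQ2_M_GB = 239           # disk size quoted for the 2-bit dynamic quant
PARAMS_BILLIONS = 744       # total parameter count quoted for GLM-5.2

# Assumption: 16-bit baseline, i.e. 2 bytes per parameter.
baseline_16bit_gb = PARAMS_BILLIONS * 2

# "84% smaller" means the quant is 16% of the original size.
implied_gb = implied_original_gb(UD_IQ2_M_GB, 0.84)

print(f"16-bit baseline (assumed): {baseline_16bit_gb} GB")
print(f"Original size implied by 84% reduction: {implied_gb:.0f} GB")
print("Fits in 256 GB unified memory:", fits(UD_IQ2_M_GB, 256))
print("Fits in 24 GB GPU + 256 GB RAM:", fits(UD_IQ2_M_GB, 24 + 256))
```

The two estimates land within about half a percent of each other, so the quoted 84% reduction is consistent with a 16-bit baseline. The memory check covers the weights only: the KV cache and the operating system also need room, which is what the hypothetical `headroom_gb` argument is for.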

GLM-5.2 supports a non-thinking mode and two reasoning modes, labeled High and Max; Max Thinking is recommended for complicated tasks. Unsloth Studio provides a web UI for local AI on macOS, Windows, and Linux. llama.cpp support enables local inference with downloadable GGUF variants, and KV cache quantization helps with longer context use.
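As a rough illustration of the llama.cpp route, the sketch below assembles a `llama-cli` command line with a quantized K cache and the MoE expert tensors kept in system RAM. The model path, context size, and helper name are hypothetical placeholders, not values from the report, and the tensor-name pattern assumes the usual GGUF naming for expert weights. The matching `--cache-type-v` flag is left out to keep the sketch small.

```python
import shlex


def build_llama_cli_args(model_path: str, ctx_size: int,
                         k_cache_type: str = "q8_0",
                         offload_experts: bool = True) -> list[str]:
    """Assemble an argument list for llama.cpp's llama-cli (nothing is executed)."""
    args = [
        "llama-cli",
        "--model", model_path,
        "--ctx-size", str(ctx_size),
        "--n-gpu-layers", "99",            # ask for all layers on the GPU
        "--cache-type-k", k_cache_type,    # store the K cache in a quantized type
    ]
    if offload_experts:
        # Keep MoE expert tensors on the CPU so the remainder fits a small GPU.
        # The regex is an assumption about how the expert tensors are named.
        args += ["--override-tensor", ".ffn_.*_exps.=CPU"]
    return args


# Hypothetical local file name; substitute the GGUF you actually downloaded.
cmd = build_llama_cli_args("path/to/GLM-5.2-UD-IQ2_M.gguf", ctx_size=32768)
print(shlex.join(cmd))
```

Building the list in Python rather than hard-coding a shell string makes it easy to vary the context size or cache type per machine; the result can be passed to `subprocess.run` once the path points at a real file.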

Originally reported by unsloth.ai