IBM’s Granite 4.1 family shifts to dense decoder-only models and a staged training recipe aimed at stronger instruction following and tool use. The write-up is useful for builders tracking how enterprise LLMs are being optimized for efficiency without leaning on MoE scale.

Granite 4.1 LLMs use a dense, decoder-only architecture at 3B, 8B, and 30B parameters, trained on 15 trillion tokens with a five-phase pre-training approach. The 8B model matches the performance of the previous 32B Mixture-of-Experts model, a gain attributed to a multi-stage reinforcement learning pipeline focused on data quality. Designed for efficient, reliable enterprise use, the models show competitive instruction-following and tool-use performance while maintaining cost efficiency.