README

Research on Apple’s MLX framework, optimizations for Apple Silicon, and performance characteristics for local LLM inference.

Documents

Document	Description
Quantization Bit-Width Performance	Why sub-4-bit quantization is suboptimal for MLX and Apple Silicon

Key Finding

MLX and Apple Silicon are optimized for 4-bit quantization. Sub-4-bit quantization (2-bit, 3-bit) suffers significant performance degradation due to lookup-table emulation overhead, making 4-bit the practical lower bound for optimal performance.
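
To put the storage side of this trade-off in numbers, here is a rough sketch of how much weight memory each bit width needs. It assumes group-wise quantization with one 16-bit scale and one 16-bit bias per group of 64 weights. Those values, the function names, and the 7B parameter count are illustrative assumptions, not figures taken from the linked document.

```python
def effective_bits_per_weight(bits: int, group_size: int = 64, meta_bits: int = 32) -> float:
    """Bits stored per weight, including the per-group scale and bias.

    meta_bits is the assumed per-group overhead (16-bit scale + 16-bit bias).
    """
    return bits + meta_bits / group_size


def weight_gib(n_params: float, bits: int, group_size: int = 64) -> float:
    """Approximate weight storage in GiB for n_params quantized weights."""
    total_bits = n_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7B-parameter model
    for bits in (2, 3, 4, 8):
        print(
            f"{bits}-bit: {effective_bits_per_weight(bits):.2f} bits/weight, "
            f"{weight_gib(n_params, bits):.2f} GiB"
        )
```

Under these assumptions, going from 4-bit to 3-bit reduces weight storage from 4.5 to 3.5 bits per weight, roughly a 22% saving, which is the gain that has to be weighed against the throughput loss described above.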

Related Research

Local LLM Inference
M3 Ultra Performance Benchmarks
GLM Models