# README
Research on Apple's MLX framework: optimizations for Apple Silicon and performance characteristics for local LLM inference.
## Documents
| Document | Description |
|---|---|
| Quantization Bit-Width Performance | Why sub-4-bit quantization is suboptimal for MLX on Apple Silicon |
## Key Finding
MLX and Apple Silicon are optimized for 4-bit quantization. Sub-4-bit quantization (2-bit and 3-bit) suffers significant performance degradation due to lookup-table emulation overhead, which makes 4-bit the practical lower bound for optimal performance.
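A related cost of non-power-of-two widths can be illustrated with a small, hypothetical bit-packing sketch. It assumes weights are packed LSB-first into 32-bit words; `WORD_BITS`, `pack`, `unpack`, and `straddle_count` are illustrative names, not MLX APIs, and this is not MLX's actual storage layout. At 4 bits, eight values fit exactly in each word, so every value is recovered with a single shift and mask. At 3 bits, some values straddle a word boundary and need a second load plus extra shifts. This sketch models only word alignment, not the lookup-table emulation described above, and under this model 2-bit values also align cleanly.

```python
# Hypothetical illustration: generic LSB-first bit packing into 32-bit words.
# This is NOT MLX's actual storage layout; it only shows how value width
# interacts with word boundaries.
from typing import List

WORD_BITS = 32  # assumption: quantized weights are packed into 32-bit words


def pack(values: List[int], bits: int) -> List[int]:
    """Pack unsigned `bits`-wide integers contiguously into 32-bit words."""
    mask = (1 << bits) - 1
    n_words = (len(values) * bits + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {bits} bits")
        word, offset = divmod(i * bits, WORD_BITS)
        # Low part of v goes into the current word (truncated to 32 bits).
        words[word] |= (v << offset) & 0xFFFFFFFF
        if offset + bits > WORD_BITS:
            # High part of v spills into the next word.
            words[word + 1] |= v >> (WORD_BITS - offset)
    return words


def unpack(words: List[int], bits: int, count: int) -> List[int]:
    """Recover `count` values; straddling values need a second word read."""
    mask = (1 << bits) - 1
    out = []
    for i in range(count):
        word, offset = divmod(i * bits, WORD_BITS)
        v = words[word] >> offset
        if offset + bits > WORD_BITS:
            # Extra load, shift and OR for a value split across two words.
            v |= words[word + 1] << (WORD_BITS - offset)
        out.append(v & mask)
    return out


def straddle_count(bits: int, count: int) -> int:
    """Number of values (out of `count`) that cross a 32-bit word boundary."""
    return sum(
        1 for i in range(count) if (i * bits) % WORD_BITS + bits > WORD_BITS
    )


if __name__ == "__main__":
    for bits in (2, 3, 4):
        print(f"{bits}-bit: {straddle_count(bits, 32)} of 32 values straddle a word")
```

Running the script prints the straddle counts for 2-, 3-, and 4-bit widths over 32 values; only the 3-bit case produces split values (2 of 32), each of which costs an extra word read in `unpack`.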