README
Purpose
Research into running Large Language Models (LLMs) directly in web browsers using WebGPU technology, including browser compatibility, GPU driver configuration, and practical deployment on AMD hardware with performance optimization strategies.
Contents
- webgpu-llm-research.md - Complete investigation and troubleshooting guide for running WebLLM with WebGPU on Linux with AMD GPUs, including benchmark results and model recommendations
Key Findings
- WebGPU is Production Ready: Available in Chrome/Edge (v113+), Safari (v26+), and Firefox (Windows only, v141+)
- Vulkan Backend Is Critical: 117-200x performance improvement when using the Vulkan backend instead of the ANGLE/OpenGL fallback (prefill rose from 1.6 tok/s to 320.8 tok/s on an AMD RX 7800 XT)
- GPU Driver Setup: Linux AMD users require Mesa Vulkan drivers and explicit Chrome flags to enable WebGPU hardware acceleration
- Qwen3 Superior to Qwen2.5: Updated model recommendations favor the newer Qwen3 family over Qwen2.5, offering better training and reasoning capabilities at the same speed
- Practical Browser LLM: Successfully tested Llama-3.2-1B with full WebGPU acceleration, achieving 47-320 tok/s on a discrete AMD RX 7800 XT GPU
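Whether a given browser actually exposes WebGPU can be checked at runtime with plain feature detection. The snippet below is a minimal sketch using only the standard WebGPU API (`navigator.gpu`, `requestAdapter()`); the `adapter.info` property it logs is only populated in newer Chrome builds.

```js
// Minimal WebGPU feature detection; paste into the DevTools console of any
// page served over HTTPS (or localhost).
async function checkWebGPU() {
  if (!("gpu" in navigator)) {
    console.warn("WebGPU not exposed: browser too old or the flag is disabled");
    return false;
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.warn("navigator.gpu exists but no adapter was returned (blocklisted driver?)");
    return false;
  }
  // adapter.info (vendor/architecture) is only available in recent browsers
  console.log("WebGPU adapter acquired:", adapter.info ?? adapter);
  return true;
}
checkWebGPU();
```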
Quick Start
Enable WebGPU on Linux (AMD GPU)
1. Install Vulkan drivers:
   - sudo apt install -y mesa-vulkan-drivers vulkan-tools
2. Enable in Chrome:
   - Visit chrome://flags
   - Enable: #enable-unsafe-webgpu
   - Enable: #enable-vulkan
   - Enable: #default-angle-vulkan
   - Restart Chrome
3. Verify setup:
   - Visit https://webgpureport.org/
   - Open chrome://gpu to confirm WebGPU is “Hardware accelerated”
4. Test WebLLM (a programmatic loading sketch follows this list):
   - Visit https://chat.webllm.ai
   - Start with Qwen3-4B (recommended) or Qwen3-1.7B
   - Monitor the console (F12) for adapter info
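Beyond the hosted chat UI, the same models can be loaded programmatically with the `@mlc-ai/web-llm` package, which exposes an OpenAI-style chat API. The sketch below follows the documented WebLLM usage pattern; the exact model ID string is an assumption and should be verified against the current prebuilt model list.

```js
// Sketch: load a small model with WebLLM and run one chat completion.
// Assumes `npm install @mlc-ai/web-llm`; the model ID below should be checked
// against webllm.prebuiltAppConfig.model_list for your installed version.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  // Reports weight download and shader compilation progress
  initProgressCallback: (report) => console.log(report.text),
});

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);
```

The first run downloads and caches the model weights, so expect it to take noticeably longer than subsequent loads.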
Check Active GPU in Browser
Open DevTools (F12) and run:
navigator.gpu.requestAdapter().then(async (adapter) => {
  // adapter.info replaces the removed adapter.name / deprecated requestAdapterInfo()
  console.log('GPU:', adapter.info);
  const device = await adapter.requestDevice();
  console.log('Device:', device);
});

Model Recommendations (2025)
For WebLLM Browser Deployment
Top Choice (Newest, Best Quality):
- 🥇 Qwen3-4B - From the #3-ranked family on the LMSYS Chatbot Arena; excellent quality/size balance
- 🥈 Qwen3-8B - Higher quality for 16GB+ VRAM systems
- 🥉 Qwen3-1.7B - Lightweight alternative, still very capable
Coding Specialist:
- Qwen2.5-Coder-7B - Best for code generation tasks
Efficient/Legacy:
- Llama 3.1-8B - Solid general purpose
- Gemma 3-4B - Lightweight option
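To see which of the families above are actually packaged for WebLLM, and roughly how much VRAM each build expects, the prebuilt model list can be inspected directly. This assumes a recent WebLLM release that exports `prebuiltAppConfig`; field names such as `vram_required_MB` may differ between versions.

```js
// List prebuilt WebLLM model builds matching the recommendations above.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const families = ["Qwen3", "Qwen2.5-Coder", "Llama-3.1", "gemma"];
for (const model of prebuiltAppConfig.model_list) {
  if (families.some((name) => model.model_id.includes(name))) {
    console.log(model.model_id, `~${model.vram_required_MB ?? "?"} MB VRAM`);
  }
}
```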
Performance Expectations (AMD RX 7800 XT)
- Qwen3-0.6B: ~80-100 tok/s
- Qwen3-1.7B: ~70-80 tok/s
- Qwen3-4B: ~60-70 tok/s
- Qwen3-8B: ~40-50 tok/s
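These figures can be sanity-checked on your own hardware by timing a single generation, as in the sketch below. It divides reported completion tokens by wall-clock time, so it slightly understates pure decode speed (prefill time is included); the `usage` field follows WebLLM's OpenAI-compatible response shape and may not be populated in every version.

```js
// Rough throughput check: completion tokens / wall-clock seconds.
// `engine` is an already-initialized MLCEngine (see the Quick Start sketch).
async function measureTokPerSec(engine, prompt) {
  const t0 = performance.now();
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    max_tokens: 256,
  });
  const seconds = (performance.now() - t0) / 1000;
  const tokens = reply.usage?.completion_tokens ?? 0; // assumes usage is reported
  console.log(`${tokens} tokens in ${seconds.toFixed(1)} s ≈ ${(tokens / seconds).toFixed(1)} tok/s`);
  return tokens / seconds;
}
```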
Troubleshooting
WebGPU Still Shows Disabled?
- Check chrome://gpu - WebGPU should show “Hardware accelerated”
- Try disabling extensions temporarily (they can block WebGPU)
- Update GPU drivers to latest Mesa version
Slow Performance?
- Ensure Vulkan is enabled in Chrome flags
- Check DevTools for which GPU adapter is being used
- ANGLE/OpenGL backend will be 100-200x slower than Vulkan
No GPU Detected?
- Verify GPU drivers: vulkaninfo --summary
- Check AMD drivers are installed: lspci | grep VGA
- Ensure the browser is using the discrete GPU (not integrated); see the snippet below for requesting the high-performance adapter
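When both an integrated and a discrete GPU are present, the browser can be nudged toward the discrete card at adapter-request time. `powerPreference` is part of the standard WebGPU API; the `adapter.info` fields logged below are only populated in newer Chrome builds.

```js
// Ask explicitly for the high-performance (usually discrete) adapter.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance",
});
if (!adapter) {
  console.warn("No adapter returned - check vulkaninfo and chrome://gpu");
} else {
  // adapter.info (vendor/architecture/device) is recent; older builds used
  // the now-deprecated adapter.requestAdapterInfo() instead.
  console.log("Adapter info:", adapter.info);
}
```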
Related Research Topics
- transformers.js - Alternative Hugging Face browser ML library
- onnx-runtime-web - Microsoft web inference runtime
- tensorflow-js - Google browser ML framework
- llm-browser-deployment - Broader LLM browser strategies
Sources
- WebGPU Specification
- WebLLM Documentation
- WebLLM GitHub
- WebGPU Report
- LMSYS Chatbot Arena - Oct 2025 Rankings
- Browser Compatibility
Last updated: November 24, 2025