Purpose

Research into running Large Language Models (LLMs) directly in web browsers using WebGPU technology, including browser compatibility, GPU driver configuration, and practical deployment on AMD hardware with performance optimization strategies.

Contents

  • webgpu-llm-research.md - Complete investigation and troubleshooting guide for running WebLLM with WebGPU on Linux with AMD GPUs, including benchmark results and model recommendations

Key Findings

  • WebGPU is Production Ready: Available in Chrome/Edge (v113+), Safari (v26+), and Firefox (Windows only, v141+)
  • Critical Vulkan Backend: 117-200x performance improvement when using the Vulkan backend vs the ANGLE/OpenGL fallback (prefill rose from 1.6 tok/s to 320.8 tok/s on an AMD RX 7800 XT)
  • GPU Driver Setup: Linux AMD users require Mesa Vulkan drivers and explicit Chrome flags to enable WebGPU hardware acceleration
  • Qwen3 Superior to Qwen2.5: Updated model recommendations favor the newer Qwen3 family over Qwen2.5, offering better training and reasoning capabilities at the same speed
  • Practical Browser LLM: Successfully tested Llama-3.2-1B with full WebGPU acceleration, reaching roughly 47 tok/s decode and up to ~320 tok/s prefill on a discrete AMD RX 7800 XT GPU

Quick Start

Enable WebGPU on Linux (AMD GPU)

  1. Install Vulkan drivers:

    sudo apt install -y mesa-vulkan-drivers vulkan-tools
  2. Enable in Chrome:

    Set chrome://flags/#enable-vulkan and chrome://flags/#enable-unsafe-webgpu to Enabled (or launch Chrome with --enable-features=Vulkan), then restart the browser.
  3. Verify setup:

    Run vulkaninfo --summary to confirm the driver sees the GPU, then check that chrome://gpu reports WebGPU as “Hardware accelerated”.
  4. Test WebLLM (see the code sketch after these steps):

    • Visit https://chat.webllm.ai
    • Start with Qwen3-4B (recommended) or Qwen3-1.7B
    • Monitor console (F12) for adapter info
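
Beyond the hosted demo, the same models can be driven programmatically. Below is a minimal sketch using WebLLM's JavaScript API; the exact model ID string ("Qwen3-4B-q4f16_1-MLC") is an assumption, so check the prebuilt config of your installed @mlc-ai/web-llm version for the IDs it actually bundles.

// Minimal WebLLM sketch. The model ID string below is an assumption;
// check prebuiltAppConfig.model_list for the IDs your WebLLM version bundles.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Qwen3-4B-q4f16_1-MLC", {
  // Fires repeatedly while weights download and shaders compile
  initProgressCallback: (report) => console.log(report.text),
});

// OpenAI-style chat completion, served entirely in-browser over WebGPU
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);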

Check Active GPU in Browser

Open DevTools (F12) and run:

navigator.gpu.requestAdapter().then(async (adapter) => {
  if (!adapter) {
    console.log('No WebGPU adapter found');
    return;
  }
  // adapter.info reports vendor/architecture in recent Chromium builds;
  // the older adapter.name property was removed from the WebGPU spec
  console.log('GPU:', adapter.info?.vendor, adapter.info?.architecture);
  const device = await adapter.requestDevice();
  console.log('Device:', device);
});

Model Recommendations (2025)

For WebLLM Browser Deployment

Top Choice (Newest, Best Quality):

  • 🥇 Qwen3-4B - From the Arena-ranked Qwen3 family; excellent balance of quality and speed
  • 🥈 Qwen3-8B - Higher quality for 16GB+ VRAM systems
  • 🥉 Qwen3-1.7B - Lightweight alternative, still very capable

Coding Specialist:

  • Qwen2.5-Coder-7B - Best for code generation tasks

Efficient/Legacy:

  • Llama 3.1-8B - Solid general purpose
  • Gemma 3-4B - Lightweight option
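
To map these recommendations to the exact model IDs a given WebLLM release ships, the package exposes a prebuilt model list. A small sketch, assuming the prebuiltAppConfig export of @mlc-ai/web-llm:

// Print the bundled model IDs for the Qwen3 family recommended above.
// prebuiltAppConfig.model_list is assumed to match the installed WebLLM version.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const qwen3Ids = prebuiltAppConfig.model_list
  .map((m) => m.model_id)
  .filter((id) => id.startsWith("Qwen3"));
console.log(qwen3Ids);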

Performance Expectations (AMD RX 7800 XT)

  • Qwen3-0.6B: ~80-100 tok/s
  • Qwen3-1.7B: ~70-80 tok/s
  • Qwen3-4B: ~60-70 tok/s
  • Qwen3-8B: ~40-50 tok/s
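
These figures are rough expectations; a simple way to measure your own throughput is to time one completion and divide the generated token count by wall-clock time. The sketch below assumes the non-streaming response carries an OpenAI-style usage object with completion_tokens (an assumption about the installed WebLLM build), and that engine is an initialized engine as in the Quick Start sketch.

// Rough throughput check: tokens generated per wall-clock second for one
// completion. Includes prefill time, so it slightly understates decode speed.
// Assumes the response exposes usage.completion_tokens (an assumption about
// the installed WebLLM build).
async function measureTokensPerSecond(engine, prompt) {
  const t0 = performance.now();
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    max_tokens: 256,
  });
  const seconds = (performance.now() - t0) / 1000;
  const tokens = reply.usage?.completion_tokens ?? 0;
  console.log(`${(tokens / seconds).toFixed(1)} tok/s over ${tokens} tokens`);
  return tokens / seconds;
}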

Troubleshooting

WebGPU Still Shows Disabled?

  • Check chrome://gpu - WebGPU should show “Hardware accelerated”
  • Try disabling extensions temporarily (they can block WebGPU)
  • Update GPU drivers to latest Mesa version
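
A quick console check separates a missing WebGPU API (a browser or flag problem) from a missing adapter (a driver or GPU problem); a minimal sketch to run in DevTools:

// Run in the DevTools console: distinguishes a disabled/unsupported WebGPU API
// from an exposed API that cannot find a usable adapter.
if (!("gpu" in navigator)) {
  console.log("WebGPU API not exposed: check browser version and flags");
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? "Adapter found" : "API present but no adapter: check GPU drivers");
}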

Slow Performance?

  • Ensure Vulkan is enabled in Chrome flags
  • Check DevTools for which GPU adapter is being used
  • ANGLE/OpenGL backend will be 100-200x slower than Vulkan

No GPU Detected?

  • Verify GPU drivers: vulkaninfo --summary
  • Check AMD drivers are installed: lspci | grep VGA
  • Ensure browser is using discrete GPU (not integrated)
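
To nudge the browser toward the discrete GPU and confirm which adapter it actually selected, requestAdapter accepts a power-preference hint; reading adapter.info works in recent Chromium builds (older ones used requestAdapterInfo()). A minimal sketch, with the exact info fields treated as version-dependent:

// Ask for the high-performance (typically discrete) adapter and report it.
// adapter.info is available in recent Chromium builds; treat its exact
// fields as version-dependent.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance",
});
if (adapter) {
  console.log("Vendor:", adapter.info?.vendor, "Arch:", adapter.info?.architecture);
} else {
  console.log("No adapter returned: WebGPU may be disabled or unsupported");
}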

Related

  • transformers.js - Alternative Hugging Face browser ML library
  • onnx-runtime-web - Microsoft web inference runtime
  • tensorflow-js - Google browser ML framework
  • llm-browser-deployment - Broader LLM browser strategies

Last updated: November 24, 2025