Purpose

Research into running Large Language Models (LLMs) directly in web browsers using WebGPU technology, including browser compatibility, GPU driver configuration, and practical deployment on AMD hardware with performance optimization strategies.

Contents

  • webgpu-llm-research.md - Complete investigation and troubleshooting guide for running WebLLM with WebGPU on Linux with AMD GPUs, including benchmark results and model recommendations

Key Findings

  • WebGPU is Production Ready: Available in Chrome/Edge (v113+), Safari (v26+), and Firefox (Windows only, v141+)
  • Critical Vulkan Backend: 117-200x performance improvement when using the Vulkan backend vs the ANGLE/OpenGL fallback (prefill rose from 1.6 tok/s to 320.8 tok/s on an AMD RX 7800 XT)
  • GPU Driver Setup: Linux AMD users require Mesa Vulkan drivers and explicit Chrome flags to enable WebGPU hardware acceleration
  • Qwen3 Superior to Qwen2.5: Updated model recommendations favor the newer Qwen3 family over Qwen2.5, offering better training and reasoning capabilities at the same speed
  • Practical Browser LLM: Successfully tested Llama-3.2-1B with full WebGPU acceleration, reaching roughly 47 tok/s decode and up to ~320 tok/s prefill on a discrete AMD RX 7800 XT GPU

Quick Start

Enable WebGPU on Linux (AMD GPU)

  1. Install Vulkan drivers:

    sudo apt install -y mesa-vulkan-drivers vulkan-tools
  2. Enable in Chrome:

    Set chrome://flags/#enable-vulkan and chrome://flags/#enable-unsafe-webgpu to Enabled (or launch Chrome with --enable-features=Vulkan), then restart the browser.
  3. Verify setup:

    Run vulkaninfo --summary to confirm the driver sees the GPU, then check that chrome://gpu reports WebGPU as “Hardware accelerated”.
  4. Test WebLLM (see the code sketch after these steps):

    • Visit https://chat.webllm.ai
    • Start with Qwen3-4B (recommended) or Qwen3-1.7B
    • Monitor console (F12) for adapter info
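
Beyond the hosted demo, the same models can be driven programmatically. Below is a minimal sketch using WebLLM's JavaScript API; the exact model ID string ("Qwen3-4B-q4f16_1-MLC") is an assumption, so check the prebuilt config of your installed @mlc-ai/web-llm version for the IDs it actually bundles.

// Minimal WebLLM sketch. The model ID string below is an assumption;
// check prebuiltAppConfig.model_list for the IDs your WebLLM version bundles.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Qwen3-4B-q4f16_1-MLC", {
  // Fires repeatedly while weights download and shaders compile
  initProgressCallback: (report) => console.log(report.text),
});

// OpenAI-style chat completion, served entirely in-browser over WebGPU
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);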

Check Active GPU in Browser

Open DevTools (F12) and run:

navigator.gpu.requestAdapter().then(async (adapter) => {
  if (!adapter) {
    console.log('No WebGPU adapter found');
    return;
  }
  // adapter.info reports vendor/architecture in recent Chromium builds;
  // the older adapter.name property was removed from the WebGPU spec
  console.log('GPU:', adapter.info?.vendor, adapter.info?.architecture);
  const device = await adapter.requestDevice();
  console.log('Device:', device);
});

Model Recommendations (2025)

For WebLLM Browser Deployment

Top Choice (Newest, Best Quality):

  • 🥇 Qwen3-4B - From the Arena-ranked Qwen3 family; excellent balance of quality and speed
  • 🥈 Qwen3-8B - Higher quality for 16GB+ VRAM systems
  • 🥉 Qwen3-1.7B - Lightweight alternative, still very capable

Coding Specialist:

  • Qwen2.5-Coder-7B - Best for code generation tasks

Efficient/Legacy:

  • Llama 3.1-8B - Solid general purpose
  • Gemma 3-4B - Lightweight option
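
To map these recommendations to the exact model IDs a given WebLLM release ships, the package exposes a prebuilt model list. A small sketch, assuming the prebuiltAppConfig export of @mlc-ai/web-llm:

// Print the bundled model IDs for the Qwen3 family recommended above.
// prebuiltAppConfig.model_list is assumed to match the installed WebLLM version.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const qwen3Ids = prebuiltAppConfig.model_list
  .map((m) => m.model_id)
  .filter((id) => id.startsWith("Qwen3"));
console.log(qwen3Ids);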

Performance Expectations (AMD RX 7800 XT)

  • Qwen3-0.6B: ~80-100 tok/s
  • Qwen3-1.7B: ~70-80 tok/s
  • Qwen3-4B: ~60-70 tok/s
  • Qwen3-8B: ~40-50 tok/s
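
These figures are rough expectations; a simple way to measure your own throughput is to time one completion and divide the generated token count by wall-clock time. The sketch below assumes the non-streaming response carries an OpenAI-style usage object with completion_tokens (an assumption about the installed WebLLM build), and that engine is an initialized engine as in the Quick Start sketch.

// Rough throughput check: tokens generated per wall-clock second for one
// completion. Includes prefill time, so it slightly understates decode speed.
// Assumes the response exposes usage.completion_tokens (an assumption about
// the installed WebLLM build).
async function measureTokensPerSecond(engine, prompt) {
  const t0 = performance.now();
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    max_tokens: 256,
  });
  const seconds = (performance.now() - t0) / 1000;
  const tokens = reply.usage?.completion_tokens ?? 0;
  console.log(`${(tokens / seconds).toFixed(1)} tok/s over ${tokens} tokens`);
  return tokens / seconds;
}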

Troubleshooting

WebGPU Still Shows Disabled?

  • Check chrome://gpu - WebGPU should show “Hardware accelerated”
  • Try disabling extensions temporarily (they can block WebGPU)
  • Update GPU drivers to latest Mesa version
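
A quick console check separates a missing WebGPU API (a browser or flag problem) from a missing adapter (a driver or GPU problem); a minimal sketch to run in DevTools:

// Run in the DevTools console: distinguishes a disabled/unsupported WebGPU API
// from an exposed API that cannot find a usable adapter.
if (!("gpu" in navigator)) {
  console.log("WebGPU API not exposed: check browser version and flags");
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? "Adapter found" : "API present but no adapter: check GPU drivers");
}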

Slow Performance?

  • Ensure Vulkan is enabled in Chrome flags
  • Check DevTools for which GPU adapter is being used
  • ANGLE/OpenGL backend will be 100-200x slower than Vulkan

No GPU Detected?

  • Verify GPU drivers: vulkaninfo --summary
  • Check AMD drivers are installed: lspci | grep VGA
  • Ensure browser is using discrete GPU (not integrated)
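
To nudge the browser toward the discrete GPU and confirm which adapter it actually selected, requestAdapter accepts a power-preference hint; reading adapter.info works in recent Chromium builds (older ones used requestAdapterInfo()). A minimal sketch, with the exact info fields treated as version-dependent:

// Ask for the high-performance (typically discrete) adapter and report it.
// adapter.info is available in recent Chromium builds; treat its exact
// fields as version-dependent.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance",
});
if (adapter) {
  console.log("Vendor:", adapter.info?.vendor, "Arch:", adapter.info?.architecture);
} else {
  console.log("No adapter returned: WebGPU may be disabled or unsupported");
}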

Related

  • transformers.js - Alternative Hugging Face browser ML library
  • onnx-runtime-web - Microsoft web inference runtime
  • tensorflow-js - Google browser ML framework
  • llm-browser-deployment - Broader LLM browser strategies

Last updated: November 24, 2025