Running large language models (LLMs) locally is one of the best ways to experiment, benchmark, and understand how different models perform.

💻 In this video, you’ll learn:
• How to create a Docker container with Llama.cpp and Open WebUI
• How to expose Open WebUI to macOS via an NVIDIA Sync SSH tunnel
• How to download and run GPT-OSS-20B on your own hardware

I’m running this on an NVIDIA DGX Spark, a desktop-class Blackwell system built for AI engineers, but you can follow along on any machine, even a Windows or macOS laptop with a desktop-class GPU.

If you’re interested in local AI development, NVIDIA DGX systems, or open-source LLMs, this video will walk you through everything you need to get started.
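The three steps from the list above can be sketched roughly as follows. This is a minimal sketch, not the exact commands from the video: the container tags, ports, model filename, and SSH hostname are all assumptions you'd adjust for your own setup.

```shell
# 1. Serve the model with llama.cpp's CUDA server container.
#    (Image tag, port, and model path are assumptions.)
docker run -d --gpus all -p 8000:8000 \
  -v ~/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/gpt-oss-20b.gguf --host 0.0.0.0 --port 8000

# 2. Run Open WebUI and point it at llama.cpp's OpenAI-compatible endpoint.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  ghcr.io/open-webui/open-webui:main

# 3. From your Mac, forward the Open WebUI port over SSH
#    (hostname and user are placeholders), then open http://localhost:3000.
ssh -N -L 3000:localhost:3000 user@spark.local
```

Once the tunnel is up, Open WebUI is reachable in your macOS browser at `http://localhost:3000`, with llama.cpp doing the inference on the Spark.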

Here's a link to the NVIDIA Sync custom app start script for Open WebUI used in the walkthrough: