Running large language models (LLMs) locally is one of the best ways to experiment, benchmark, and understand how different models perform.

💻 In this video, you’ll learn:
• How to create a Docker container with Llama.cpp and Open WebUI
• How to expose Open WebUI to macOS via an NVIDIA Sync SSH tunnel
• How to download and run GPT-OSS-20B on your own hardware

I’m running this on an NVIDIA DGX Spark, a desktop-class Blackwell system built for AI engineers, but you can follow along on any machine, even a Windows or macOS laptop with a desktop-class GPU.

If you’re interested in local AI development, NVIDIA DGX systems, or open-source LLMs, this video will walk you through everything you need to get started.
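The three steps from the list above can be sketched roughly as follows. This is a minimal sketch, not the exact commands from the video: the container tags, ports, model filename, and SSH hostname are all assumptions you'd adjust for your own setup.

```shell
# 1. Serve the model with llama.cpp's CUDA server container.
#    (Image tag, port, and model path are assumptions.)
docker run -d --gpus all -p 8000:8000 \
  -v ~/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/gpt-oss-20b.gguf --host 0.0.0.0 --port 8000

# 2. Run Open WebUI and point it at llama.cpp's OpenAI-compatible endpoint.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  ghcr.io/open-webui/open-webui:main

# 3. From your Mac, forward the Open WebUI port over SSH
#    (hostname and user are placeholders), then open http://localhost:3000.
ssh -N -L 3000:localhost:3000 user@spark.local
```

Once the tunnel is up, Open WebUI is reachable in your macOS browser at `http://localhost:3000`, with llama.cpp doing the inference on the Spark.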

Here's a link to the NVIDIA Sync custom app start script for Open WebUI used in the walkthrough: