Tutorials AI & LLM Engineering for .NET Architects

Running AI Locally with ONNX and LocalLLM

8 min read Updated 7/5/2026

On this page

Local AI in .NET

You don't need a REST API to run AI. With ONNX Runtime and Microsoft.Extensions.AI, you can run models directly inside your C# process.

1. What is ONNX?

ONNX (Open Neural Network Exchange) is a universal format for AI models. It allows you to take a model trained in Python/PyTorch and run it in a C# app with high performance. It uses specialized hardware accelerators like **CUDA** (Nvidia) or **DirectML** (Windows) to run incredibly fast.

2. Microsoft.ML.OnnxRuntime.GenAI

This is the new "One-liner" library for running LLMs in C#.

using var model = new Model("phi-3-mini-onnx");
using var tokenizer = new Tokenizer(model);
var generator = new Generator(model, tokenizer);
// Generate text locally!

4. Interview Mastery

Q: "What are the hardware requirements for running a local LLM?"

Architect Answer: "The most important factor is **VRAM (Video RAM)** on the GPU. A 7B parameter model (quantized) needs about 5-6GB of RAM. If the model fits in VRAM, it runs instantly. If it overflows into system RAM, it becomes 10x slower. For a professional AI workstation, we recommend at least 16GB of VRAM (RTX 4080 or better) to run modern SLMs comfortably."

Questions on this lesson 0

No questions yet — be the first to ask!