Sign in to track progress and bookmarks.
You don't need a REST API to run AI. With ONNX Runtime and Microsoft.Extensions.AI, you can run models directly inside your C# process.
ONNX (Open Neural Network Exchange) is a universal format for AI models. It allows you to take a model trained in Python/PyTorch and run it in a C# app with high performance. It uses specialized hardware accelerators like **CUDA** (Nvidia) or **DirectML** (Windows) to run incredibly fast.
This is the new "One-liner" library for running LLMs in C#.
using var model = new Model("phi-3-mini-onnx");
using var tokenizer = new Tokenizer(model);
var generator = new Generator(model, tokenizer);
// Generate text locally!
Q: "What are the hardware requirements for running a local LLM?"
Architect Answer: "The most important factor is **VRAM (Video RAM)** on the GPU. A 7B parameter model (quantized) needs about 5-6GB of RAM. If the model fits in VRAM, it runs instantly. If it overflows into system RAM, it becomes 10x slower. For a professional AI workstation, we recommend at least 16GB of VRAM (RTX 4080 or better) to run modern SLMs comfortably."
Quizzes linked to this course—pass to earn certificates.
On this page
1. What is ONNX? 2. Microsoft.ML.OnnxRuntime.GenAI 4. Interview Mastery