Why RAG beats fine-tuning for most .NET internal tools
Building RAG applications with ASP.NET Core and Azure OpenAI is the pragmatic path when your knowledge changes weekly—policy PDFs, API docs, support macros—not when you need a new personality. Fine-tuning a model on stale course material is expensive and hard to audit; retrieval-augmented generation keeps answers tied to chunks you can cite. We walked a Toolliyo cohort through this case study: an LMS platform support assistant that answers instructor questions about enrollment rules, refund windows, and SCORM upload limits without hallucinating features that do not exist.
This is not a toy chatbot. Requirements mirrored real LMS ops: 2,000 markdown and PDF pages, role-aware answers (admin vs instructor), and logging for compliance review in India and EU deployments.
Architecture overview
- Ingestion worker: .NET 8 console or Azure Function reading blob storage
- Chunking: 512–800 token segments with 100-token overlap, metadata (courseId, role, version)
- Embeddings: Azure OpenAI text-embedding-3-small or ada-002
- Vector store: Azure AI Search index with hybrid search (vector + BM25)
- API: ASP.NET Core minimal API with JWT auth, rate limits, prompt guardrails
- Observability: Application Insights + stored citations per response
Alternative: PostgreSQL with pgvector if you already run Postgres for the LMS—trade-off is ops familiarity vs managed search features.
Step 1 — Project setup and secrets
dotnet new web -n LmsSupportRag
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Search.Documents
dotnet add package Azure.Storage.Blobs
Store keys in Azure Key Vault; bind via DefaultAzureCredential in staging and production. Never commit endpoint URLs with keys to Git—even in spike branches. Local dev uses dotnet user-secrets.
Step 2 — Document ingestion pipeline
Parse and normalize
PDFs from legal and HR need text extraction (Azure Document Intelligence, or an open-source extractor with manual QA). For Markdown from engineering docs, strip front matter and preserve the heading hierarchy in metadata for better chunk boundaries.
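As a rough illustration of the Markdown normalization step, the sketch below removes a leading YAML front matter block. The `MarkdownNormalizer` class and its method name are hypothetical, and it assumes front matter is delimited by `---` lines at the very top of the file; other front matter styles would need their own handling.

```csharp
using System;

public static class MarkdownNormalizer
{
    // Removes a leading front matter block delimited by "---" lines.
    // When no complete block is found, the text is returned as-is
    // (apart from line endings normalized to "\n").
    public static string StripFrontMatter(string markdown)
    {
        const string closing = "\n---\n";
        var text = markdown.Replace("\r\n", "\n");

        if (!text.StartsWith("---\n", StringComparison.Ordinal))
            return text;

        // Search for the closing delimiter on its own line, starting at the
        // newline that ends the opening delimiter.
        int close = text.IndexOf(closing, 3, StringComparison.Ordinal);
        if (close < 0)
            return text;

        return text.Substring(close + closing.Length).TrimStart('\n');
    }
}
```

Running this before chunking keeps keys such as titles and tags out of the embedded text; if those values are useful, they could be parsed into chunk metadata instead of being discarded.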
Chunk with structure awareness
Split on H2/H3 before naive token windows. A refund policy section should not merge with unrelated grading rubric text. Store source_uri, doc_version, and last_modified on every chunk.
public record DocumentChunk(
    string Id,
    string Content,
    string SourceUri,
    string DocVersion,
    IReadOnlyDictionary<string, string> Metadata);
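One way to sketch the "split on H2/H3 first, then window" idea is shown below. The `HeadingChunker` name is hypothetical, and whitespace-separated words stand in for tokens, which is only an approximation: a real pipeline would count tokens with the tokenizer that matches the embedding model. The default sizes are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeadingChunker
{
    // Splits markdown into sections at H2/H3 headings, then cuts each section
    // into overlapping windows. Words approximate tokens here (an assumption).
    public static List<(string Heading, string Text)> Chunk(
        string markdown, int maxWords = 600, int overlapWords = 100)
    {
        if (overlapWords < 0 || overlapWords >= maxWords)
            throw new ArgumentException("Overlap must be smaller than the window.");

        var sections = new List<(string Heading, List<string> Words)>();
        var current = (Heading: "", Words: new List<string>());

        foreach (var line in markdown.Replace("\r\n", "\n").Split('\n'))
        {
            bool isHeading = line.StartsWith("## ", StringComparison.Ordinal)
                          || line.StartsWith("### ", StringComparison.Ordinal);
            if (isHeading)
            {
                // A new heading closes the previous section.
                if (current.Words.Count > 0) sections.Add(current);
                current = (line.TrimStart('#').Trim(), new List<string>());
                continue;
            }
            current.Words.AddRange(
                line.Split(' ', StringSplitOptions.RemoveEmptyEntries));
        }
        if (current.Words.Count > 0) sections.Add(current);

        var chunks = new List<(string Heading, string Text)>();
        int step = maxWords - overlapWords;
        foreach (var (heading, words) in sections)
        {
            for (int start = 0; start < words.Count; start += step)
            {
                var window = words.Skip(start).Take(maxWords);
                chunks.Add((heading, string.Join(" ", window)));
                if (start + maxWords >= words.Count) break; // last window reached
            }
        }
        return chunks;
    }
}
```

Because windows never cross a heading, a refund policy section and a grading rubric section end up in separate chunks even when both are short. The heading returned with each chunk can be copied into the `Metadata` dictionary of `DocumentChunk`.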
Step 3 — Generate embeddings and index
Batch embed with retry and idempotency: chunk hash as document key prevents duplicate vectors on re-ingest. Azure AI Search schema example fields: content, contentVector, courseId, audienceRole, docVersion.
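The idempotent key can be sketched as a hash over the chunk's source and content. The `ChunkKey` helper is hypothetical, and the choice of which fields feed the hash is an assumption: here the source URI and content are hashed, so an unchanged chunk keeps the same key across re-ingests and an upsert overwrites it rather than adding a duplicate.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChunkKey
{
    // Deterministic key: the same source and content always map to the same
    // document key. Requires .NET 5 or later for HashData and ToHexString.
    public static string For(string sourceUri, string content)
    {
        // The '\n' separator keeps ("ab", "c") and ("a", "bc") from colliding.
        var payload = sourceUri + "\n" + content;
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));

        // Lowercase hex uses only letters and digits, which are valid
        // characters for an Azure AI Search document key.
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Leaving `docVersion` out of the hash means a version bump with identical text updates the existing document's metadata instead of creating a second vector; including it would be the right choice if several versions must stay searchable side by side.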
Run incremental index updates on blob upload events rather than full rebuilds nightly—full rebuilds do not scale past tens of thousands of chunks without maintenance windows.
Step 4 — Retrieval strategy that actually works
Pure vector search misses exact SKU codes and policy clause numbers. Hybrid search with semantic ranker (where available) improves hit rate on LMS platform queries like "SCORM 2004 package size limit." Retrieve top 8–12 chunks, then rerank with a lightweight cross-encoder or LLM-based rerank only if latency budget allows (p95 under 3s for support UI).
- Apply OData filters: audienceRole eq 'instructor'
- Discard chunks below similarity threshold; return "I don't know" path
- Log retrieved chunk IDs with every answer for audit
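The filter and threshold rules above can be sketched without touching the search SDK. `ScoredChunk` and `RetrievalGuards` are hypothetical names, and the threshold value is an assumption that has to be tuned per index, since hybrid and pure-vector scores are on different scales.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ScoredChunk(string Id, string Content, double Score);

public static class RetrievalGuards
{
    // Builds the role filter. OData string literals escape a single quote
    // by doubling it, which keeps user-supplied roles from breaking the filter.
    public static string RoleFilter(string role)
    {
        var escaped = role.Replace("'", "''");
        return $"audienceRole eq '{escaped}'";
    }

    // Keeps chunks at or above the threshold, best first. An empty result
    // is the signal to take the "I don't know" path.
    public static IReadOnlyList<ScoredChunk> KeepRelevant(
        IEnumerable<ScoredChunk> hits, double minScore, int maxChunks) =>
        hits.Where(h => h.Score >= minScore)
            .OrderByDescending(h => h.Score)
            .Take(maxChunks)
            .ToList();
}
```

The IDs of the chunks that survive `KeepRelevant` are the ones worth logging with the answer, since they are what the model actually saw.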
Step 5 — ASP.NET Core minimal API endpoint
app.MapPost("/api/support/ask", async (
    AskRequest request,
    IRagService rag,
    CancellationToken ct) =>
{
    var result = await rag.AnswerAsync(request.Question, request.UserRole, ct);
    return result.Grounded
        ? Results.Ok(result)
        : Results.Json(result, statusCode: 422);
});
RagService orchestrates: embed question → search → build prompt → chat completion with temperature 0.2. System prompt enforces: answer only from context, cite sources, refuse legal advice beyond documented policy.
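The orchestration order can be sketched with thin interfaces in place of the Azure clients. Everything here other than the `IRagService`, `AskRequest`, `AnswerAsync`, and `Grounded` names used by the endpoint is an assumption: `IEmbedder`, `IRetriever`, `IChatModel`, and the record shapes are hypothetical seams, and real implementations would wrap the Azure OpenAI and Azure AI Search SDKs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed record AskRequest(string Question, string UserRole);
public sealed record RetrievedChunk(string Id, string SourceUri, string Content);
public sealed record RagAnswer(string Text, bool Grounded, IReadOnlyList<string> ChunkIds);

// Hypothetical seams: real implementations would wrap the Azure SDK clients.
public interface IEmbedder
{
    Task<float[]> EmbedAsync(string text, CancellationToken ct);
}

public interface IRetriever
{
    Task<IReadOnlyList<RetrievedChunk>> SearchAsync(
        string query, float[] vector, string role, CancellationToken ct);
}

public interface IChatModel
{
    Task<string> CompleteAsync(
        string system, string user, float temperature, CancellationToken ct);
}

public interface IRagService
{
    Task<RagAnswer> AnswerAsync(string question, string userRole, CancellationToken ct);
}

public sealed class RagService : IRagService
{
    private const string SystemPrompt =
        "Use ONLY the provided context. Cite sources as [source_uri]. " +
        "If the context is insufficient, say so and suggest contacting support.";

    private readonly IEmbedder _embedder;
    private readonly IRetriever _retriever;
    private readonly IChatModel _chat;

    public RagService(IEmbedder embedder, IRetriever retriever, IChatModel chat)
    {
        _embedder = embedder;
        _retriever = retriever;
        _chat = chat;
    }

    public async Task<RagAnswer> AnswerAsync(
        string question, string userRole, CancellationToken ct)
    {
        // 1. Embed the question. 2. Search with the caller's role applied.
        var vector = await _embedder.EmbedAsync(question, ct);
        var chunks = await _retriever.SearchAsync(question, vector, userRole, ct);

        // Nothing retrieved: skip the model call and report an ungrounded answer.
        if (chunks.Count == 0)
            return new RagAnswer(
                "I cannot find a policy covering that. Please contact support.",
                false, Array.Empty<string>());

        // 3. Build the prompt with each chunk labelled by its source.
        var context = string.Join("\n\n",
            chunks.Select(c => $"[{c.SourceUri}]\n{c.Content}"));
        var user = $"Context:\n{context}\n\nUser question: {question}";

        // 4. Low temperature keeps the completion close to the context.
        var text = await _chat.CompleteAsync(SystemPrompt, user, 0.2f, ct);
        return new RagAnswer(text, true, chunks.Select(c => c.Id).ToList());
    }
}
```

In this sketch `Grounded` only records that retrieval returned context, not that the completion stayed faithful to it. Keeping the three dependencies behind interfaces is also what makes the mocked unit tests practical.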
Step 6 — Prompt template for grounded answers
System: You are an LMS platform support assistant. Use ONLY the provided context.
If the context is insufficient, say you cannot find the policy and suggest contacting support.
Cite sources as [source_uri]. Do not invent features.
Context:
{{retrieved_chunks}}
User question: {{question}}
Grounded does not mean correct: verify chunk quality during ingest. Garbage PDF OCR produces confidently wrong answers.
Production pitfalls from the case study
- Stale index: Users quoted outdated refund windows until a webhook on doc publish triggered a re-index
- Prompt injection in user docs: Malicious markdown such as "ignore previous instructions" in uploaded instructor notes; sanitize it and keep system content separate from user content
- Cost spikes: Long chat histories re-sent every turn; summarize session server-side
- PII in logs: Redact emails from questions before Application Insights
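For the PII item, a minimal sketch of redacting emails before a question is logged might look like this. `PiiRedactor` is a hypothetical helper, and the pattern is deliberately simple: it catches common address shapes, not every form the email RFCs allow, so treat it as a first filter rather than a guarantee.

```csharp
using System.Text.RegularExpressions;

public static class PiiRedactor
{
    // Simple pattern for common email shapes (local part, "@", dotted domain).
    private static readonly Regex Email = new Regex(
        @"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
        RegexOptions.Compiled);

    // Replaces every match with a fixed placeholder before logging.
    public static string RedactEmails(string text) =>
        Email.Replace(text, "[redacted-email]");
}
```

Calling `RedactEmails` on the question at the point where telemetry is written, rather than on the request itself, leaves the original text available for retrieval while keeping addresses out of Application Insights.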
Testing RAG before launch
Build a golden set: 50 questions with expected citation IDs and acceptable paraphrases. Automate regression in CI with mocked Azure clients for unit tests, plus a nightly integration run against the staging index. Measure faithfulness (is the answer supported by the retrieved chunks?) rather than BLEU scores. LLM evaluators help, but keep weekly human spot checks.
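A golden-set regression check can start with retrieval alone, as in the sketch below. `GoldenCase` and `GoldenSetEval` are hypothetical names, and the metric is a citation hit rate (did at least one expected chunk come back?), which is a cheaper proxy than faithfulness and does not replace evaluator or human review of the generated text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record GoldenCase(string Question, IReadOnlySet<string> ExpectedChunkIds);

public static class GoldenSetEval
{
    // Fraction of cases where at least one expected chunk ID was retrieved.
    // retrieveIds is whatever the pipeline exposes for "question -> chunk IDs".
    public static double CitationHitRate(
        IEnumerable<GoldenCase> cases,
        Func<string, IEnumerable<string>> retrieveIds)
    {
        var list = cases.ToList();
        if (list.Count == 0) return 0;

        int hits = list.Count(c =>
            retrieveIds(c.Question).Any(id => c.ExpectedChunkIds.Contains(id)));
        return (double)hits / list.Count;
    }
}
```

In CI, `retrieveIds` can be backed by a mocked search client; in the nightly run it can call the staging index, with the build failing when the rate drops below an agreed floor.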
AI perspective: limits of RAG for developers
RAG reduces hallucination; it does not eliminate reasoning errors or contradictory documents. When two chunks disagree on extension deadlines, the model may blend them. Classical search plus human-curated FAQs still wins for zero-error domains. Generative answers need disclaimers in UI for instructors making compliance decisions.
For .NET developers in India building for global SaaS, Azure OpenAI regional availability and data residency rules matter—design partition keys and index replicas per region early, not after first enterprise customer audit.
Extensions worth phase two
Multi-modal ingestion for screenshot-heavy guides, conversational memory with thread-scoped retrieval, and an admin UI for flagging bad answers that feeds a chunk review queue. Semantic Kernel can wrap the same services if you outgrow hand-rolled orchestration; keep interfaces thin so the swap is painless.
What you should have running locally today
Clone a spike repo structure: ingest one folder of markdown, index it to a dev Azure AI Search service, and hit POST /api/support/ask from Swagger. Time the end-to-end latency. Read the retrieved chunks when an answer feels wrong; that debugging habit prevents production incidents.
Building RAG applications with ASP.NET Core and Azure OpenAI is approachable when you treat retrieval quality and ingestion as the hard problems—not the chat wrapper. The LMS platform case study proves internal support tools can ship in weeks with disciplined chunking, hybrid search, and skeptical testing before instructors trust the bot.