Every LMS looks the same in slides—and different in production
Course catalog, video player, quizzes, certificates, instructor dashboards, corporate tenant branding: the feature checklist is familiar. The architecture decisions that determine whether you survive year two are quieter. How do you track watch progress without writing to SQL every five seconds? Where do SCORM packages live? Can tenant A never see tenant B's enrollment even if a developer forgets a WHERE clause? This case study follows a platform similar to what we build at Toolliyo/GeminiLMS—a modular monolith that grew selective services, not a day-one galaxy of microservices.
Bounded contexts we settled on
- Catalog — Courses, modules, lessons, prerequisites; read-heavy, versioned content.
- Enrollment — Who bought or was assigned what; ties to payments and HRIS imports.
- Learning activity — Progress, quiz attempts, time-on-task signals.
- Assessment — Question banks, grading rubrics, proctoring hooks optional.
- Credentials — PDF certificates, verifiable links, blockchain optional distraction.
- Admin & tenant — Branding, roles, feature flags, usage billing.
Early code organized as .NET projects per context inside one deployable API, with separate DbSchemas or table prefixes. Extraction to services happened only for media transcoding and bulk email—different scaling and failure domains.
Reference diagram (logical)
Browser and mobile apps call a BFF-style ASP.NET Core API. CDN serves HLS video segments from blob storage. Redis caches catalog and session data. SQL Server holds relational truth. Service Bus handles certificate generation and analytics events. Azure Functions transcode uploads. Key Vault holds secrets. Optional AI service generates quiz items from lesson transcripts with human approval queue.
Multi-tenancy without regret
We chose shared database, shared schema with TenantId on every row and global query filters in EF Core:
modelBuilder.Entity<Course>().HasQueryFilter(c => c.TenantId == _tenantProvider.TenantId);
Defense in depth: integration tests that attempt cross-tenant access must fail. For enterprise clients later, dedicated databases became a migration playbook—not day one complexity.
Video delivery
Upload to blob; queue transcode job; output MPEG-DASH or HLS to CDN. Never proxy video bytes through Kestrel in production. Signed URLs with short TTL. Player reports progress every 30–60 seconds batching to API, not per second. For live classes, integrate WebRTC vendor or Teams SDK rather than building SFU yourself unless that is your product.
Progress and completion rules
Business rules vary: 80% watch time plus quiz pass. Implement as a policy engine or explicit state machine, not scattered if-statements in controllers. Emit LessonCompleted events so certificates and gamification react consistently.
Quizzes and academic integrity
Question pools randomized server-side. Timer enforced server-side. AI-generated distractors reviewed in staging table before publish—learners should not be guinea pigs for unreviewed LLM output.
Search and discovery
PostgreSQL full-text or Azure Cognitive Search for large catalogs. Sync via outbox when course published. SEO public pages may live on Next.js as covered in our frontend architecture content; API remains authoritative.
Performance numbers that guided design
Launch day: 8k concurrent learners, 40k catalog reads/minute cached, 2k progress writes/minute batched. Without Redis and CDN, SQL would have saturated. Without query filters, someone would have leaked a tenant. Load tests drove Redis TTL jitter and read replicas for reporting—not premature sharding.
Security and compliance
Role matrix: SuperAdmin, TenantAdmin, Instructor, Learner, Support-readonly. OWASP fixes on every API. Audit log for grade changes and manual completions. GDPR export deletes PII in activity tables with retention policies. COPPA/FERPA awareness for K-12 tenants affects analytics—we disabled third-party trackers for those tenants.
What failed in v1 (honest)
- Building custom SCORM player—integrated vendor instead.
- Realtime websocket for every page—replaced with polling and SSE for notifications only.
- Monolithic webpack admin bundle—split and lazy-loaded.
- Trying to rank learners globally without tenant scope—product bug, architecture fix via filters.
Roadmap after product-market fit
Extract media pipeline fully. Add read models for analytics. Consider event sourcing for compliance-heavy enterprises. Keep core enrollment and learning paths in a well-tested monolith until team and traffic force a seam.
Building an online learning platform rewards clear context boundaries, tenant discipline, and CDN-first media more than fashionable distribution. Start modular inside one deployable system, measure real enrollment spikes, then split what actually needs independent scale—the architecture lessons above are cheaper learned from this case study than from your first outage.