2026 · active · AI · Data · Education

Career Advisor

AI-powered education and career guidance built on structured Mongolian education data.

Role: Founder / Builder
Year: 2026
Status: active
Stack: Next.js · TypeScript · Python +4

Problem

Students cannot get reliable, structured, and context-aware guidance from messy public education data.

Solution

A data-backed AI advisor that collects, structures, validates, and retrieves education and career information before answering — not another prompt wrapper.

Highlights

Web crawler pipeline for public education sources
Structured data extraction and normalization
Validation and quality scoring layer
Knowledge base with retrieval-augmented generation
AI-assisted data improvement loop
Mongolian language coverage as a first-class concern

Stack

Next.js · TypeScript · Python · PostgreSQL · Vector DB · LLM APIs · Crawler infra

Data pipeline

The whole system rests on one decision: data quality beats prompt cleverness. The pipeline crawls public Mongolian education and career sources, extracts structured fields with LLM-assisted parsing, runs them through a normalization layer that resolves duplicates, language variants, and category mismatches, and lands them in a versioned knowledge base.
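As a rough illustration of the normalization step described above, the sketch below folds language and casing variants onto one key and keeps a single record per key. The alias table, the field names (`name`, `category`, `confidence`), and the "keep the highest-confidence record" rule are assumptions made for the example, not the project's actual schema or policy.

```python
import unicodedata

# Hypothetical alias table: Mongolian and English variants map to one category.
CATEGORY_ALIASES = {
    "их сургууль": "university",
    "university": "university",
    "тэтгэлэг": "scholarship",
    "scholarship": "scholarship",
}


def normalize_text(value: str) -> str:
    """Apply Unicode NFKC, case-fold, and collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(folded.split())


def normalize_category(raw: str) -> str:
    """Resolve a raw category label to its canonical form, or 'unknown'."""
    return CATEGORY_ALIASES.get(normalize_text(raw), "unknown")


def dedupe(records: list[dict]) -> list[dict]:
    """Keep one record per (name, category) key, preferring higher confidence."""
    best: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = (normalize_text(rec["name"]), normalize_category(rec["category"]))
        if key not in best or rec["confidence"] > best[key]["confidence"]:
            # Store the record with its category rewritten to the canonical label.
            best[key] = {**rec, "category": key[1]}
    return list(best.values())
```

Labels that fall through to `unknown` are a useful signal in their own right: they mark category mismatches that the alias table does not yet cover.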

Every record has a source, an extracted-at timestamp, a confidence score, and a validation status. Records that fail validation are routed back into a human-review queue with diff views — the AI improves its own data over time, but never silently.
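One way to picture the record shape and the review routing is the sketch below. The four tracked attributes come from the description above; the class names, the `fields` dictionary, the required-field list, and the 0.8 confidence threshold are illustrative assumptions only.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ValidationStatus(Enum):
    PENDING = "pending"
    VALID = "valid"
    NEEDS_REVIEW = "needs_review"


@dataclass
class Record:
    source_url: str
    extracted_at: datetime
    confidence: float  # 0.0 to 1.0
    fields: dict[str, str]
    status: ValidationStatus = ValidationStatus.PENDING


# Assumed values; the real threshold and required fields are not stated.
MIN_CONFIDENCE = 0.8
REQUIRED_FIELDS = ("name", "category")


def validate(record: Record, review_queue: list[Record]) -> Record:
    """Mark a record valid, or flag it and hand it to the human-review queue."""
    missing = [name for name in REQUIRED_FIELDS if not record.fields.get(name)]
    if missing or record.confidence < MIN_CONFIDENCE:
        record.status = ValidationStatus.NEEDS_REVIEW
        review_queue.append(record)  # a human sees it; nothing is fixed silently
    else:
        record.status = ValidationStatus.VALID
    return record
```

The point of the design is that a failing record is never dropped or auto-corrected: it changes status and becomes visible to a reviewer.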

Why this matters

Most "AI education" products skip this step. They take public web data, shove it through a prompt, and hope. The result is confident nonsense — exactly the wrong tool for a student making a real decision.

RAG answer flow

When a user asks a question, the system first classifies intent (career path, university, scholarship, exam, timeline), retrieves the most relevant records from the knowledge base, and constructs an answer with explicit citations. The model cannot answer questions that the data cannot support — it has to either return grounded content or refuse with a clear reason.
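The flow above can be sketched in a few lines. Keyword matching stands in for the real intent classifier and word overlap stands in for vector retrieval; both are placeholders chosen to keep the example self-contained, and every name here (`classify_intent`, `retrieve`, `answer`, the record keys) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder keyword lists, one per intent named in the text.
INTENT_KEYWORDS = {
    "scholarship": ("scholarship", "grant"),
    "exam": ("exam", "test"),
    "university": ("university", "admission"),
    "career path": ("career", "job"),
    "timeline": ("deadline", "when"),
}


@dataclass
class Answer:
    text: str
    citations: list[str]
    refused: bool = False


def classify_intent(question: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the question."""
    q = question.casefold()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in q for keyword in keywords):
            return intent
    return None


def retrieve(intent: str, question: str, kb: list[dict], top_k: int = 3) -> list[dict]:
    """Rank records of the given intent by word overlap with the question."""
    terms = set(question.casefold().split())
    scored = []
    for rec in kb:
        if rec["intent"] != intent:
            continue
        overlap = len(terms & set(rec["text"].casefold().split()))
        if overlap:
            scored.append((overlap, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]


def answer(question: str, kb: list[dict]) -> Answer:
    """Return cited content from the knowledge base, or refuse with a reason."""
    intent = classify_intent(question)
    if intent is None:
        return Answer("Out of scope: please check an official source.", [], refused=True)
    records = retrieve(intent, question, kb)
    if not records:
        return Answer("No validated record supports an answer to this.", [], refused=True)
    text = " ".join(rec["text"] for rec in records)
    return Answer(text, [rec["source_url"] for rec in records])
```

In this sketch the answer text is assembled directly from retrieved records; a real system would pass those records to an LLM, but the refusal branches would sit in the same place, before any generation happens.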

Guardrails in practice

No record, no answer (or a clearly flagged "best guess").
Every claim is cited back to a source URL.
Out-of-scope questions route to a fallback message that points the user to authoritative sources.
User feedback (thumbs up/down, "this is wrong") feeds back into the data review queue.
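Two of these guardrails are easy to express as code: checking that every claim carries a known source URL, and sending negative feedback to the review queue. The sketch below assumes claims arrive as dictionaries with `text` and `source_url` keys and that feedback verdicts are the strings `"up"`, `"down"`, and `"wrong"`; none of these shapes are confirmed by the project.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    record_id: str
    reason: str


@dataclass
class ReviewQueue:
    items: list[ReviewItem] = field(default_factory=list)

    def add(self, record_id: str, reason: str) -> None:
        self.items.append(ReviewItem(record_id, reason))


def uncited_claims(claims: list[dict], known_sources: set[str]) -> list[str]:
    """Return the text of every claim whose source URL is missing or unknown."""
    return [c["text"] for c in claims if c.get("source_url") not in known_sources]


def record_feedback(queue: ReviewQueue, cited_record_ids: list[str], verdict: str) -> None:
    """Send every record behind a disputed answer to human review."""
    if verdict in ("down", "wrong"):
        for record_id in cited_record_ids:
            queue.add(record_id, f"user feedback: {verdict}")
```

An answer would only be shown when `uncited_claims` returns an empty list; otherwise it falls back to the refusal or "best guess" path.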

Current status

The crawler and normalization pipeline is in production against a first slice of Mongolian public education data. The RAG answer layer is being hardened with an evaluation set built from real student questions, with explicit pass criteria on groundedness, citation accuracy, and refusal correctness.
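A minimal sketch of how the three pass criteria could be scored over an evaluation set follows. The threshold values, the dataclass fields, and the definition of citation accuracy (cited sources must be a non-empty subset of the expected ones) are assumptions for illustration; how groundedness itself is judged is left as an input.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    should_refuse: bool
    expected_sources: frozenset[str]


@dataclass
class EvalResult:
    refused: bool
    cited_sources: frozenset[str]
    grounded: bool  # judged upstream by a reviewer or checker (method assumed)


# Assumed pass thresholds; the project's real criteria are not published here.
THRESHOLDS = {
    "groundedness": 0.95,
    "citation_accuracy": 0.95,
    "refusal_correctness": 0.98,
}


def score(cases: list[EvalCase], results: list[EvalResult]) -> dict[str, float]:
    """Compute the three metrics; groundedness and citations cover answered cases only."""
    pairs = list(zip(cases, results))
    answered = [(c, r) for c, r in pairs if not r.refused]
    n_answered = len(answered) or 1  # avoid division by zero
    n_total = len(pairs) or 1
    return {
        "groundedness": sum(r.grounded for _, r in answered) / n_answered,
        "citation_accuracy": sum(
            bool(r.cited_sources) and r.cited_sources <= c.expected_sources
            for c, r in answered
        ) / n_answered,
        "refusal_correctness": sum(c.should_refuse == r.refused for c, r in pairs) / n_total,
    }


def passes(scores: dict[str, float]) -> bool:
    """True only when every metric meets its threshold."""
    return all(scores[name] >= minimum for name, minimum in THRESHOLDS.items())
```

Scoring refusal correctness over all cases, rather than only the answered ones, is what catches a system that answers when it should have declined.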

Next steps

Expand source coverage to scholarship and exam data.
Add a structured "career path" graph as a first-class entity.
Run a public beta with a small cohort of students.

Lessons

The single biggest lesson: in education contexts, an honest "I don't know, but here is the official source" is worth ten times more than a confident wrong answer. This is not a UX preference — it is the product. Trust is the moat.

The second lesson: data work is unglamorous and it is the work. A month spent on extraction accuracy saves a year of apologizing to users.

Links

Project notes
