Career Advisor
AI-powered education and career guidance built on structured Mongolian education data.
- Role: Founder / Builder
- Year: 2026
- Status: active
- Stack: Next.js · TypeScript · Python +4
Problem
Students cannot get reliable, structured, and context-aware guidance from messy public education data.
Solution
A data-backed AI advisor that collects, structures, validates, and retrieves education and career information before answering — not another prompt wrapper.
Highlights
- Web crawler pipeline for public education sources
- Structured data extraction and normalization
- Validation and quality scoring layer
- Knowledge base with retrieval-augmented generation
- AI-assisted data improvement loop
- Mongolian language coverage as a first-class concern
Stack
01
Data pipeline
The whole system rests on one decision: data quality beats prompt cleverness. The pipeline crawls public Mongolian education and career sources, extracts structured fields with LLM-assisted parsing, runs them through a normalization layer that resolves duplicates, language variants, and category mismatches, and lands them in a versioned knowledge base.
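The normalization step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the `CATEGORY_ALIASES` table, the helper names, and the record fields (`name`, `category`, `confidence`) are all made up for the example. It folds casing and whitespace variants into one key, maps Mongolian and English category labels onto a single vocabulary, and merges duplicates by that key.

```python
import unicodedata

# Hypothetical alias table: category labels as they might appear in sources
# (Mongolian and English variants), mapped onto one canonical vocabulary.
CATEGORY_ALIASES = {
    "их сургууль": "university",
    "university": "university",
    "тэтгэлэг": "scholarship",
    "scholarship": "scholarship",
}


def normalize_name(name: str) -> str:
    """Canonical key: Unicode-normalized, case-folded, whitespace-collapsed."""
    text = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(text.split())


def normalize_category(label: str) -> str:
    """Map a raw category label to the canonical vocabulary, or 'unknown'."""
    return CATEGORY_ALIASES.get(normalize_name(label), "unknown")


def dedupe(records: list[dict]) -> list[dict]:
    """Merge records sharing a (name, category) key, keeping the highest confidence."""
    best: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_category(rec["category"]))
        current = best.get(key)
        if current is None or rec["confidence"] > current["confidence"]:
            # Store the record with its category rewritten to the canonical label.
            best[key] = {**rec, "category": key[1]}
    return list(best.values())
```

With this sketch, two crawled entries such as "МУИС " tagged "Их сургууль" and "муис" tagged "University" collapse into a single record in the `university` category. Labels missing from the alias table come back as `unknown`, which makes category mismatches visible instead of silently guessing.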
Every record carries a source, an extracted-at timestamp, a confidence score, and a validation status. Records that fail validation are routed to a human-review queue with diff views, so the AI improves its own data over time, but never silently.
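A record with those four attributes, plus the routing into review, might be modeled as in the sketch below. The class and function names, the `MIN_CONFIDENCE` threshold, and the `REQUIRED_FIELDS` tuple are assumptions chosen for illustration; the real schema and rules are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class ValidationStatus(Enum):
    PENDING = "pending"
    VALID = "valid"
    NEEDS_REVIEW = "needs_review"


@dataclass
class Record:
    source: str                 # URL the record was extracted from
    extracted_at: datetime      # when extraction ran
    confidence: float           # 0.0 to 1.0, from the scoring layer
    fields: dict[str, str]      # the extracted, normalized fields
    status: ValidationStatus = ValidationStatus.PENDING


# Assumed threshold and required fields, purely for illustration.
MIN_CONFIDENCE = 0.8
REQUIRED_FIELDS = ("name", "category")


def validate(record: Record) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    if not record.source.startswith(("http://", "https://")):
        problems.append("source is not a URL")
    if record.confidence < MIN_CONFIDENCE:
        problems.append(f"confidence {record.confidence:.2f} below {MIN_CONFIDENCE}")
    for name in REQUIRED_FIELDS:
        if not record.fields.get(name):
            problems.append(f"missing field: {name}")
    return problems


def route(record: Record, review_queue: list[tuple[Record, list[str]]]) -> Record:
    """Mark the record valid, or queue it for human review with the reasons attached."""
    problems = validate(record)
    if problems:
        record.status = ValidationStatus.NEEDS_REVIEW
        review_queue.append((record, problems))
    else:
        record.status = ValidationStatus.VALID
    return record
```

Attaching the list of problems to each queued record is one way to keep the "never silently" property: a reviewer sees why a record was held back, not just that it was.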
Why this matters
Most "AI education" products skip this step. They take public web data, shove it through a prompt, and hope. The result is confident nonsense — exactly the wrong tool for a student making a real decision.
02
RAG answer flow
When a user asks a question, the system first classifies intent (career path, university, scholarship, exam, timeline), retrieves the most relevant records from the knowledge base, and constructs an answer with explicit citations. The model is not allowed to answer questions the data cannot support: it must either return grounded content or refuse with a clear reason.
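The classify, retrieve, then answer-or-refuse flow can be sketched as below. Everything here is a stand-in: the keyword-based `classify_intent`, the word-overlap `retrieve`, and the `KBRecord` and `Answer` types are hypothetical simplifications. A production system would more likely use a model for intent, a proper retriever, and an LLM for the final wording. What the sketch keeps is the control flow: no matching records means a refusal with a reason, never a free-form answer.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists per intent; a real classifier would likely be a model.
INTENT_KEYWORDS = {
    "scholarship": ("scholarship", "тэтгэлэг"),
    "exam": ("exam", "шалгалт"),
    "university": ("university", "их сургууль"),
    "career path": ("career", "мэргэжил"),
    "timeline": ("deadline", "when", "хугацаа"),
}


@dataclass
class KBRecord:
    intent: str
    text: str
    source_url: str


@dataclass
class Answer:
    grounded: bool
    text: str
    citations: list[str]


def words(text: str) -> set[str]:
    """Lowercased word set; \\w also matches Cyrillic letters in Python 3."""
    return set(re.findall(r"\w+", text.casefold()))


def classify_intent(question: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the question, else None."""
    q = question.casefold()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return None


def retrieve(question: str, intent: str, kb: list[KBRecord], top_k: int = 3) -> list[KBRecord]:
    """Rank records of the matching intent by word overlap with the question."""
    q_words = words(question)
    scored = [(len(q_words & words(r.text)), r) for r in kb if r.intent == intent]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]


def answer(question: str, kb: list[KBRecord]) -> Answer:
    intent = classify_intent(question)
    if intent is None:
        return Answer(False, "Out of scope: please consult an official source.", [])
    records = retrieve(question, intent, kb)
    if not records:
        return Answer(False, f"No supporting record found for this {intent} question.", [])
    # Stand-in for generation: echo the retrieved text with numbered citations.
    lines = [f"{rec.text} [{i}]" for i, rec in enumerate(records, start=1)]
    return Answer(True, "\n".join(lines), [rec.source_url for rec in records])
```

Calling `answer(question, kb)` with a knowledge base that holds no record for the detected intent returns `grounded=False` and an empty citation list, so callers can tell a refusal from a grounded reply without parsing the text.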
Guardrails in practice
- No record, no answer (or a clearly flagged "best guess").
- Every claim is cited back to a source URL.
- Out-of-scope questions route to a fallback message that points the user to authoritative sources.
- User feedback (thumbs up/down, "this is wrong") feeds back into the data review queue.
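Two of these guardrails lend themselves to a small check in code. The sketch below assumes answers are split into one claim per line with numbered markers like `[1]`, and that feedback arrives as a simple object. Both conventions, and the names `uncited_claims`, `Feedback`, and `enqueue_feedback`, are hypothetical.

```python
import re
from dataclasses import dataclass

CITATION = re.compile(r"\[(\d+)\]")


def uncited_claims(answer_lines: list[str], sources: list[str]) -> list[str]:
    """Return lines with no citation marker, or with a marker that has no matching source."""
    bad = []
    for line in answer_lines:
        refs = [int(n) for n in CITATION.findall(line)]
        if not refs or any(n < 1 or n > len(sources) for n in refs):
            bad.append(line)
    return bad


@dataclass
class Feedback:
    record_id: str
    kind: str          # assumed values: "up", "down", or "wrong"
    note: str = ""


def enqueue_feedback(fb: Feedback, review_queue: list[Feedback]) -> bool:
    """Send negative feedback to the data review queue; ignore positive feedback here."""
    if fb.kind in ("down", "wrong"):
        review_queue.append(fb)
        return True
    return False
```

An answer would only be shown when `uncited_claims` returns an empty list. Anything else is a candidate for refusal or for the flagged "best guess" path.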
03
Current status
The crawler and normalization pipeline is in production against a first slice of Mongolian public education data. The RAG answer layer is being hardened with an evaluation set built from real student questions, with explicit pass criteria on groundedness, citation accuracy, and refusal correctness.
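As a rough illustration of what explicit pass criteria could mean, the sketch below scores a batch of evaluated questions on the three named axes and compares each against a threshold. The `EvalCase` fields, the metric definitions, and the threshold values are assumptions for the example, not the project's actual evaluation design.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    should_refuse: bool         # gold label: the data cannot support an answer
    refused: bool               # what the system actually did
    claims_total: int = 0       # claims made in the answer
    claims_supported: int = 0   # claims backed by a retrieved record
    citations_total: int = 0    # citations emitted
    citations_correct: int = 0  # citations pointing at the right source


# Assumed thresholds, purely illustrative.
THRESHOLDS = {
    "groundedness": 0.95,
    "citation_accuracy": 0.95,
    "refusal_correctness": 0.90,
}


def ratio(numerator: int, denominator: int) -> float:
    """Safe division; an empty denominator counts as a perfect score."""
    return numerator / denominator if denominator else 1.0


def score(cases: list[EvalCase]) -> dict[str, float]:
    answered = [c for c in cases if not c.refused]
    return {
        "groundedness": ratio(
            sum(c.claims_supported for c in answered),
            sum(c.claims_total for c in answered),
        ),
        "citation_accuracy": ratio(
            sum(c.citations_correct for c in answered),
            sum(c.citations_total for c in answered),
        ),
        "refusal_correctness": ratio(
            sum(c.refused == c.should_refuse for c in cases),
            len(cases),
        ),
    }


def passes(metrics: dict[str, float]) -> bool:
    """True only if every metric meets its threshold."""
    return all(metrics[name] >= limit for name, limit in THRESHOLDS.items())
```

Keeping refusal correctness as its own metric matters here: a system that refuses everything would score perfectly on groundedness and still fail the evaluation.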
Next steps
- Expand source coverage to scholarship and exam data.
- Add a structured "career path" graph as a first-class entity.
- Public beta with a small cohort of students.
04
Lessons
The single biggest lesson: in education contexts, an honest "I don't know, but here is the official source" is worth ten times more than a confident wrong answer. This is not a UX preference — it is the product. Trust is the moat.
The second lesson: data work is unglamorous and it is the work. A month spent on extraction accuracy saves a year of apologizing to users.