Case study · March 17, 2026 · TypeScript · Node.js · DOCX parsing

Bijoy to Unicode LaTeX

Converts Bijoy-encoded Bangla MCQ Word files into structured JSON with Unicode text and raw LaTeX equations.

Overview

This project solves a narrow but valuable problem: turning Bangla MCQ documents written in Bijoy encoding into clean, structured data that can be rendered on the web.

It reads .docx files, converts Bijoy Bangla text to Unicode, walks OMML math nodes into raw LaTeX, and keeps question text and options aligned in the same order they appeared in the original document.
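To make the OMML step concrete, here is a minimal sketch of a recursive walk that turns a small math tree into raw LaTeX. The element names (m:f, m:num, m:den, m:sSup, m:sSub, m:rad, m:deg, m:e, m:r, m:t) come from the OMML schema, but the OmmlNode shape and the child and toLatex helpers are hypothetical names chosen for illustration; the project's own walker may be organized differently and covers far more node types.

```typescript
// Hypothetical, simplified node shape. A real DOCX parser yields richer objects.
interface OmmlNode {
  tag: string;            // OMML element name, e.g. "m:f" for a fraction
  text?: string;          // only set on "m:t" text nodes
  children?: OmmlNode[];
}

// Find the first direct child with the given tag (illustrative helper).
function child(node: OmmlNode, tag: string): OmmlNode | undefined {
  return (node.children ?? []).find((c) => c.tag === tag);
}

// Recursively convert a node into raw LaTeX.
function toLatex(node: OmmlNode | undefined): string {
  if (!node) return "";
  switch (node.tag) {
    case "m:t": // literal text inside a math run
      return node.text ?? "";
    case "m:f": // fraction: numerator over denominator
      return `\\frac{${toLatex(child(node, "m:num"))}}{${toLatex(child(node, "m:den"))}}`;
    case "m:sSup": // superscript: base and exponent
      return `{${toLatex(child(node, "m:e"))}}^{${toLatex(child(node, "m:sup"))}}`;
    case "m:sSub": // subscript: base and index
      return `{${toLatex(child(node, "m:e"))}}_{${toLatex(child(node, "m:sub"))}}`;
    case "m:rad": { // radical with an optional degree
      const degree = toLatex(child(node, "m:deg"));
      const body = toLatex(child(node, "m:e"));
      return degree ? `\\sqrt[${degree}]{${body}}` : `\\sqrt{${body}}`;
    }
    default: // containers such as m:oMath, m:r, m:num: concatenate children in order
      return (node.children ?? []).map((c) => toLatex(c)).join("");
  }
}

// Small helper to build a run containing one text node (illustrative only).
const run = (text: string): OmmlNode => ({
  tag: "m:r",
  children: [{ tag: "m:t", text }],
});

// x squared over y, expressed as an OMML-like tree.
const sample: OmmlNode = {
  tag: "m:oMath",
  children: [{
    tag: "m:f",
    children: [
      { tag: "m:num", children: [{
        tag: "m:sSup",
        children: [
          { tag: "m:e", children: [run("x")] },
          { tag: "m:sup", children: [run("2")] },
        ],
      }] },
      { tag: "m:den", children: [run("y")] },
    ],
  }],
};

const sampleLatex = toLatex(sample); // \frac{{x}^{2}}{y}
```

Because unknown tags fall through to the default branch and simply concatenate their children, document order is preserved without any extra bookkeeping, which is the same property the converter relies on for keeping questions and options aligned.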

Highlights

  • Converts Bijoy Bangla text to Unicode with reordering and normalization support.
  • Transforms Word equations from OMML into raw LaTeX for KaTeX or MathJax pipelines.
  • Handles common Bangladesh MCQ layouts including multi-column option tables.
  • Ships both a browser-based upload UI and a CLI workflow for batch conversion.
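The reordering and normalization mentioned in the first highlight can be sketched as follows. Bijoy stores glyphs in typing order, so a pre-base vowel sign such as ি appears before its consonant, while Unicode expects it after. The mapping table below is a tiny illustrative subset and the function name bijoyToUnicode is an assumption; the project's real table and rules (conjuncts, reph, other vowel signs) are much larger and are not reproduced here.

```typescript
// Tiny illustrative subset of a Bijoy-to-Unicode table (not the project's full table).
const BIJOY_MAP: Record<string, string> = {
  A: "অ", // independent vowel a
  K: "ক",
  e: "ব",
  g: "ম",
  j: "ল",
  s: "ং", // anusvara
  v: "া", // vowel sign aa
  w: "ি", // vowel sign i, typed BEFORE the consonant in Bijoy
};

function bijoyToUnicode(input: string): string {
  // Step 1: map each Bijoy character; anything unmapped passes through unchanged.
  const mapped = Array.from(input, (ch) => BIJOY_MAP[ch] ?? ch).join("");

  // Step 2: reorder. Move vowel sign i (U+09BF) after the consonant
  // (U+0995..U+09B9) that follows it. Conjunct clusters are ignored in this sketch.
  const reordered = mapped.replace(/(\u09BF)([\u0995-\u09B9])/g, "$2$1");

  // Step 3: normalize. The letter a (U+0985) followed by vowel sign aa (U+09BE)
  // becomes the single letter aa (U+0986); NFC alone does not perform this merge.
  return reordered.replace(/\u0985\u09BE/g, "\u0986").normalize("NFC");
}

const ami = bijoyToUnicode("Avwg");    // আমি
const bangla = bijoyToUnicode("evsjv"); // বাংলা
```

A production converter typically applies multi-character mappings before single-character ones and handles every pre-base sign, so this three-step shape is only the skeleton of the idea.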

Why it matters

The converter removes much of the painful manual content preparation in exam, LMS, and question-bank workflows, where Bangla text and math formatting usually break during migration.

Repository: github.com/bipulhf/bijoy-to-unicode-latex.