LangReels Language Learning App

AI-powered TikTok-style language learning platform that transforms user-generated videos into interactive learning experiences. Combines Social Learning Theory with advanced AI to make authentic cultural exchange the foundation of language acquisition.


Role: Solo builder (product, architecture, and implementation)
Stack: Flutter · Firebase · AWS · Google Cloud
Status: Proof of Concept

GitHub LinkedIn


Watch the demo:

The Problem

Language learning apps are built around structured curricula — flashcards, grammar drills, vocabulary lists. These build foundational knowledge but fail at the thing that actually makes people fluent: exposure to natural, unscripted language used in real cultural contexts.

With 1.5 billion people actively learning a second language, the gap between “knowing vocabulary” and “understanding a real conversation” remains stubbornly wide. Authentic native-speaker content exists in abundance — but it’s inaccessible to learners. Too fast, no subtitles, no way to pause on a specific phrase and study it.

LangReels removes that friction. Creators record short natural videos in their own language. The AI pipeline does everything else.


The Solution

A creator records a video (up to 2 minutes). Within 3 minutes, that video is:

  • Transcribed in the original language with word-level and sentence-level timing
  • Translated into 15 languages simultaneously
  • Published to a social feed with two distinct viewing modes

No manual transcription. No manual translation. No editing. The pipeline is fully automated.

Home feed — word-level karaoke subtitles in normal mode

Two Viewing Modes on the Same Video

The core product insight is that language learning has two distinct states — passive acquisition and active study — and the same content should serve both without forcing the user to choose upfront.

Normal mode (passive watching): Subtitles appear word by word, synchronised at the millisecond level. Each word lights up at the exact moment it’s spoken — karaoke-style. The learner picks any of the 15 supported languages for subtitles and can switch languages mid-playback. No reload, no re-fetch — all 15 translations are pre-loaded.
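The karaoke highlight described above amounts to a timestamp lookup over the word-level timing track. A minimal sketch, assuming words are stored as sorted start/end offsets in milliseconds (the `WordTiming` shape and function name are illustrative, not from the codebase):

```typescript
// Hypothetical shape of one entry in the word-level timing track.
interface WordTiming {
  text: string;
  startMs: number; // moment the word begins, in milliseconds
  endMs: number;   // moment the word ends, in milliseconds
}

// Binary search for the word being spoken at `positionMs`.
// Returns the index of the active word, or -1 in a gap between words.
function activeWordIndex(words: WordTiming[], positionMs: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (positionMs < w.startMs) hi = mid - 1;
    else if (positionMs >= w.endMs) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

Running this lookup against the player's position on each frame keeps the highlight in lockstep with playback without scanning the whole transcript.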

Study mode (active learning): The video pauses at sentence boundaries. The learner sees the full sentence, can replay it, navigate to the previous or next sentence, and read it in any language. This turns a 30-second clip into a structured micro-lesson.

Switching between modes is instant because both data sets — word-level timing and sentence-level timing — are stored separately in Firestore and loaded together when the reel opens.
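One way to picture that single-read design is a reel document carrying both timing tracks plus every pre-translated sentence, so mode and language switches never touch the network. The field names here are hypothetical, not taken from the actual Firestore schema:

```typescript
// Illustrative shape of one reel document. Both timing tracks and all
// per-language sentence translations arrive in a single read.
interface TimedSpan { text: string; startMs: number; endMs: number }

interface ReelDoc {
  sourceLanguage: string;
  words: TimedSpan[];      // word-level track, drives karaoke subtitles
  sentences: TimedSpan[];  // sentence-level track, drives study mode
  sentenceTranslations: Record<string, string[]>; // language code -> sentences
}

// Subtitle text for one sentence in the learner's chosen language,
// falling back to the original if that language is unavailable.
function sentenceTextFor(reel: ReelDoc, index: number, lang: string): string {
  const original = reel.sentences[index].text;
  if (lang === reel.sourceLanguage) return original;
  return reel.sentenceTranslations[lang]?.[index] ?? original;
}
```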

Study mode — sentence-level navigation and replay controls

The Technical Pipeline

End-to-end processing pipeline

The moment a video lands in Firebase Storage, two Cloud Functions trigger in parallel:

Content moderation — Google Video Intelligence API scans for explicit content using frame-level detection and segment-level label analysis. Rejected content never reaches transcription.

Transcription (event-driven) — This is where the architecture gets interesting. AWS Transcribe jobs take 1–3 minutes. Holding a Cloud Function open to poll AWS would be expensive and fragile — Firebase Functions have a maximum 9-minute timeout and polling is wasteful.

The solution splits the work across two functions using AWS EventBridge as a callback:

  1. transcribeAndTranslate uploads the video to S3, starts an AWS Transcribe job with automatic language detection across 15 language codes, stores job metadata in Firestore, and exits in ~10 seconds
  2. AWS Transcribe runs independently in the background
  3. When the job completes, EventBridge fires → SNS topic → HTTP POST to handleTranscribeWebhook
  4. The webhook retrieves the transcript JSON and SRT file from S3, parses word-level timing and sentence boundaries, and hands off to translation
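The SRT parsing in the final step can be sketched as follows, assuming the standard SRT cue format (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text); the function names are illustrative:

```typescript
interface Cue { startMs: number; endMs: number; text: string }

// "00:00:01,200" -> 1200 milliseconds
function srtTimeToMs(t: string): number {
  const [hms, ms] = t.split(",");
  const [h, m, s] = hms.split(":").map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + Number(ms);
}

// Split an SRT file into cues: blocks are separated by blank lines;
// within a block, line 0 is the cue number, line 1 the timing,
// and the remaining lines are the sentence text.
function parseSrt(srt: string): Cue[] {
  return srt
    .split(/\r?\n\r?\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const lines = block.split(/\r?\n/);
      const [start, end] = lines[1].split(" --> ");
      return {
        startMs: srtTimeToMs(start),
        endMs: srtTimeToMs(end),
        text: lines.slice(2).join(" "),
      };
    });
}
```

Each cue becomes one sentence boundary for study mode, while the word-level timings come from the transcript JSON.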

Translation (batch approach) — An earlier version translated the full transcript as one block per language, then tried to split the result back into individual sentences. This caused truncation — sentences ending mid-phrase because the alignment logic couldn’t reliably split arbitrary translated text.

The fix: send all sentences as an array to Google Translate in a single batch call per language. The API returns a 1:1 mapped array — guaranteed completion, no truncation, and the batch context preserves discourse coherence across sentences.

Input:  ["Sentence 1", "Sentence 2", "Sentence 3"]
Output: ["Traducción 1", "Traducción 2", "Traducción 3"]
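The fan-out behind that mapping can be sketched like this. The `translate` callback is injected so the 1:1 invariant is visible and testable; in production it would wrap Google Cloud Translation v3's `translateText`, whose `contents` array maps one-to-one onto the returned translations. Names here are illustrative, not from the codebase:

```typescript
// One batch call per target language; each call must return exactly as
// many sentences as it was given, which rules out truncation by design.
type TranslateBatch = (sentences: string[], target: string) => Promise<string[]>;

async function translateAll(
  sentences: string[],
  targets: string[], // the 14 non-source language codes
  translate: TranslateBatch,
): Promise<Record<string, string[]>> {
  const out: Record<string, string[]> = {};
  // All target languages run in parallel.
  await Promise.all(
    targets.map(async (lang) => {
      const translated = await translate(sentences, lang);
      if (translated.length !== sentences.length) {
        throw new Error(`batch for ${lang} lost sentences`);
      }
      out[lang] = translated;
    }),
  );
  return out;
}
```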

14 batch calls (one per non-source language) write the results to Firestore. The Flutter app’s realtime stream listener fires, and the reel appears in the feed.

Real-time processing status — each stage reflects actual backend state

Key Design Decisions

Why AWS Transcribe over OpenAI Whisper? Early versions (v4.0) used Whisper. AWS Transcribe was adopted for three reasons: native SRT output with sentence boundary detection, built-in automatic language identification across all 15 supported codes, and the event-driven architecture that avoids Cloud Function timeout limits entirely.

Why sentence-level navigation (not word-level)? Early prototypes let learners navigate word by word. User testing showed this was too granular — learners got stuck on individual words and lost the sentence’s meaning and rhythm. Fluent speakers parse speech in chunks, not word by word, so the navigation unit should match.

Why auto-detect language (not creator-specified)? Removing the language tagging step from the creator flow reduces friction and is more accurate for non-dominant dialects and regional varieties that creators might not know how to categorise.

Why include Indian regional languages? Kannada, Marathi, and Tamil are underserved by every major language learning platform. These languages have large speaker populations and active learner bases — the gap between supply and demand for learning content is much wider than for European languages.


Tech Stack

Mobile app: Flutter (iOS + Android), Provider state management
Auth + database: Firebase Auth, Firestore
File storage: Firebase Storage
Backend: Firebase Cloud Functions (Node.js 20)
Speech-to-text: AWS Transcribe — auto language detection, SRT output
Job callbacks: AWS EventBridge → AWS SNS → HTTP webhook
Translation: Google Cloud Translation API v3 (batch)
Content moderation: Google Cloud Video Intelligence API
Temp file storage: AWS S3 (ephemeral — deleted after each job)

Cost per video

AWS Transcribe: ~$0.048 per 2-min video
Google Cloud Translation: ~$0.07 per 2-min video
Google Video Intelligence: ~$0.10 per 2-min video
Firebase Functions + S3: < $0.01
Total: ~$0.15–0.23 per video

Test Dataset

12 languages tested end-to-end during development: Arabic, Chinese, French, German, Hindi, Italian, Kannada, Korean, Marathi, Portuguese, Russian, Spanish — covering Latin, Devanagari, CJK, Hangul, Arabic, Cyrillic, and Kannada scripts.

View test reels on YouTube


What I Learned

Event-driven over polling. The shift from a polling architecture to EventBridge callbacks cut Cloud Function runtime from 3+ minutes to ~10 seconds per invocation. The pattern is reusable for any long-running async job.
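As an illustration of the callback wiring, an EventBridge rule can match Transcribe job completions with an event pattern like the following (this is the standard AWS event shape for Transcribe, not a fragment taken from this repo's configuration); the rule's target would be the SNS topic that POSTs to the webhook:

```json
{
  "source": ["aws.transcribe"],
  "detail-type": ["Transcribe Job State Change"],
  "detail": {
    "TranscriptionJobStatus": ["COMPLETED", "FAILED"]
  }
}
```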

Batch API calls preserve context. Translating sentences together (not individually) maintains discourse coherence and eliminates the alignment problem entirely. Simpler code, better output.

The friction of content creation is the real product problem. Every step that requires creator input after upload reduces supply. The pipeline is intentionally zero-touch — the creator records, and the product handles everything else.


Open source — view the full codebase and documentation on GitHub

Interested in discussing AI-powered educational products or product management? Let’s connect