Architecture

How GemMate Works

Smart routing between on-device and local GPU inference

📱

GemMate App

Chat · Flashcards · Quizzes · Mind Maps

🔀

Smart Router

Auto-selects the fastest model available

🧠

On-Device

Gemma 4 E2B

Runs offline via LiteRT-LM. 3-8s response.

💻

LAN GPU

Gemma 4 E4B

Via Ollama on laptop. <1s response.
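
A minimal sketch of how a client might send a prompt to Ollama running on a laptop on the same network. The `/api/generate` endpoint, its `model`, `prompt` and `stream` fields, and the default port 11434 come from Ollama's HTTP API. The host address, the model tag, and the helper names are placeholders for illustration, not GemMate's actual code.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str,
                           port: int = 11434) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate."""
    body = json.dumps({
        "model": model,    # placeholder tag; use whatever tag is pulled on the laptop
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str, timeout: float = 5.0) -> str:
    """Send the prompt and return the generated text (requires a reachable server)."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Ollama listens only on localhost by default, so the laptop would most likely need `OLLAMA_HOST` set to an address reachable from the phone before a call like `generate("192.168.1.20", "<model-tag>", "Explain photosynthesis")` could succeed.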

Condition	Model	Latency
WiFi, laptop available	Gemma 4 E4B via Ollama	<1s
No WiFi, model installed	Gemma 4 E2B on-device	3-8s
WiFi, no laptop	Gemma 4 E2B on-device	3-8s
No WiFi, no model	Prompt to download	—
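
The routing table above can be read as a short decision function. This is a sketch under stated assumptions, not GemMate's actual router: the names `Route` and `select_route` are hypothetical, and the table does not say what happens with WiFi, no laptop, and no installed model, so this sketch assumes that case also falls through to the download prompt.

```python
from enum import Enum


class Route(Enum):
    LAN_GPU = "Gemma 4 E4B via Ollama"
    ON_DEVICE = "Gemma 4 E2B on-device"
    PROMPT_DOWNLOAD = "Prompt to download"


def select_route(wifi: bool, laptop_reachable: bool, model_installed: bool) -> Route:
    """Pick an inference target following the rows of the routing table."""
    # Prefer the laptop GPU whenever it can be reached over WiFi.
    if wifi and laptop_reachable:
        return Route.LAN_GPU
    # Otherwise fall back to the on-device model if it is installed.
    if model_installed:
        return Route.ON_DEVICE
    # Assumption: with no usable model, ask the user to download one.
    return Route.PROMPT_DOWNLOAD
```

Checking the faster path first keeps the rule order the same as the table's latency order, so the on-device model is only used when the laptop is out of reach.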

🤖

AI Model

Gemma 4 E2B / E4B

⚡

Runtime

LiteRT-LM (on-device) + Ollama (GPU)

💎

Framework

Flutter 3.41 / Dart

🧮

Study Algorithm

SM-2 Spaced Repetition
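
For readers unfamiliar with SM-2, here is a minimal sketch of one review step as the algorithm is commonly described: a 0-5 recall grade, an ease factor that starts at 2.5 and never drops below 1.3, and intervals of 1 day, 6 days, then the previous interval times the ease factor. The names `CardState` and `review` are hypothetical, and GemMate's own implementation may differ in details such as rounding or how failed cards are handled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, floored at 1.3


def review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with a recall grade from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)
    # Successful recall: adjust the ease factor, then schedule the next interval.
    miss = 5 - quality
    ease = max(1.3, state.ease + 0.1 - miss * (0.08 + miss * 0.02))
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Fractional intervals are rounded up to whole days.
        interval = math.ceil(state.interval_days * ease)
    return CardState(state.repetitions + 1, interval, ease)
```

A card graded 5 three times in a row would be scheduled 1, 6, and then 17 days out under this sketch, while any grade below 3 sends it back to a 1-day interval.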

👁️

OCR / Vision

ML Kit (offline) + Gemma 4 multimodal

💾

Storage

SharedPreferences + JSON

🛡️

Privacy by Design

Your data never leaves your own hardware: inference runs on your phone or on your laptop over the local network. No cloud. No tracking. No API keys. Open source under Apache 2.0.

✓

Zero data collection: all processing stays on your device or your own laptop

✓

No API keys needed: Gemma 4 runs locally

✓

Open source: audit the code yourself on GitHub