System Documentation

Comprehensive visualization of architecture, dependencies, and user interaction flows

Complete Reference

Diagrams + User Flows

- Anki Audio Pipeline
- Data Flow - Latvian Learning Platform
- Docker Network Topology
- Failure Points Analysis
- Lesson Upload Flow
- Network Topology - Latvian Learning Platform
- Service Dependencies
- System Overview - Latvian Learning Platform
- Transcription Reliability
- Data-Flow-1
- Data-Flow
- Service-Dependencies
- Upload-Process-Flow-1
- Upload-Process-Flow

About Architecture Diagrams

How TILTS Works - Complete Breakdown

Master Overview

The existing documentation showed what the system does but not how users interact with it. These documents provide complete traceability from:

- User Action → Button Click → API Endpoint → Service Chain → Database Update → UI Feedback

This enables systematic troubleshooting when users report "it's not working" by providing a complete audit trail.

### 1. [Recording Upload Flow](01-recording-upload-flow.md)
Purpose: Complete audio lesson processing pipeline
User Action: Upload audio file for lesson generation
System Scope: 17+ microservices, GPU transcription, AI processing

Key Flows:
- File upload → Temporary storage → Job queue
- Whisper transcription (GPU) → AI processing chain
- Lesson assembly → AnkiConnect export → CEFR update

Critical Dependencies:
- GPU access in LXC containers
- Database connectivity across services
- Volume mounts for model files

### 2. [Learning Exercise Flow](02-learning-exercise-flow.md)
Purpose: Interactive practice and study sessions
User Action: Complete writing/test/grammar exercises
System Scope: Dashboard frontend, Flask backend, PostgreSQL

Key Flows:
- Exercise generation from AnkiConnect data
- GPT-powered corrections and feedback
- Progress tracking and CEFR calculation

### 3. [Content Processing Pipeline](03-content-processing-pipeline.md)
Purpose: Joplin note ingestion and vocabulary processing
User Action: Add vocabulary notes in Joplin
System Scope: Background daemon, GPT enhancement, Anki sync

Key Flows:
- Joplin polling → Classification → Enhancement → Deduplication → Anki import
- Support for structured vocab, chat logs, and image extraction

### 4. [Dashboard Widget Data Flows](04-dashboard-widget-data-flows.md) ⭐ NEW
Purpose: Map data sources for each dashboard widget
User Action: View dashboard to check progress
System Scope: Real-time APIs, cached files, external services

Key Widgets:
- CEFR Progress: 7-component calculation from multiple sources
- Anki Statistics: Real-time AnkiConnect queries
- Practice Progress: JSONL file aggregation
- Daily Streak: Persistent counter with date tracking

Data Sources Mapped:

```
CEFR Widget → /api/cefr → cefr_summary_latest.json + PostgreSQL + AnkiConnect
Anki Widget → /api/anki → AnkiConnect direct queries
Progress Widget → /api/data → Multiple JSONL files
Streak Widget → /api/daily/streak → streak.json
```

### 5. [Button Click Journeys](05-button-click-journeys.md) ⭐ NEW
Purpose: Complete system flows from button clicks to data persistence
User Action: Submit corrections, generate tests, create lesson plans
System Scope: End-to-end traceability with failure points

Critical Journeys Mapped:

#### "Submit for Correction" Flow

```
User Text Input → POST /api/practice/writing/submit →
WritingProcessor → GPT-4.1 API → Corrections →
MemoryStore → writing_scores.jsonl →
CEFRTracker → cefr_summary_latest.json →
PostgreSQL shadow-write → UI Update
```

#### "Generate Test" Flow

```
Test Options → POST /api/practice/test/generate →
ExerciseGenerator → AnkiConnect queries →
AdaptiveEngine → Exercise array →
Audio file resolution → UI display
```

#### "Create Lesson Plan" Flow

```
Lesson Text → POST /api/plan/create →
Archive current plan → GPT extraction →
YAML file save → AnkiConnect sync →
PostgreSQL progress init → UI redirect
```

### 6. [Troubleshooting User Flows](06-troubleshooting-user-flows.md) ⭐ NEW
Purpose: Systematic diagnostics for user interaction failures
User Action: "It's not working" → Root cause analysis
System Scope: Decision trees, diagnostic commands, recovery procedures

Troubleshooting Patterns:

#### Widget Not Loading

```mermaid
flowchart TD
    A[User Report] --> B[Identify Widget]
    B --> C[Check API Endpoint]
    C --> D[Test Service Health]
    D --> E[Check Data Sources]
    E --> F[Identify Root Cause]
    F --> G[Apply Fix]
```

#### Practice Submission Fails

```mermaid
flowchart TD
    A[Button Click] --> B[Frontend Check]
    B --> C[API Health]
    C --> D[GPT API Status]
    D --> E[File Permissions]
    E --> F[Database Connectivity]
    F --> G[Recovery Action]
```

#### Progress Not Updating

```mermaid
flowchart TD
    A[Completed Action] --> B[Check File Updates]
    B --> C[Verify API Responses]
    C --> D[Test Calculations]
    D --> E[Check Cache Refresh]
    E --> F[Fix Data Flow]
```

### Frontend → Backend Mapping

| UI Component | Route | API Endpoint | Backend Service | Data Store |
|--------------|-------|--------------|----------------|------------|
| Dashboard CEFR Card | / | /api/cefr | CEFRTracker | cefr_summary_latest.json |
| Writing Practice | /practice | /api/practice/writing/submit | WritingProcessor + GPT | writing_scores.jsonl |
| Test Generator | /practice | /api/practice/test/generate | ExerciseGenerator | AnkiConnect |
| Lesson Plans | /plan | /api/plan/create | LessonPlanExtractor + GPT | YAML files |
| Progress Stats | /progress | /api/data | Multiple sources | JSONL + PostgreSQL |
| Vocabulary Lookup | /morphology | /api/morphology/analyze | MorphAnalyzer | Database |

### Service Interaction Patterns

#### Real-Time Services (Low Latency)
- AnkiConnect: Direct deck queries for test generation
- Dashboard APIs: Cached file reads for fast display
- Health Checks: Socket connections for service status

#### Processing Services (High Latency)
- GPT APIs: 2-10 second response times for corrections/extractions
- Audio Processing: 30-120 seconds for transcription
- Lesson Pipeline: 5-15 minutes for complete processing

#### Background Services (Asynchronous)
- Joplin Daemon: 20-second polling for new notes
- CEFR Calculator: Triggered by practice submissions
- File Cleanup: Daily maintenance jobs

### Data Flow Patterns

#### User-Triggered Updates

```
User Action → API Call → Service Processing →
Database Update → File Update → Cache Refresh →
UI Update (via page reload or AJAX)
```

#### Background Updates

```
Scheduled Job → Data Collection →
Processing → File/Database Write →
Next UI Load Shows New Data
```

#### External Dependencies

```
User Request → Internal Processing →
External API (GPT/AnkiConnect) →
Response Processing → Data Storage →
User Feedback
```

### 1. Symptom Identification
- Which widget/feature is affected?
- What was the user trying to accomplish?
- What error message or behavior was observed?

### 2. Flow Isolation
- Identify the specific user journey involved
- Locate the primary API endpoint
- Check the service chain dependencies

### 3. Component Testing
- Test frontend JavaScript (browser console)
- Test API endpoint directly (curl)
- Test service dependencies (health checks)
- Test data stores (file existence, database queries)

### 4. Root Cause Analysis
- Follow the data flow backward from failure point
- Check logs at each service boundary
- Verify external API availability
- Confirm configuration and permissions

### 5. Recovery Actions
- Immediate: Restart failed services
- Short-term: Fix configuration or permissions
- Long-term: Address systemic issues

### Existing Architecture Documents
- System Architecture Diagrams: Show service relationships
- Service Interactions: Define API contracts
- Database Schema: Data storage structure

### New User Journey Documents
- Complete Flow Tracing: User action to system response
- Troubleshooting Guidance: Failure diagnosis and recovery
- Performance Monitoring: Critical path identification

### Combined Value
- Developers: Understand impact of code changes on user experience
- Operations: Diagnose user-reported issues systematically
- Users: Self-service troubleshooting for common issues

### Emergency Diagnostics

```bash
# Check all critical services
curl -s http://localhost:5002/health | jq '.microservices.down'

# Test user critical paths
curl -s http://localhost:5002/api/cefr >/dev/null && echo "Dashboard: OK"
curl -s http://localhost:8765 >/dev/null && echo "Anki: OK"
docker exec latvian-postgres pg_isready && echo "DB: OK"
```

### Common Issues
1. Dashboard Empty: Check that cefr_summary_latest.json exists
2. Anki Unavailable: Restart anki-headless container
3. Practice Fails: Check GPT API key and quota
4. Tests Empty: Verify AnkiConnect deck access
5. Plans Don't Save: Check lesson_plans/ permissions

### Recovery Commands

```bash
# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres

# Fix file permissions
sudo chown -R david:david /srv/latvian_learning/logs/

# Reset corrupted state
rm /srv/latvian_learning/logs/memory/corrupt_file.jsonl
systemctl --user restart latvian_dashboard.service
```

---

Document Status: Complete user interaction flow documentation suite
Last Updated: 2026-05-17
Purpose: Bridge the gap between system architecture and user experience troubleshooting
Next Steps: Integrate with monitoring system for automated flow validation

Detailed Processing Pipelines

Recording Upload Flow

### Phase 1: Initial Access & Upload

#### Step 1.1: User Navigation
- User Action: Access https://latvian.shifting-ground.link/recording
- System Response: Nginx proxy routes to 192.168.1.11:5002 (latvian-main-site container)
- Frontend Load: Flask serves upload interface with drag-and-drop (200MB limit)
- Dependencies:
  - Nginx proxy configuration
  - Docker container latvian-main-site running on port 5002
  - Flask application routing

Diagnostic Commands:

```bash
# Check domain resolution and proxy
curl -I https://latvian.shifting-ground.link/recording

# Check direct container access
curl -I http://192.168.1.11:5002/recording

# Verify container status
docker ps | grep latvian-main-site
```

#### Step 1.2: File Selection & Validation
- User Action: Drag/drop or select audio file
- Frontend Validation:
  - File size check (≤200MB)
  - File type validation (audio formats)
  - JavaScript client-side checks
- UI Feedback: Progress indicators, file info display

Potential Failure Points:
- File size exceeding 200MB limit
- Unsupported audio format
- JavaScript errors preventing validation

#### Step 1.3: Upload Initiation
- User Action: Click "Upload & Process" button
- Frontend: Creates multipart/form-data POST request to /recording/upload
- Network Path: User browser → Nginx → Docker bridge → latvian-main-site:5000

### Phase 2: Backend Processing Initiation

#### Step 2.1: Flask Route Handling
- Endpoint: POST /recording/upload
- Process:
  1. Flask receives multipart form data
  2. Validates file on server side
  3. Extracts audio file from form
  4. Generates unique job_id (timestamp-based)

File Locations:
- Temp Storage: /srv/latvian_learning/temp_uploads/
- Processing: /srv/latvian_learning/workspace/jobs/
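
Step 2.1 mentions a timestamp-based job ID. The following is a minimal Python sketch of one way to build such an ID. The `make_job_id` name and the `job_YYYYMMDDTHHMMSS_microseconds` layout are assumptions for illustration, not the format the platform is known to use.

```python
from datetime import datetime, timezone
from typing import Optional


def make_job_id(now: Optional[datetime] = None) -> str:
    """Return a sortable, timestamp-based job identifier.

    The exact layout is hypothetical; microseconds are included so that
    two uploads within the same second still get distinct IDs.
    """
    now = now or datetime.now(timezone.utc)
    return now.strftime("job_%Y%m%dT%H%M%S_%f")
```

Because the timestamp is zero-padded, IDs sort lexicographically in creation order, which makes a directory listing of the job queue easy to read.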

#### Step 2.2: AudioIngestor Invocation
- Module: AudioIngestor (inferred from architecture analysis)
- Operations:
  1. Creates job directory structure
  2. Moves file from temp to processing workspace
  3. Generates metadata.json with job details
  4. Normalizes audio format to standard WAV

Directory Structure Created:

```
/srv/latvian_learning/workspace/jobs/pending//
├── raw_audio.wav          # Normalized audio input
├── metadata.json          # Job metadata and timestamps
└── status.json            # Processing stage tracker
```
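
As a rough illustration of the layout above, the sketch below creates a pending job directory with `metadata.json` and `status.json`. The function name `init_job_dir` and the metadata keys are assumptions; audio normalization into `raw_audio.wav` is deliberately left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def init_job_dir(jobs_root: Path, job_id: str, source_name: str) -> Path:
    """Create jobs_root/pending/<job_id>/ with metadata and status files."""
    job_dir = jobs_root / "pending" / job_id
    # Fail loudly if the job already exists instead of overwriting it.
    job_dir.mkdir(parents=True, exist_ok=False)

    metadata = {
        "job_id": job_id,                 # hypothetical key names
        "source_file": source_name,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    (job_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    # The documented initial stage is "transcribing".
    (job_dir / "status.json").write_text(
        json.dumps({"stage": "transcribing"}), encoding="utf-8"
    )
    return job_dir
```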

#### Step 2.3: Queue Placement
- Action: Job moved to pending queue
- Location: /srv/latvian_learning/workspace/jobs/pending/id>/
- Status: status.json initialized with "stage": "transcribing"
- Trigger: Background lesson_agent.py picks up pending jobs

### Phase 3: AI Processing Pipeline

#### Step 3.1: Transcription Stage
- Processor: lesson_agent.py → WhisperProcessor
- Service Called: asr-transcription-lv:8101 (GPU-dependent)
- Input: raw_audio.wav
- Output: transcript.json
- Technology: Whisper model with CUDA acceleration

Service Communication:

```
# Internal container communication
latvian-main-site → asr-transcription-lv:8101
# Network: latvian_network (172.18.0.0/16)
# Protocol: HTTP POST with audio data
```

Critical Dependencies:
- GPU device access (device_ids: ['1'])
- NVIDIA Docker runtime
- Whisper model files in /srv/latvian_xtts/models/whisper-lv-ct2
- CUDA environment variables

Diagnostic Commands:

```bash
# Check GPU access in container
docker exec asr-transcription-lv nvidia-smi

# Test service health
curl http://192.168.1.11:8101/health

# Check model files
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/whisper-lv-ct2
```

#### Step 3.2: Segmentation Stage
- Processor: LessonSegmenter module
- Input: transcript.json
- Outputs:
  - transcript_norm.json (normalized transcript)
  - dialogue_lines.json (segmented dialogue)
- Processing: Text normalization, dialogue segmentation

#### Step 3.3: Content Generation Pipeline
Service Chain (Sequential Processing):

```mermaid
graph TD
    A[transcript_json] --> B[UDPipe_Analysis_8092]
    B --> C[Sentence_Embeddings_8093]
    C --> D[Fluency_Scoring_8094]
    D --> E[Validation_Gateway_8097]
    E --> F[Template_Extraction_8098]
    F --> G[Constrained_Generation_8099]
    G --> H[Repair_Loop_8100]
    H --> I[lesson_json_final_output]
```

Detailed Service Interactions:

| Stage | Service | Port | Input | Output | Purpose |
|-------|---------|------|-------|--------|---------|
| Morphology | udpipe-lv | 8092 | Text segments | Linguistic analysis | Word structure analysis |
| Embedding | sentence-embedder-lv | 8093 | Analyzed text | Vector embeddings | Semantic understanding |
| Fluency | fluency-index-lv | 8094 | Embeddings | Fluency scores | Difficulty assessment |
| Validation | comprehensive-validation-gateway | 8097 | Scored content | Validated content | Quality control |
| Extraction | template-extractor-lv | 8098 | Validated text | Language patterns | Educational content |
| Generation | constrained-generator-lv | 8099 | Patterns | Generated exercises | AI content creation |
| Refinement | repair-loop-lv | 8100 | Generated content | Polished content | Quality improvement |
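
The sequential chain in the table can be sketched as a simple loop in which each stage receives the previous stage's output. This is only an illustration: the `run_chain` function and the `call_service` callback signature are hypothetical, and the real request and response shapes of each service are not documented here.

```python
from typing import Any, Callable, Dict, List, Tuple

# (stage, service, port) in the order listed in the table above.
STAGES: List[Tuple[str, str, int]] = [
    ("Morphology", "udpipe-lv", 8092),
    ("Embedding", "sentence-embedder-lv", 8093),
    ("Fluency", "fluency-index-lv", 8094),
    ("Validation", "comprehensive-validation-gateway", 8097),
    ("Extraction", "template-extractor-lv", 8098),
    ("Generation", "constrained-generator-lv", 8099),
    ("Refinement", "repair-loop-lv", 8100),
]

# Hypothetical transport: (service, port, payload) -> response payload.
Caller = Callable[[str, int, Dict[str, Any]], Dict[str, Any]]


def run_chain(payload: Dict[str, Any], call_service: Caller) -> Dict[str, Any]:
    """Pass the payload through every stage, stopping at the first failure."""
    for stage, service, port in STAGES:
        try:
            payload = call_service(service, port, payload)
        except Exception as exc:
            # Name the failing hop so the error points at one service boundary.
            raise RuntimeError(f"{stage} stage failed at {service}:{port}") from exc
    return payload
```

Injecting `call_service` keeps the sketch independent of any HTTP client and makes it possible to exercise the chain with a stub when a single service is suspected of failing.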

Database Dependencies:

```
# Multiple services require database access
morphological-analyzer-lv → latvian-postgres:5432/tilts_tezaurs
vocabulary-database-lv → latvian-postgres:5432/tilts_tezaurs
template-extractor-lv → latvian-postgres:5432/tilts_tezaurs
latvian-main-site → latvian-postgres:5432 (multiple databases)
```

### Phase 4: Export & Integration

#### Step 4.1: Lesson Assembly
- Stage: assembling
- Input: All processed content from previous stages
- Output: lesson.json (complete lesson package)
- Process: Consolidates all AI-generated content into structured lesson

#### Step 4.2: Anki Export Preparation
- Module: IntelligentCardGenerator
- Input: lesson.json
- Output: anki_cards.json
- Enhancement: GPT-powered enrichment (phonetic, examples, grammar notes)

#### Step 4.3: AnkiConnect Integration
- Service: anki-headless:8765
- Process:
  1. Connect to AnkiConnect API
  2. Deduplication check against existing cards
  3. Card creation with 4-Card Template v2 structure
  4. CEFR tracking updates

Anki Card Structure:
- Deck: Latvian (ChatGPT) > Vocab & Sentences
- Template: 4-Card Template v2
- Fields: latvian, english, phonetic, gender, plural, example_lv, example_en, notes, audio-latvian, image, morphology
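
To make the deduplication and card-creation steps more concrete, the sketch below builds AnkiConnect request bodies for the `findNotes` and `addNote` actions (API version 6). The helper names are hypothetical, the search query is only one plausible way to detect an existing card, and the deck name is passed in by the caller because the exact internal deck path is not stated here.

```python
import json
from typing import Any, Dict

ANKI_CONNECT_VERSION = 6
MODEL_NAME = "4-Card Template v2"  # note type named in this document


def find_duplicates_request(latvian: str, model_name: str = MODEL_NAME) -> Dict[str, Any]:
    """Build a findNotes request matching notes whose 'latvian' field equals the word.

    Assumption: the word contains no double quotes; real code would escape them.
    """
    query = f'"note:{model_name}" "latvian:{latvian}"'
    return {"action": "findNotes", "version": ANKI_CONNECT_VERSION,
            "params": {"query": query}}


def add_note_request(deck_name: str, fields: Dict[str, str],
                     model_name: str = MODEL_NAME) -> Dict[str, Any]:
    """Build an addNote request that refuses to create duplicates."""
    note = {
        "deckName": deck_name,
        "modelName": model_name,
        "fields": fields,
        "options": {"allowDuplicate": False},
    }
    return {"action": "addNote", "version": ANKI_CONNECT_VERSION,
            "params": {"note": note}}


def to_body(request: Dict[str, Any]) -> bytes:
    """Serialize a request for an HTTP POST to the AnkiConnect endpoint."""
    return json.dumps(request, ensure_ascii=False).encode("utf-8")
```

The bodies would be posted to the AnkiConnect service (anki-headless:8765 in this setup); an empty `result` list from `findNotes` would indicate that no matching note exists yet.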

### Phase 5: Job Completion

#### Step 5.1: Status Updates
- Action: status.json updated to "stage": "done"
- Location: Job moved to /srv/latvian_learning/workspace/jobs/done/id>/
- Logs: Processing logs saved for diagnostic purposes

#### Step 5.2: User Notification
- UI Update: Progress indicator shows completion
- Dashboard: New vocabulary appears in CEFR tracking dashboard
- Anki: Cards available for study in Anki application

```
User Audio Upload
    ↓
Flask Temp Storage → AudioIngestor → Pending Queue
    ↓
Whisper Transcription (GPU) → transcript.json
    ↓
Segmentation → dialogue_lines.json + transcript_norm.json
    ↓
AI Processing Chain (17 services) → Enhanced Content
    ↓
Lesson Assembly → lesson.json
    ↓
AnkiConnect Export → Flashcards in Anki
    ↓
CEFR Tracking Update → Dashboard Statistics
```

### 1. GPU Access Issues (HIGH RISK)
Services Affected: asr-transcription-lv, back-translator-lv
Problem: LXC may not properly expose GPU devices

Diagnostic Steps:

```bash
# Check GPU visibility in containers
docker exec asr-transcription-lv nvidia-smi
docker exec back-translator-lv nvidia-smi

# Check NVIDIA Docker runtime
docker info | grep -i nvidia

# Verify device mappings
docker inspect asr-transcription-lv | grep -A5 DeviceRequests
```

### 2. Database Connectivity (HIGH RISK)
Services Affected: Multiple AI services requiring database access
Problem: Connection strings or network routing issues

Diagnostic Steps:

```bash
# Test database connectivity from dependent services
docker exec morphological-analyzer-lv nc -z latvian-postgres 5432
docker exec vocabulary-database-lv nc -z latvian-postgres 5432
docker exec template-extractor-lv nc -z latvian-postgres 5432

# Check database container
docker exec latvian-postgres pg_isready -U postgres
```

### 3. Volume Mount Failures (MEDIUM RISK)
Affected: Model files, workspace directories
Problem: File permissions or mount point changes

Diagnostic Steps:

```bash
# Check workspace permissions
docker exec latvian-main-site ls -la /srv/latvian_learning/workspace/
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/

# Verify volume mounts
docker inspect latvian-postgres | grep -A10 Mounts
docker inspect asr-transcription-lv | grep -A10 Mounts
```

### 4. Service Communication (MEDIUM RISK)
Problem: Internal Docker network routing issues

Diagnostic Steps:

```bash
# Test internal service connectivity
docker exec latvian-main-site wget -q -O- http://asr-transcription-lv:8101/health
docker exec latvian-main-site wget -q -O- http://comprehensive-validation-gateway:8097/health

# Check Docker network
docker network inspect latvian_network
```

### Immediate Recovery Steps
1. Check container health: docker ps --format "table {{.Names}}\t{{.Status}}"
2. Restart failed services: docker-compose restart name>
3. Check logs: docker logs name> --tail 50
4. Test service endpoints: Use curl to verify service health endpoints

### Advanced Diagnostics
1. Network debugging: Use docker exec to test internal connectivity
2. GPU troubleshooting: Verify NVIDIA runtime and device access
3. Database verification: Check PostgreSQL connection and schema
4. Volume inspection: Verify file permissions and mount points

### Key Metrics to Track
- Upload success rate: Percentage of successful file uploads
- Processing time: End-to-end time from upload to Anki export
- Service health: Individual microservice availability
- GPU utilization: Monitor GPU memory and compute usage
- Database performance: Query response times and connection counts

### Monitoring Endpoints
- Main Health: http://192.168.1.11:5002/health
- Services Status: http://192.168.1.11:5002/services/status
- Individual Services: http://192.168.1.11:/health

---

Document Status: Complete user journey mapping for recording upload flow
Last Updated: 2026-05-16
Next Steps: Validate each step against current system state post-migration

Learning Exercise Flow

### Phase 1: Educational Dashboard Access

#### Step 1.1: Dashboard Navigation
- Entry Point: https://latvian.shifting-ground.link or http://192.168.1.11:5002
- Landing: Main educational dashboard
- Navigation Options:
  - Dashboard (CEFR tracking)
  - Lesson Plans (weekly structure)
  - Stats (Anki integration)
  - Practice (interactive exercises)
  - Shadowing (audio practice)

Service Chain:

User Browser → Nginx Proxy → latvian-main-site:5000 (Flask)

Template System:
- Base: Flask with Jinja2 templating
- Navigation: Centralized via templates/partials/navbar.html
- Styling: Mobile-responsive CSS with breakpoints (768px, 375px)

#### Step 1.2: CEFR Progress Display
- Data Source: logs/agent/cefr_summary_latest.json
- Components Tracked:
  - Vocabulary (35%) - Unique words in Anki
  - Maturity (20%) - Card intervals ≥21 days
  - Quality (15%) - Retention + consistency + streak
  - Writing (12%) - GPT correction scores
  - Tests (8%) - Quiz accuracy
  - Grammar (8%) - Grammar exercise scores
  - AI Assessment (2%) - Weekly GPT evaluation
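
The seven weights above sum to 100%. A minimal sketch of how they could combine into one composite value follows. It assumes each component has already been normalized to the range 0 to 1, which this document does not state, and the key names and `weighted_score` function are hypothetical.

```python
from typing import Dict

# Weights taken from the component list above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "vocabulary": 0.35,
    "maturity": 0.20,
    "quality": 0.15,
    "writing": 0.12,
    "tests": 0.08,
    "grammar": 0.08,
    "ai_assessment": 0.02,
}


def weighted_score(components: Dict[str, float]) -> float:
    """Return the weighted sum of component scores.

    Assumption: every component is normalized to [0, 1] before weighting.
    """
    missing = sorted(WEIGHTS.keys() - components.keys())
    if missing:
        raise KeyError(f"missing components: {missing}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

How this composite is then scaled onto the 0.0 to 2.0 range used for level labels is not specified here, so the sketch stops at the weighted sum.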

Score Calculation:

```
A1-: 0.0-0.5
A1:  0.5-0.8
A1+: 0.8-1.0
A2-: 1.0-1.3
A2:  1.3-1.7
A2+: 1.7-2.0
```
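
The bands above can be turned into a lookup with a binary search over the boundaries. The band edges overlap in the listing (0.5 appears in both A1- and A1), so this sketch assumes each band includes its lower bound and excludes its upper bound, except that 2.0 itself maps to A2+. The `cefr_label` name is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of each band, in ascending order, with their labels.
UPPER_BOUNDS = [0.5, 0.8, 1.0, 1.3, 1.7, 2.0]
LABELS = ["A1-", "A1", "A1+", "A2-", "A2", "A2+"]


def cefr_label(score: float) -> str:
    """Map a composite score in [0.0, 2.0] to its band label."""
    if not 0.0 <= score <= 2.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right places a score equal to a boundary into the higher band.
    index = bisect_right(UPPER_BOUNDS, score)
    # A score of exactly 2.0 would index past the end, so clamp it to A2+.
    return LABELS[min(index, len(LABELS) - 1)]
```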

### Phase 2: Lesson Plan System

#### Step 2.1: Weekly Lesson Access
- Route: /plan or /plan/id>
- Data Source: lesson_plans/*.yaml files
- Structure:
  - Vocabulary (~20 words with focus forms)
  - Grammar topic with rules and examples
  - Writing prompts with requirements
  - Exercise tracking

Lesson Plan Structure:

```yaml
meta:
  id: 2025_w48
  title: "Locative Case & Places"
  week_number: 48
  due_date: "2025-12-06"
vocabulary:
  - latvian: "veikals"
    english: "store"
    gender: "m"
    focus_forms:
      - form: "veikalā"
        case: "locative"
        usage: "Es esmu veikalā."
grammar:
  topic: "Locative Case"
  rules:
    - pattern: "-a → -ā"
      examples: [{base: "māja", inflected: "mājā"}]
```

#### Step 2.2: Lesson Lifecycle Management
- Active Lesson: One lesson active at a time
- Archival: Creating new lesson archives current one
- Carry-Forward: Unmastered vocab (max 10) and incomplete writing (max 2) transfer to new lesson
- Priority Algorithm:

  ```
  priority = (1 - combined_accuracy) * 100 + min(total_attempts, 20)
           + min(anki_lapses * 10, 30)  # Anki failure penalty
           + recent_failure_boost(25)   # Failed within 7 days
  ```
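
Read as executable code, the priority formula and the carry-forward cap could look like the sketch below. It interprets `recent_failure_boost(25)` as a flat 25-point bonus when the item was failed within the last 7 days. The function names, the item dictionary keys, and the use of a boolean flag are assumptions.

```python
from typing import Any, Dict, List


def review_priority(combined_accuracy: float, total_attempts: int,
                    anki_lapses: int, failed_within_7_days: bool) -> float:
    """Higher values mean the item should be carried forward first."""
    priority = (1 - combined_accuracy) * 100   # low accuracy dominates
    priority += min(total_attempts, 20)        # capped attempt count
    priority += min(anki_lapses * 10, 30)      # Anki failure penalty, capped at 30
    if failed_within_7_days:
        priority += 25                         # recent failure boost
    return priority


def carry_forward(items: List[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` items with the highest priority (10 for vocab)."""
    ranked = sorted(
        items,
        key=lambda item: review_priority(
            item["combined_accuracy"], item["total_attempts"],
            item["anki_lapses"], item["failed_within_7_days"],
        ),
        reverse=True,
    )
    return ranked[:limit]
```

With these caps the maximum possible priority is 175 (100 + 20 + 30 + 25), reached by an item with zero accuracy, many attempts, three or more lapses, and a recent failure.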

#### Step 2.3: Anki Integration
- **Sync Endpoint:** `/api/plan//anki/sync`
- **Tags Applied:** `week_XX::vocab`, `week_XX::grammar`, `current_week`
- **Image Generation:** DALL-E icons for cards without images
- **Filtered Deck:** `tag:current_week` shows current week's vocabulary

### Phase 3: Interactive Practice System

#### Step 3.1: Writing Practice Flow
**Route:** `/practice/writing`

**User Journey:**
1. User selects writing prompt (configurable or lesson-based)
2. Writes Latvian text in text area (minimum word count enforced)
3. Submits for GPT correction via `/api/practice/writing/submit`
4. Receives instant feedback with error categorization
5. Views corrected text with inline comparisons
6. Score recorded to CEFR writing component

**GPT Processing:**
- **Model:** gpt-4.1 (tutor model slot)
- **Error Categories:** Spelling, grammar, case, gender, word order
- **Output:** Corrected text + detailed feedback + score (0-100)

**Data Persistence:**
- **File:** `logs/memory/writing_scores.jsonl`
- **Format:** `{"timestamp": "...", "score": 85, "errors": 3, "prompt": "..."}`
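
A minimal sketch of appending to and reading from a JSONL score log in the documented record format is shown below. The helper names `append_score` and `average_score` are hypothetical, and the real WritingProcessor may store additional fields.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_score(path: Path, score: int, errors: int, prompt: str) -> None:
    """Append one record as a single JSON line (the file is never rewritten)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "score": score,
        "errors": errors,
        "prompt": prompt,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def average_score(path: Path) -> Optional[float]:
    """Return the mean score, or None when the log is missing or empty."""
    if not path.exists():
        return None
    scores = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # tolerate blank lines
                scores.append(json.loads(line)["score"])
    return sum(scores) / len(scores) if scores else None
```

Append-only writes keep each submission independent, so a single malformed line can be removed without touching the other records.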

#### Step 3.2: Test Practice Flow
**Route:** `/practice/test`

**Exercise Types:**
1. **Fill-in-blank:** Complete sentences with missing words
2. **Translation LV→EN:** Translate Latvian to English
3. **Translation EN→LV:** Translate English to Latvian
4. **Multiple Choice:** Select correct answer from options
5. **Listening:** Audio comprehension with HyperTTS

**Question Generation Process:**

```
User selects strategy → API call to /api/practice/test/generate
↓
AnkiConnect queries cards → Strategy filtering:
- weak: Low success rate cards
- recent: Recently added cards
- random: Random selection
- mixed: Equal distribution of all types
↓
GPT generates questions → Format validation → Return to frontend
```
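
The strategy filter could be sketched as follows. The card dictionaries with `id`, `success_rate`, and `added` keys are hypothetical, and `mixed` is read here as drawing in turn from the weak, recent, and random orderings, which is only one possible interpretation of "equal distribution of all types".

```python
import random
from typing import Any, Dict, List, Optional

Card = Dict[str, Any]


def select_cards(cards: List[Card], strategy: str, count: int,
                 rng: Optional[random.Random] = None) -> List[Card]:
    """Pick up to `count` cards according to the named strategy."""
    rng = rng or random.Random()
    if strategy == "weak":
        return sorted(cards, key=lambda c: c["success_rate"])[:count]
    if strategy == "recent":
        return sorted(cards, key=lambda c: c["added"], reverse=True)[:count]
    if strategy == "random":
        return rng.sample(cards, min(count, len(cards)))
    if strategy == "mixed":
        # Full orderings for each strategy, consumed round-robin without repeats.
        pools = [select_cards(cards, name, len(cards), rng)
                 for name in ("weak", "recent", "random")]
        picked: List[Card] = []
        seen = set()
        for position in range(len(cards)):
            for pool in pools:
                card = pool[position]
                if card["id"] not in seen and len(picked) < count:
                    seen.add(card["id"])
                    picked.append(card)
        return picked
    raise ValueError(f"unknown strategy: {strategy}")
```

Passing a seeded `random.Random` makes the random and mixed selections reproducible, which helps when a user reports an empty or repetitive test.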

**Audio Integration:**
- **Source:** Anki HyperTTS audio files
- **Verification:** Audio files must exist in Docker container
- **Path:** `/api/anki/audio/`
- **Format:** MP3 files generated by HyperTTS

#### Step 3.3: Grammar Practice Flow
**Route:** `/practice/grammar`

**Topic Sources:**
1. **A1 Grammar Bank** (18 topics) - Always available
2. **Lesson Plan Topics** - From weekly lesson plans

**A1 Grammar Bank Topics:**
- Personal Pronouns, Verb "būt", Noun Genders
- Cases: Nominative, Accusative, Locative, Dative
- Tenses: Present, Past, Future
- Adjective Agreement, Possessive/Demonstrative Pronouns
- Question Words, Negation, Numbers 1-100
- Prepositions, Reflexive Verbs

**Exercise Types:**
1. **Fill-in-blank:** Grammar pattern completion
2. **Multiple Choice:** Select correct grammatical form
3. **Transformation:** Convert between tenses/cases
4. **Error Correction:** Identify and fix grammatical errors
5. **Conjugation:** Verb form exercises

**Scoring & Tracking:**
- **Individual answers:** Immediate feedback with explanations
- **Final score:** Recorded to CEFR grammar component
- **Data:** `logs/memory/grammar_scores.jsonl`
- **Mastery tracking:** Per-topic progress tracking

### Phase 4: Shadowing Practice System

#### Step 4.1: Shadowing Interface
**Route:** `/shadow`

**Features:**
- **Speed Control:** 0.8x (slow), 1.0x (normal), 1.1x (fast)
- **Composite Audio:** Progressive practice tracks
- **Dual Display:** Latvian screenplay + English translation
- **Voice Variety:** Betty (female) and John (male) Latvian voices

#### Step 4.2: Audio Generation Pipeline
**TTS Service:** Narakeet API
**Voice Configuration:**
```yaml
narakeet:
  voices:
    female: "betty"  # Latvian female voice
    male: "john"     # Latvian male voice
  speeds:
    slow: 0.8
    normal: 1.0
    fast: 1.1
```

**Audio Processing:**
1. User creates/selects dialogue
2. Text split into lines with speaker assignment
3. Narakeet API calls for each line/speed combination
4. Audio files cached with hash-based naming
5. Composite tracks generated (slow → normal → fast)
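
Hash-based cache naming, as mentioned in step 4, might look like the sketch below. The key layout (voice, speed, text), the 16-character digest prefix, and the `synthesize` callback standing in for the Narakeet call are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def audio_cache_name(text: str, voice: str, speed: float) -> str:
    """Derive a stable file name from everything that affects the audio."""
    key = f"{voice}|{speed:.2f}|{text}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16] + ".mp3"


def cached_audio(cache_dir: Path, text: str, voice: str, speed: float,
                 synthesize: Callable[[str, str, float], bytes]) -> Path:
    """Return the cached file, calling the TTS backend only on a cache miss."""
    path = cache_dir / audio_cache_name(text, voice, speed)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text, voice, speed))
    return path
```

Because the name depends only on the inputs, regenerating a dialogue with unchanged lines reuses existing files instead of paying for another TTS request.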

**File Structure:**

```
workspace/media/shadowing/id>/
├── line_1_slow.mp3       # Individual line audio
├── line_1_normal.mp3
├── line_1_fast.mp3
├── composite_slow.mp3    # Combined track for speed
├── composite_normal.mp3
└── composite_fast.mp3
```

#### Step 4.3: Dialogue Management
**API Endpoints:**
- `GET /api/shadow/list` - List all dialogues
- `POST /api/shadow/dialogue` - Create new dialogue
- `PATCH /api/shadow/dialogue/` - Rename dialogue
- `DELETE /api/shadow/dialogue/` - Delete dialogue
- `POST /api/shadow/dialogue//generate` - Regenerate audio

**Dialogue Structure:**
```json
{
  "id": "dialogue_uuid",
  "title": "At the Store",
  "lines": [
    {"speaker": "A", "text_lv": "Labdien!", "text_en": "Hello!"},
    {"speaker": "B", "text_lv": "Sveiki!", "text_en": "Hi!"}
  ],
  "created_at": "2026-05-16T10:30:00Z"
}
```

### Phase 5: Progress Tracking & Analytics

#### Step 5.1: Daily Suggestions System
**Endpoint:** `/api/daily/suggestions`

**Recommendation Engine:**
- **Streak tracking:** Persistent storage in `logs/memory/streak.json`
- **Weak cards identification:** Based on Anki success rates
- **Balanced practice:** Writing, vocabulary, grammar rotation
- **Adaptive goals:** Based on streak length and performance

**Streak Calculation:**
```json
{
  "current_streak": 7,
  "last_activity": "2026-05-16",
  "longest_streak": 15,
  "total_days": 45,
  "activities": ["writing", "vocabulary", "grammar"]
}
```
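
One way the streak state could be updated after a practice session is sketched below, using underscore-separated field names (`current_streak`, `last_activity`, `longest_streak`, `total_days`, `activities`). The rules are assumptions: a streak continues only when the previous activity was exactly one day earlier, repeated activity on the same day does not count twice, and `activities` accumulates distinct activity names.

```python
from datetime import date, timedelta
from typing import Any, Dict


def update_streak(state: Dict[str, Any], today: date, activity: str) -> Dict[str, Any]:
    """Return a new state dict reflecting one completed activity."""
    new = dict(state)
    last_raw = state.get("last_activity")
    last = date.fromisoformat(last_raw) if last_raw else None

    if last != today:  # first activity of the day
        consecutive = last is not None and today - last == timedelta(days=1)
        new["current_streak"] = state.get("current_streak", 0) + 1 if consecutive else 1
        new["total_days"] = state.get("total_days", 0) + 1
        new["last_activity"] = today.isoformat()

    new["longest_streak"] = max(state.get("longest_streak", 0),
                                new.get("current_streak", 0))

    activities = list(state.get("activities", []))
    if activity not in activities:
        activities.append(activity)
    new["activities"] = activities
    return new
```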

#### Step 5.2: Comprehensive Statistics
Endpoint: /api/stats/full

Data Sources:
- Anki stats: Card counts, retention, intervals
- Practice history: Writing/test/grammar scores
- Learning suggestions: Weak areas identification
- CEFR progression: Historical tracking

Integrated Display:
- Dashboard charts: CEFR component visualization
- Progress metrics: Study consistency, retention rates
- Performance trends: Improvement over time
- Recommendations: Personalized study suggestions

### Core Service Interactions

```mermaid
flowchart TD
    A[User Interface Flask] --> B["Practice APIs /api/practice/"]
    B --> C[GPT Processing OpenAI API]
    C --> D["AnkiConnect anki-headless:8765"]
    D --> E[Data Persistence File System]
    E --> F[CEFR Calculations and Dashboard Updates]
```

### Database Dependencies
- AnkiConnect: Card management, statistics
- File System: Progress tracking, scores, lesson plans
- GPT API: Content generation, correction, scoring

### Critical Integration Points

1. Anki Integration:
- Health Check: /api/anki/status verifies AnkiConnect
- Card Queries: Real-time card data for exercise generation
- Audio Access: Serving HyperTTS files via Flask

2. GPT Processing:
- Model Selection: ai_router.py manages model assignments
- Rate Limiting: Built-in retry and backoff mechanisms
- Quality Control: Response validation and error handling

3. Audio System:
- Narakeet API: External TTS service with caching
- File Management: Hash-based naming prevents duplication
- Streaming: Direct file serving for audio playback

### High-Risk Areas

1. AnkiConnect Unavailable:
- Impact: No exercise generation, progress tracking disabled
- Recovery: Check anki-headless container, restart if needed
- Diagnostic: curl http://192.168.1.11:8765/version

2. GPT API Limits:
- Impact: Writing corrections, exercise generation fail
- Recovery: Retry with exponential backoff
- Monitoring: Track API usage and rate limits

3. Audio File Corruption:
- Impact: Listening exercises, shadowing practice affected
- Recovery: Regenerate audio through Narakeet API
- Prevention: Hash verification, backup strategies
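
The "retry with exponential backoff" recovery for GPT API limits can be sketched generically as below. The attempt count, the base delay, and the `retry_with_backoff` name are assumptions; the actual behavior of ai_router.py is not described in this document.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T],
                       attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Narrowing `retry_on` to rate-limit and timeout errors avoids retrying requests that can never succeed, such as those rejected for an invalid API key.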

### Monitoring & Alerting

Health Endpoints:
- Main Dashboard: http://192.168.1.11:5002/health
- AnkiConnect: http://192.168.1.11:8765/version
- Service Status: http://192.168.1.11:5002/services/status

Key Metrics:
- Exercise completion rate: Percentage of started exercises completed
- Audio playback success: Audio file availability and playback
- API response times: GPT and Narakeet response latencies
- Data persistence: File write success rates

---

Document Status: Complete learning exercise flow mapping
Last Updated: 2026-05-16
Next Steps: Validate exercise functionality against current system state

Content Processing Pipeline

### Processing Stages Overview

Raw Audio → Transcription → Linguistic Analysis → Content Generation → Quality Control → Educational Output

### Service Topology

```mermaid
flowchart TD
    A[Raw Audio Input] --> B["ASR Transcription :8101 GPU"]
    B --> C[Text Normalization and Segmentation]
    C --> D{Parallel Processing}
    D --> E["UDPipe Analysis :8092"]
    D --> F["Sentence Embedder :8093"]
    D --> G["Morphological Analyzer :8087"]
    D --> H["Vocabulary Database :8088"]
    E & F & G & H --> I{Fluency Assessment}
    I --> J["Fluency Index :8094"]
    I --> K["Fluency Gate :8095"]
    J & K --> L{Validation and Quality Control}
    L --> M["Grammar Gate :8096"]
    L --> N["Grammar Correction :8103"]
    L --> O["Comprehensive Validation Gateway :8097"]
    M & N & O --> P{Content Generation}
    P --> Q["Template Extractor :8098"]
    P --> R["Constrained Generator :8099"]
    P --> S["Repair Loop :8100"]
    Q & R & S --> T{Export and Integration}
    T --> U[Anki Cards Generation]
    T --> V[Lesson Plan Integration]
    T --> W[CEFR Progress Updates]
```

### Stage 1: Audio Transcription

#### Service: ASR Transcription (Port 8101)
Technology: OpenAI Whisper with CUDA acceleration
Container: asr-transcription-lv
GPU Requirements: Device ID 1, Tesla P4 or equivalent

Processing Flow:
1. Input Validation:
- Audio format verification (WAV, MP3, M4A, etc.)
- Duration limits (typically 30+ minutes supported)
- File size validation (up to several GB)

2. Audio Preprocessing:
- Format normalization to 16kHz WAV
- Noise reduction (optional)
- Volume normalization

3. Whisper Inference:
- Model: whisper-lv-ct2 (Latvian-optimized)
- Location: /srv/latvian_xtts/models/whisper-lv-ct2
- Output: Timestamped transcript with confidence scores

Output Format (transcript.json):

{
  "language": "lv",
  "duration": 1247.5,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.2,
      "text": "Sveiki, šodien mēs runāsim par...",
      "avg_logprob": -0.15,
      "no_speech_prob": 0.001
    }
  ]
}
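
The confidence fields in transcript.json can be used to drop unreliable segments before normalization. The following is a minimal sketch, not the pipeline's actual code: the function name and both threshold values are assumptions chosen for illustration.

```python
# Sketch: filter transcript segments by their confidence fields.
# keep_confident_segments and its thresholds are hypothetical.

def keep_confident_segments(transcript, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Return segments that pass both (assumed) confidence thresholds."""
    kept = []
    for seg in transcript.get("segments", []):
        confident = seg["avg_logprob"] >= min_avg_logprob
        has_speech = seg["no_speech_prob"] <= max_no_speech_prob
        if confident and has_speech:
            kept.append(seg)
    return kept


sample_transcript = {
    "language": "lv",
    "duration": 1247.5,
    "segments": [
        {"id": 0, "start": 0.0, "end": 3.2,
         "text": "Sveiki, šodien mēs runāsim par...",
         "avg_logprob": -0.15, "no_speech_prob": 0.001},
        # Invented second segment that looks like silence or noise.
        {"id": 1, "start": 3.2, "end": 4.0, "text": "...",
         "avg_logprob": -1.8, "no_speech_prob": 0.92},
    ],
}

confident = keep_confident_segments(sample_transcript)
print([seg["id"] for seg in confident])
```

Tightening `min_avg_logprob` trades recall for cleaner input to the segmenter; the right values depend on the audio quality of the lessons.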

Critical Dependencies:
- GPU Access: NVIDIA runtime, CUDA libraries
- Model Files: Pre-trained Whisper Latvian model
- Memory: ~4-8GB GPU memory for inference
- Network: HTTP API endpoint for audio upload

Diagnostic Commands:

# Check GPU access
docker exec asr-transcription-lv nvidia-smi

# Test service health
curl -X POST http://192.168.1.11:8101/transcribe \
  -F "audio=@test.wav"

# Check model loading
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/whisper-lv-ct2

### Stage 2: Text Normalization & Segmentation

#### Module: LessonSegmenter
Purpose: Convert raw transcript into structured dialogue lines
Input: transcript.json
Outputs: transcript_norm.json, dialogue_lines.json

Processing Steps:
1. Text Normalization:
- Remove filler words, stutters
- Standardize punctuation
- Fix common transcription errors
- Apply Latvian-specific text cleaning rules

2. Dialogue Segmentation:
- Identify speaker boundaries
- Split into logical sentence units
- Preserve semantic coherence
- Extract meaningful dialogue exchanges

Output Structure:

{
  "lines": [
    {
      "id": 1,
      "start_time": 0.0,
      "end_time": 3.2,
      "speaker": "A",
      "text": "Labdien! Kā jums klājas?",
      "confidence": 0.95
    }
  ],
  "metadata": {
    "total_lines": 45,
    "speakers_detected": 2,
    "avg_line_duration": 2.8
  }
}
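
The metadata block can be derived from the lines themselves. This is a small sketch under the assumption that avg_line_duration is the mean of end_time minus start_time; the function name is hypothetical and the second sample line is invented.

```python
# Sketch: recompute the "metadata" block from a list of dialogue lines.
# summarize_lines is a hypothetical helper, not part of LessonSegmenter.

def summarize_lines(lines):
    """Build total_lines, speakers_detected and avg_line_duration."""
    durations = [line["end_time"] - line["start_time"] for line in lines]
    speakers = {line["speaker"] for line in lines}
    average = sum(durations) / len(durations) if durations else 0.0
    return {
        "total_lines": len(lines),
        "speakers_detected": len(speakers),
        "avg_line_duration": round(average, 2),
    }


sample_lines = [
    {"id": 1, "start_time": 0.0, "end_time": 3.2, "speaker": "A",
     "text": "Labdien! Kā jums klājas?", "confidence": 0.95},
    {"id": 2, "start_time": 3.2, "end_time": 5.6, "speaker": "B",
     "text": "Paldies, labi.", "confidence": 0.91},
]

print(summarize_lines(sample_lines))
```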

### Stage 3: Parallel Linguistic Analysis

#### Service: UDPipe (Port 8092)
Purpose: Universal Dependencies parsing for Latvian
Technology: UDPipe neural pipeline
Database Dependency: latvian-postgres:5432/tilts_tezaurs

Processing Capabilities:
- Tokenization: Word and sentence boundaries
- Part-of-Speech Tagging: Grammatical categories
- Lemmatization: Base word forms
- Dependency Parsing: Syntactic relationships
- Morphological Analysis: Case, gender, number, tense

Output Format:

{
  "tokens": [
    {
      "id": 1,
      "form": "Labdien",
      "lemma": "labdien", 
      "upos": "INTJ",
      "feats": "Polarity=Pos",
      "head": 0,
      "deprel": "root"
    }
  ]
}

#### Service: Sentence Embedder (Port 8093)
Purpose: Generate semantic vector representations
Technology: Sentence-transformers, multilingual models

Processing:
1. Text Encoding: Convert text to dense vectors
2. Semantic Analysis: Capture meaning and context
3. Similarity Scoring: Enable content comparison
4. Clustering: Group similar content

Output: 384- or 768-dimensional embeddings per sentence (depending on the model)
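
Similarity scoring over such embeddings is commonly done with cosine similarity. The sketch below uses plain Python lists and tiny made-up vectors; whether the service uses cosine similarity specifically is an assumption.

```python
# Sketch: cosine similarity between two sentence embeddings.
# Real embeddings would have 384 or 768 dimensions; these are toy vectors.
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```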

#### Service: Morphological Analyzer (Port 8087)
Purpose: Advanced morphological analysis
Database Connection: Direct PostgreSQL access for dictionary lookup

Analysis Components:
- Word Structure: Prefix + root + suffix breakdown
- Inflection Patterns: Declension and conjugation rules
- Derivational Morphology: Word formation patterns
- Compound Analysis: Multi-part word decomposition

#### Service: Vocabulary Database (Port 8088)
Purpose: Comprehensive dictionary lookup and validation
Database: tilts_tezaurs - Latvian vocabulary database

Capabilities:
- Word Validation: Check against authoritative dictionary
- Definition Lookup: Retrieve meanings and usage examples
- Frequency Analysis: Word commonality scoring
- CEFR Classification: Difficulty level assignment

### Stage 4: Fluency Assessment

#### Service: Fluency Index (Port 8094)
Purpose: Comprehensive fluency scoring for learning content

Assessment Criteria:
1. Vocabulary Complexity: Word difficulty distribution
2. Syntactic Complexity: Sentence structure analysis
3. Semantic Coherence: Logical flow and clarity
4. Phonological Difficulty: Pronunciation challenges
5. Cultural Context: Idiomatic expressions, cultural references

Scoring Algorithm:

fluency_score = (
    vocabulary_score * 0.3 +
    syntax_score * 0.25 +
    coherence_score * 0.2 +
    phonology_score * 0.15 +
    cultural_score * 0.1
)

Output Range: 0.0 (A1 beginner) to 2.0 (A2+ advanced)
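
As a worked instance of the formula: the five weights sum to 1.0, so the final score stays on the same scale as the component scores. The sketch assumes each component is itself scored from 0.0 to 2.0; that scale and the function name are assumptions, not confirmed by the service.

```python
# Sketch: the weighted fluency formula as a runnable function.
# Assumes each component score is already on the 0.0-2.0 scale.

FLUENCY_WEIGHTS = {
    "vocabulary": 0.30,
    "syntax": 0.25,
    "coherence": 0.20,
    "phonology": 0.15,
    "cultural": 0.10,
}


def fluency_score(components):
    """Combine the five component scores using the documented weights."""
    missing = set(FLUENCY_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(FLUENCY_WEIGHTS[name] * components[name] for name in FLUENCY_WEIGHTS)


example = {"vocabulary": 1.2, "syntax": 0.8, "coherence": 1.0,
           "phonology": 0.6, "cultural": 0.4}
print(round(fluency_score(example), 3))
```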

#### Service: Fluency Gate (Port 8095)
Purpose: Learning-appropriate content filtering
Technology: FAISS similarity search with LVTB (Latvian Treebank)

Gating Process:
1. Similarity Search: Compare against known CEFR-level content
2. Difficulty Assessment: Calculate objective difficulty metrics
3. Learning Suitability: Filter for A1-A2 appropriateness
4. Adaptation Recommendations: Suggest simplifications if needed

### Stage 5: Quality Control & Validation

#### Service: Grammar Gate (Port 8096)
Purpose: Grammatical correctness validation
Technology: UDPipe + rule-based grammar checking

Validation Checks:
- Case Agreement: Subject-object case relationships
- Gender Concordance: Adjective-noun agreement
- Verb Conjugation: Proper tense and person forms
- Word Order: Standard Latvian syntax patterns

#### Service: Grammar Correction (Port 8103)
Purpose: Automated grammar error detection and correction

Correction Capabilities:
- Error Detection: Identify grammatical mistakes
- Repair Suggestions: Provide correction options
- Confidence Scoring: Rate correction certainty
- Learning Integration: Generate educational explanations

#### Service: Comprehensive Validation Gateway (Port 8097)
Purpose: Orchestrate parallel fluency and grammar validation

Processing Flow:

Input Content
    ↓
Parallel Processing:
    ├── Fluency Gate (8095) - FAISS similarity scoring
    └── Grammar Gate (8096) - UDPipe rule validation
    ↓
Results Aggregation:
    ├── Combined quality score
    ├── Content appropriateness rating
    └── Improvement recommendations
    ↓
Validated Output
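
The fan-out and aggregation above can be sketched with a thread pool. The two gate functions are hypothetical stubs standing in for HTTP calls to ports 8095 and 8096, and averaging the two scores is an assumed aggregation rule, not the gateway's documented one.

```python
# Sketch: run two validation checks in parallel and aggregate the results.
# The stubs and the averaging rule are assumptions for illustration.
from concurrent.futures import ThreadPoolExecutor


def fluency_gate_stub(text):
    """Stand-in for a call to the Fluency Gate service."""
    return {"score": 0.8, "suitable": True}


def grammar_gate_stub(text):
    """Stand-in for a call to the Grammar Gate service."""
    return {"score": 0.9, "errors": []}


def validate(text, fluency_check=fluency_gate_stub, grammar_check=grammar_gate_stub):
    """Submit both checks concurrently, then combine their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fluency_future = pool.submit(fluency_check, text)
        grammar_future = pool.submit(grammar_check, text)
        fluency = fluency_future.result()
        grammar = grammar_future.result()
    return {
        "combined_score": (fluency["score"] + grammar["score"]) / 2,
        "appropriate": fluency["suitable"] and not grammar["errors"],
        "recommendations": list(grammar["errors"]),
    }


print(validate("Labdien! Kā jums klājas?"))
```

Passing the checks in as parameters keeps the aggregation logic testable without the services running.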

### Stage 6: Content Generation & Enhancement

#### Service: Template Extractor (Port 8098)
Purpose: Extract linguistic patterns and templates
Database Dependency: PostgreSQL for pattern storage

Extraction Process:
1. Pattern Recognition: Identify recurring language structures
2. Template Generation: Create reusable sentence templates
3. Variation Detection: Find pattern variations and exceptions
4. Educational Tagging: Mark patterns for learning focus

Template Format:

{
  "pattern": "Es [verb] [object]",
  "examples": ["Es lasu grāmatu", "Es dzeru kafiju"],
  "difficulty": "A1",
  "frequency": 0.85
}

#### Service: Constrained Generator (Port 8099)
Purpose: AI-powered content generation with grammatical constraints
Technology: GPT-4.1-mini with custom prompting

Generation Capabilities:
- Exercise Creation: Generate fill-in-the-blank exercises
- Example Sentences: Create usage examples for vocabulary
- Dialogue Extensions: Expand conversation scenarios
- Cultural Adaptations: Localize content for Latvian context

Constraints Applied:
- CEFR Level: A1-A2 vocabulary and grammar only
- Cultural Appropriateness: Latvia-specific contexts
- Grammatical Accuracy: Validated sentence structures
- Learning Objectives: Aligned with educational goals

#### Service: Repair Loop (Port 8100)
Purpose: Iterative content quality improvement

Repair Process:
1. Quality Assessment: Evaluate generated content
2. Error Detection: Identify linguistic or pedagogical issues
3. Iterative Improvement: Apply corrections and refinements
4. Validation Cycles: Multiple passes until quality threshold met
5. Final Approval: Human-readable quality report
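
The repair process is an assess-and-fix loop with a quality threshold and a bounded number of passes. A minimal sketch follows; the function name, the threshold of 0.8, the pass limit of 3, and the toy assess/repair callables are all assumptions.

```python
# Sketch: iterate assess -> repair until the score meets a threshold
# or the pass budget runs out. All names and defaults are hypothetical.

def repair_loop(content, assess, repair, threshold=0.8, max_passes=3):
    """Return the final content, whether it was approved, and all scores."""
    scores = []
    for _ in range(max_passes):
        score = assess(content)
        scores.append(score)
        if score >= threshold:
            return {"content": content, "approved": True, "scores": scores}
        content = repair(content)
    # Budget spent: score the last repaired version once more.
    score = assess(content)
    scores.append(score)
    return {"content": content, "approved": score >= threshold, "scores": scores}


# Toy stand-ins: each "??" marker counts as one defect.
def toy_assess(text):
    return 1.0 - 0.25 * text.count("??")


def toy_repair(text):
    return text.replace("??", "", 1)


report = repair_loop("Es ?? lasu ?? grāmatu", toy_assess, toy_repair)
print(report["approved"], report["scores"])
```

The list of scores doubles as raw material for the human-readable quality report in the final step.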

### Stage 7: Educational Content Assembly

#### Module: IntelligentCardGenerator
Purpose: Transform processed content into structured educational materials

Card Generation Process:
1. Content Extraction: Pull key vocabulary and phrases
2. GPT Enhancement: Add phonetic transcriptions, examples, grammar notes
3. Morphological Enrichment: Add word structure analysis
4. Image Integration: Generate DALL-E prompts for visual learning
5. Audio Preparation: Prepare for TTS integration

Card Types Generated:
1. Vocabulary Cards: Core word learning
2. Sentence Cards: Contextual usage
3. Pattern Cards: Grammar structure practice
4. Dialogue Cards: Conversation scenarios

#### Anki Integration Pipeline
Service: AnkiConnect (anki-headless:8765)

Export Process:
1. Card Structure: Format for 4-Card Template v2
2. Deduplication: Check against existing Anki database
3. Tag Management: Apply lesson and week tags
4. Image Handling: Generate or retrieve visual content
5. Audio References: Link to HyperTTS audio generation

Final Card Structure:

{
  "deckName": "Latvian (ChatGPT)::Vocab & Sentences",
  "modelName": "4-Card Template v2",
  "fields": {
    "latvian": "grāmata",
    "english": "book",
    "phonetic": "ˈgrɑː.mɑ.tɑ",
    "gender": "f",
    "plural": "grāmatas",
    "example_lv": "Es lasu jaunu grāmatu.",
    "example_en": "I am reading a new book.",
    "notes": "Feminine noun, 4th declension",
    "morphology": "grāmat + a • Related: grāmatvedība, grāmatnieks",
    "image_prompt": "Simple illustration of an open book with Latvian text visible"
  }
}
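
A card in this shape maps onto an AnkiConnect addNote request. The sketch below only builds the request body; the helper name and the tag values are hypothetical, and sending the body to anki-headless:8765 is left out.

```python
# Sketch: wrap a card structure in an AnkiConnect "addNote" request body.
# build_add_note_request and the example tags are hypothetical.
import json


def build_add_note_request(card, tags):
    """Return the JSON-serializable body for an AnkiConnect addNote call."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": card["deckName"],
                "modelName": card["modelName"],
                "fields": dict(card["fields"]),
                "tags": list(tags),
                # Let AnkiConnect reject duplicates instead of adding them.
                "options": {"allowDuplicate": False},
            }
        },
    }


card = {
    "deckName": "Latvian (ChatGPT)::Vocab & Sentences",
    "modelName": "4-Card Template v2",
    "fields": {"latvian": "grāmata", "english": "book"},
}
request_body = build_add_note_request(card, tags=["week_48", "lesson"])
print(json.dumps(request_body, ensure_ascii=False, indent=2))
```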

### File System Organization

/srv/latvian_learning/workspace/jobs//
├── raw_audio.wav              # Original audio input
├── transcript.json            # Whisper output
├── transcript_norm.json       # Normalized transcript
├── dialogue_lines.json        # Segmented content
├── linguistic_analysis/       # UDPipe outputs
├── fluency_scores.json        # Fluency assessment
├── validation_results.json    # Quality control outcomes
├── generated_content/         # AI-generated materials
├── lesson.json                # Final lesson package
├── anki_cards.json            # Anki-ready cards
└── status.json                # Processing state

### Database Dependencies

PostgreSQL Connections:
- tilts_tezaurs: Vocabulary and morphological data
- lesson_tracking: Educational progress data
- user_preferences: Personalization settings

Critical Tables:
- vocabulary: Word definitions, CEFR levels, frequency
- morphology: Word structure, inflection patterns
- templates: Language pattern library
- learning_progress: User advancement tracking

### Processing Times (Typical)

| Stage | Duration | Bottleneck |
|-------|----------|------------|
| Audio Transcription | 2-5 minutes | GPU availability |
| Linguistic Analysis | 30-60 seconds | Database queries |
| Fluency Assessment | 15-30 seconds | Vector computations |
| Content Generation | 1-2 minutes | GPT API calls |
| Anki Export | 10-20 seconds | Network I/O |

Total Pipeline: 5-10 minutes for 30-minute audio

### Optimization Strategies

1. GPU Scheduling: Queue management for transcription jobs
2. Database Connection Pooling: Reduce connection overhead
3. Caching: Store intermediate results for similar content
4. Parallel Processing: Concurrent linguistic analysis
5. Batch Operations: Group similar processing tasks

### Failure Points & Recovery

1. GPU Memory Exhaustion:
```bash
# Monitor GPU usage
docker exec asr-transcription-lv nvidia-smi

# Restart service if needed
docker-compose restart asr-transcription-lv
```

2. Database Connection Failures:
```bash
# Test connectivity
docker exec morphological-analyzer-lv nc -z latvian-postgres 5432

# Check PostgreSQL status
docker exec latvian-postgres pg_isready
```

3. Service Communication Timeouts:
```bash
# Test internal network
docker exec latvian-main-site wget -qO- http://udpipe-lv:8092/health

# Check service logs
docker logs comprehensive-validation-gateway --tail 50
```

4. Content Quality Failures:
- Automatic Retry: Repair loop attempts correction
- Manual Review: Flag for human oversight
- Fallback Content: Use simpler alternatives

### Monitoring & Alerting

Health Check Endpoints:
- Individual services: http://:/health
- Pipeline status: http://192.168.1.11:5002/services/status
- Processing queue: Monitor /workspace/jobs/ directories

Key Metrics:
- Transcription accuracy: WER (Word Error Rate)
- Processing throughput: Jobs per hour
- Service availability: Uptime percentage
- Quality scores: Average fluency and grammar ratings
- Resource utilization: GPU, CPU, memory usage

---

Document Status: Complete content processing pipeline mapping
Last Updated: 2026-05-16
Next Steps: Validate service interactions and optimize bottlenecks post-migration

Dashboard Widget Data Flows

CEFR Progress Widget

Location: Top-left card on dashboard

Update Frequency: On page load + manual refresh

Data Source: logs/agent/cefr_summary_latest.json

Components: Vocabulary (35%), Maturity (20%), Quality (15%), Writing (12%), Tests (8%), Grammar (8%)

Primary endpoint: GET /api/cefr

Anki Statistics Widget

Location: Top-right card on dashboard

Update Frequency: On page load

Data Source: AnkiConnect API at anki-headless:8765

Displays: Total cards, Mature cards, Reviews today, Deck health

Primary endpoint: GET /api/anki

Practice Progress Widget

Location: Bottom section of dashboard

Update Frequency: After practice session completion

Data Sources:

  • Writing: logs/memory/writing_scores.jsonl
  • Tests: logs/memory/test_scores.jsonl
  • Grammar: logs/memory/grammar_scores.jsonl

Daily Streak Widget

Location: Small card showing streak counter

Update Frequency: Once per day, on first practice

Data Source: logs/memory/streak.json

Tracks: Current streak, longest streak, total days

Primary endpoint: GET /api/daily/streak

Recent Activity Widget

Location: Right sidebar or bottom section

Update Frequency: Real-time after activities

Data Sources:

  • Sessions: PostgreSQL study_sessions table
  • Lessons: workspace/jobs/done/
  • Achievements: logs/memory/achievements.jsonl

Widget Performance Characteristics

Loading Priority:
  1. CEFR progress (critical path)
  2. Anki statistics (async)
  3. Recent activity (lazy)
Caching Strategy:
  • CEFR: JSON file cache
  • Anki: Real-time queries
  • Practice: Append-only logs
Error Handling:
  • Graceful degradation
  • Retry with exponential backoff
  • Fallback to cached data

Critical User Action Journeys

Complete system flows from button click to database updates

Submit for Correction (Writing)
  1. User Action: Types Latvian text & clicks "Submit"
  2. API Call: POST /api/practice/writing/submit
  3. Processing: GPT-4.1 correction analysis
  4. Storage: Append to writing_scores.jsonl
  5. Updates: CEFR component + streak tracking
  6. Response: Corrections + updated score
Risk: HIGH - GPT API timeout can block user

Generate Test (Practice Quiz)
  1. User Action: Selects test type & count
  2. API Call: POST /api/practice/test/generate
  3. Selection: Query weak/recent cards from Anki
  4. Generation: Create 5 exercise types (20% each)
  5. Display: First question with audio player
  6. Submission: POST per-question answers
Risk: HIGH - AnkiConnect unavailable = no exercises

Create Lesson Plan
  1. User Action: Input lesson content & title
  2. API Call: POST /api/plan/create
  3. Processing: GPT extracts plan structure
  4. Archive: Current plan + carry-forward items
  5. Storage: Save as YAML in lesson_plans/
  6. Sync: Auto-sync vocabulary to Anki
Risk: MEDIUM - File system permissions required

Check Exercise Answer
  1. User Action: Submits exercise answer
  2. API Call: POST /api/practice/test/check
  3. Validation: Normalize & compare answer
  4. Feedback: Generate contextual feedback
  5. Update: Record result in memory store
  6. Display: Show correctness + explanation
Risk: LOW - Local processing, no external deps

Service Dependencies Map

All user actions flow through Flask → optionally GPT-4.1 API → optionally AnkiConnect → optionally PostgreSQL → File System for persistence

Dashboard Widget Data Flows - Detailed

graph TD
    A[User Browser] --> B[Flask Dashboard Route /]
    B --> C[dashboard.html Template]
    C --> D[JavaScript Widget Loaders]
    D --> E[API Endpoints]
    E --> F[Data Sources]
    
    subgraph "Widget Types"
        G[CEFR Progress]
        H[Anki Statistics]
        I[Practice Progress]
        J[Daily Streak]
        K[Recent Activity]
    end
    
    subgraph "Data Sources"
        L[PostgreSQL Database]
        M[AnkiConnect API]
        N[JSONL Log Files]
        O[Memory Store Files]
    end

### Widget Display
- Location: Top-left card on dashboard
- Elements: Current level (A1, A1+, A2), numeric score, progress bar, component breakdown
- Update Frequency: On page load + manual refresh

### Data Flow

sequenceDiagram
    participant Browser
    participant Dashboard
    participant API
    participant CEFR
    participant DB
    participant Anki
    participant Files
    
    Browser->>Dashboard: GET /
    Dashboard->>Browser: dashboard.html + skeleton
    Browser->>API: GET /api/cefr
    
    API->>CEFR: get_cefr_summary()
    CEFR->>Files: Read logs/agent/cefr_summary_latest.json
    CEFR->>API: Return cached CEFR data
    
    API->>DB: Query study_sessions for reviews_today
    alt DB Available
        DB->>API: Return review count
    else DB Unavailable
        API->>Anki: AnkiConnect getNumCardsReviewedToday
        Anki->>API: Return review count
    end
    
    API->>Anki: Check card stats if cached=0
    Anki->>API: findCards queries for deck stats
    
    API->>Browser: Complete CEFR summary JSON
    Browser->>Browser: Render progress bar + components

### Data Sources

| Component | Weight | Primary Source | Fallback | Update Method |
|-----------|--------|---------------|----------|---------------|
| Vocabulary | 35% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Maturity | 20% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Quality | 15% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Writing | 12% | memory_store/writing_scores.jsonl | Empty | Updated by practice submissions |
| Tests | 8% | memory_store/test_scores.jsonl | Empty | Updated by practice submissions |
| Grammar | 8% | memory_store/grammar_scores.jsonl | Empty | Updated by practice submissions |
| Reviews Today | N/A | PostgreSQL study_sessions | AnkiConnect | Real-time |
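
The weights in the table can be combined into one overall score. Note that the six listed weights add up to 98%, not 100%, so this sketch divides by the total weight of the components that are present. That normalization, the function name, and the handling of missing components are assumptions; the tracker's actual rule is not shown in this document.

```python
# Sketch: weighted CEFR score from the documented component weights.
# Normalizing by the available weight is an assumed design choice.

CEFR_WEIGHTS = {
    "vocabulary": 0.35,
    "maturity": 0.20,
    "quality": 0.15,
    "writing": 0.12,
    "tests": 0.08,
    "grammar": 0.08,
}


def overall_score(component_scores):
    """Weighted mean over the components that have a score."""
    known = {name: score for name, score in component_scores.items()
             if name in CEFR_WEIGHTS}
    total_weight = sum(CEFR_WEIGHTS[name] for name in known)
    if total_weight == 0:
        return 0.0
    weighted = sum(CEFR_WEIGHTS[name] * score for name, score in known.items())
    return weighted / total_weight


print(round(sum(CEFR_WEIGHTS.values()), 2))
print(round(overall_score({"vocabulary": 0.9, "maturity": 0.8}), 4))
```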

### API Endpoints
- Primary: GET /api/cefr
- History: GET /api/cefr/history
- Response Format:

{
  "level": "A1+",
  "score": 0.85,
  "components": {
    "vocabulary": {"score": 0.9, "weight": 0.35},
    "maturity": {"score": 0.8, "weight": 0.20}
  },
  "anki": {
    "reviews_today": 25,
    "total_cards": 450,
    "mature_cards": 120
  }
}

### Troubleshooting
1. Widget shows "Loading...": Check /api/cefr endpoint
2. Reviews today = 0: Check AnkiConnect health at anki-headless:8765
3. Components missing: Check file existence: logs/agent/cefr_summary_latest.json
4. Stale data: Check cefr_tracker.py last run in systemd logs

### Widget Display
- Location: Top-right card on dashboard
- Elements: Total cards, mature cards, reviews today, deck health
- Update Frequency: On page load

### Data Flow

sequenceDiagram
    participant Browser
    participant API
    participant Anki
    
    Browser->>API: GET /api/anki
    API->>Anki: findNotes deck:"Latvian (ChatGPT)"
    Anki->>API: note_ids array
    API->>Anki: findCards deck:"Latvian (ChatGPT)"  
    Anki->>API: card_ids array
    API->>Anki: getNumCardsReviewedToday
    Anki->>API: reviews_today count
    API->>Browser: Combined statistics

### Data Source Details
- Service: AnkiConnect API at anki-headless:8765
- Deck Filter: deck:"Latvian (ChatGPT)"
- Real-time: Direct API queries (no caching)

### API Response

{
  "total_notes": 150,
  "total_cards": 450,
  "reviews_today": 25,
  "available": true
}

### Troubleshooting
1. Widget shows "Anki Unavailable": Check that the container is running: docker ps | grep anki-headless
2. Zero cards despite content: Check deck name exactly matches filter
3. Timeout errors: Check network connectivity between containers

### Widget Display
- Location: Bottom section of dashboard
- Elements: Writing accuracy, test scores, grammar progress
- Update Frequency: After practice session completion

### Data Flow

sequenceDiagram
    participant Browser
    participant Practice
    participant Memory
    participant Dashboard
    
    Browser->>Practice: Submit practice (writing/test/grammar)
    Practice->>Memory: Append to scores.jsonl
    Practice->>Browser: Success response
    
    Note over Browser: User navigates back to dashboard
    Browser->>Dashboard: GET /api/data
    Dashboard->>Memory: Read recent scores from JSONL
    Memory->>Dashboard: Return last N scores
    Dashboard->>Browser: Practice statistics

### Data Sources
| Practice Type | File Location | Update Trigger | Display Metric |
|---------------|---------------|----------------|----------------|
| Writing | logs/memory/writing_scores.jsonl | POST /api/practice/writing/submit | Average correction score |
| Tests | logs/memory/test_scores.jsonl | POST /api/practice/test/submit | Average accuracy |
| Grammar | logs/memory/grammar_scores.jsonl | POST /api/grammar/submit | Average score |

### File Format Example

{"timestamp": "2026-05-17T10:30:00", "score": 85.5, "corrections": 3}
{"timestamp": "2026-05-17T11:45:00", "score": 92.1, "corrections": 1}
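
Reading "the last N scores" from such a file amounts to parsing one JSON object per line. A minimal sketch follows; the function name and the choice to skip malformed lines (rather than fail) are assumptions.

```python
# Sketch: average the most recent scores from JSONL text.
# recent_average is hypothetical; malformed lines are skipped, not fatal.
import json


def recent_average(jsonl_text, last_n=10):
    """Return the mean of the last_n valid scores, or None if there are none."""
    scores = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written or corrupted line
        if isinstance(record, dict) and "score" in record:
            scores.append(float(record["score"]))
    recent = scores[-last_n:]
    return sum(recent) / len(recent) if recent else None


sample_log = "\n".join([
    '{"timestamp": "2026-05-17T10:30:00", "score": 85.5, "corrections": 3}',
    '{"timestamp": "2026-05-17T11:45:00", "score": 92.1, "corrections": 1}',
])
print(recent_average(sample_log))
```

Skipping bad lines keeps one corrupted append from blanking the whole widget, at the cost of hiding the corruption unless it is also logged.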

### Troubleshooting
1. No practice data: Check JSONL file existence and permissions
2. Stale scores: Verify practice submission endpoints work
3. Score calculation errors: Check JSON formatting in JSONL files

### Widget Display
- Location: Small card showing streak counter
- Elements: Current streak days, streak emoji/icon
- Update Frequency: Once per day, updated on first practice

### Data Flow

sequenceDiagram
    participant Browser
    participant API
    participant Memory
    participant File
    
    Browser->>API: GET /api/daily/streak
    API->>Memory: get_memory_store()
    Memory->>File: Read logs/memory/streak.json
    File->>Memory: Current streak data
    Memory->>API: Streak info
    API->>Browser: {"current_streak": 7, "last_practice": "2026-05-17"}

### Data Source
- File: logs/memory/streak.json
- Format:

{
  "current_streak": 7,
  "last_practice_date": "2026-05-17",
  "longest_streak": 12,
  "total_days": 45
}

### Update Triggers
- Record Practice: POST to /api/daily/record
- Automatic: First practice session each day extends streak
- Break: Missing a day resets streak to 0
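
These rules can be expressed as a pure function over the streak.json structure. In this sketch, a practice session after a missed day resets the streak and then counts the current day, so the stored value becomes 1; that reading of the reset rule and the function name are assumptions.

```python
# Sketch: apply the streak rules to the streak.json structure.
# record_practice is hypothetical and returns a new dict.
from datetime import date, timedelta


def record_practice(streak, today):
    """Return updated streak data after a practice session on `today`."""
    updated = dict(streak)
    last_raw = streak.get("last_practice_date")
    last = date.fromisoformat(last_raw) if last_raw else None
    if last == today:
        return updated  # only the first practice of the day counts
    if last is not None and today - last == timedelta(days=1):
        updated["current_streak"] = streak.get("current_streak", 0) + 1
    else:
        updated["current_streak"] = 1  # streak was broken; today starts a new one
    updated["last_practice_date"] = today.isoformat()
    updated["longest_streak"] = max(streak.get("longest_streak", 0),
                                    updated["current_streak"])
    updated["total_days"] = streak.get("total_days", 0) + 1
    return updated


stored = {"current_streak": 7, "last_practice_date": "2026-05-17",
          "longest_streak": 12, "total_days": 45}
print(record_practice(stored, date(2026, 5, 18)))
```

Passing `today` in explicitly makes the timezone question from the troubleshooting list a caller concern and keeps the function easy to test.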

### Troubleshooting
1. Streak not updating: Check POST /api/daily/record endpoint
2. Incorrect dates: Check system timezone settings
3. File corruption: Restore from backup or reset streak

### Widget Display
- Location: Right sidebar or bottom section
- Elements: Recent lesson completions, practice sessions, achievements
- Update Frequency: Real-time after activities

### Data Flow

sequenceDiagram
    participant Browser
    participant API
    participant DB
    participant Jobs
    
    Browser->>API: GET /api/data (includes recent activity)
    API->>DB: Query study_sessions recent
    DB->>API: Recent session data
    API->>Jobs: Check workspace/jobs/done/
    Jobs->>API: Recent lesson completions
    API->>Browser: Combined activity feed

### Data Sources
| Activity Type | Source | Query/Path |
|---------------|--------|------------|
| Practice Sessions | PostgreSQL | study_sessions WHERE started_at > now() - interval '7 days' |
| Lesson Processing | File System | workspace/jobs/done/*/metadata.json |
| Achievements | Memory Store | logs/memory/achievements.jsonl |

### Auto-Refresh Elements
- CEFR Progress: Updates after any practice submission
- Anki Statistics: Static (requires page reload)
- Streak Counter: Updates after daily practice recording
- Recent Activity: Updates immediately after activities

### Manual Refresh
- Full Dashboard: F5 or browser refresh
- Individual Widgets: Click refresh icon where available
- API Polling: Some widgets poll every 30s when tab is active

### Widget Loading Priority
1. Critical Path: CEFR progress (blocks above-fold content)
2. Async Loading: Anki statistics, practice progress
3. Lazy Loading: Recent activity (only when scrolled into view)

### Caching Strategy
- CEFR Data: Cached in JSON file, updated by background process
- Anki Data: No caching (real-time queries)
- Practice Scores: File-based append-only logs
- Streak Data: Single JSON file (fast read)

### Error Handling
- Graceful Degradation: Missing data shows placeholder instead of error
- Retry Logic: Failed API calls retry 3x with exponential backoff
- Fallback Data: Use cached/stale data when services unavailable
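
The retry rule can be sketched as a small wrapper. The document does not say whether retries happen in the browser or on the server; this Python version only illustrates the pattern, and the base delay of 0.5 seconds and the function name are assumptions.

```python
# Sketch: retry a call up to 3 times with exponentially growing delays.
# call_with_retry and base_delay are hypothetical.
import time


def call_with_retry(func, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and try again."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage: a call that fails twice before succeeding. Delays are recorded
# instead of slept so the example runs instantly.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("service unavailable")
    return {"available": True}


waits = []
print(call_with_retry(flaky_fetch, sleep=waits.append), waits)
```

When the final attempt also fails, the caller is the natural place to fall back to cached data.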

---

Document Status: Complete data flow mapping for all dashboard widgets
Last Updated: 2026-05-17
Next Steps: Create button click journey diagrams

Button Click Journeys

### 1. "Submit for Correction" (Writing Practice)

#### User Journey Map

sequenceDiagram
    participant User
    participant Browser
    participant Flask
    participant WritingProcessor
    participant GPT
    participant MemoryStore
    participant PostgreSQL
    participant CEFRTracker
    participant Dashboard
    
    User->>Browser: Type Latvian text and click Submit for Correction
    Browser->>Flask: POST /api/practice/writing/submit
    Note over Browser, Flask: POST body contains text and prompt fields
    
    Flask->>WritingProcessor: process_correction(text, prompt)
    WritingProcessor->>GPT: GPT-4.1 correction analysis
    GPT->>WritingProcessor: Corrections + score + feedback
    WritingProcessor->>Flask: Structured correction result
    
    Flask->>MemoryStore: Append to writing_scores.jsonl
    Note over MemoryStore: Appends timestamp, score, and corrections count
    
    Flask->>PostgreSQL: Shadow-write per-lemma errors to learner_state
    Note over PostgreSQL: Per-word correct/incorrect records for spaced repetition
    
    Flask->>CEFRTracker: update_component('writing', score)
    CEFRTracker->>MemoryStore: Update cefr_summary_latest.json
    
    Flask->>Browser: Correction results + updated CEFR
    Browser->>Dashboard: Update writing component score
    Dashboard->>User: Display corrections with highlighted errors

#### Detailed Flow Steps

1. Frontend Interaction

// Location: templates/practice.html
function submitWriting() {
    const text = document.getElementById('writingText').value;
    const prompt = document.getElementById('promptSelect').value;
    
    fetch('/api/practice/writing/submit', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({text: text, prompt: prompt})
    })
    .then(response => response.json())
    .then(data => displayCorrections(data));
}

2. Backend Processing

# Location: blueprints/practice_bp.py:271
@practice_bp.route('/api/practice/writing/submit', methods=['POST'])
def api_practice_writing_submit():
    data = request.get_json()
    text = data.get('text', '').strip()
    prompt_key = data.get('prompt', 'daily_life')
    
    # Step 1: GPT Processing
    processor = get_writing_processor()
    result = processor.process_correction(text, prompt_key)
    
    # Step 2: Memory Store Update
    memory = get_memory_store()
    memory.log_writing_score(result.get('score', 0.5))
    
    # Step 3: CEFR Component Update  
    cefr = get_cefr_tracker()
    cefr_result = cefr.update_component('writing', result.get('score'))
    
    # Step 4: Shadow Write to PostgreSQL (per-lemma errors)
    _shadow_write_writing(result.get('corrections', []), plan_id=None)
    
    return jsonify(result)

3. Data Persistence Points

| Step | Location | Data Format | Purpose |
|------|----------|-------------|---------|
| Memory Store | logs/memory/writing_scores.jsonl | {"timestamp": "...", "score": 85.5} | CEFR calculation input |
| CEFR Update | logs/agent/cefr_summary_latest.json | Component scores with weights | Dashboard display |
| PostgreSQL | learner_state table | Per-lemma correct/incorrect records | Spaced repetition |
| Daily Suggester | logs/memory/streak.json | Practice activity recording | Streak tracking |

#### Failure Points & Diagnostics

1. GPT API Timeout (HIGH RISK)
- Symptoms: "Processing..." spinner never completes
- Check: curl -X POST localhost:5002/api/practice/writing/submit with test data
- Logs: Flask container logs for OpenAI API errors

2. File Write Permissions (MEDIUM RISK)
- Symptoms: No score updates in CEFR dashboard
- Check: File permissions on logs/memory/writing_scores.jsonl
- Recovery: Fix permissions and retry submission

3. PostgreSQL Connection (LOW RISK - Shadow Write)
- Symptoms: Warning in logs but submission still works
- Check: Container connectivity to latvian-postgres:5432
- Impact: Spaced repetition data missing

### 2. "Generate Test" (Practice Quiz)

#### User Journey Map

sequenceDiagram
    participant User
    participant Browser
    participant Flask
    participant ExerciseGenerator
    participant AnkiConnect
    participant PostgreSQL
    participant AdaptiveEngine
    
    User->>Browser: Select test type and count then click Generate Test
    Browser->>Flask: POST /api/practice/test/generate
    Note over Browser, Flask: POST body with type mixed, count 10, strategy weak
    
    Flask->>ExerciseGenerator: generate_mixed_test(count, strategy)
    ExerciseGenerator->>AnkiConnect: Query weak cards from deck
    AnkiConnect->>ExerciseGenerator: Card data with intervals
    ExerciseGenerator->>AdaptiveEngine: Create varied exercise formats
    AdaptiveEngine->>Flask: Exercise array with audio references
    
    Flask->>Browser: Returns exercises array and total count
    Browser->>User: Display first question with audio player
    
    User->>Browser: Submit answer for each question
    Browser->>Flask: POST per question check endpoint
    Flask->>AdaptiveEngine: check_answer(exercise, user_answer)
    AdaptiveEngine->>Flask: Returns correct status and feedback
    
    User->>Browser: Click Submit Test after all answers
    Browser->>Flask: POST /api/practice/test/submit
    Flask->>PostgreSQL: Record study_session with scores
    Flask->>CEFRTracker: update_component('tests', final_score)
    Flask->>Browser: Final score + updated CEFR

#### Exercise Generation Process

1. Vocabulary Selection

# Location: modules/exercise_generator.py
def generate_mixed_test(self, count, strategy, plan_id=None):
    # Strategy determines card selection
    if strategy == 'weak':
        # Query cards with low intervals or high fail rates
        cards = self.anki_connect.get_struggling_cards(limit=count*2)
    elif strategy == 'recent':  
        # Query recently added cards
        cards = self.anki_connect.get_recent_cards(days=7, limit=count*2)
    else:  # random
        # Random sample from entire deck
        cards = self.anki_connect.get_random_cards(limit=count*2)
    
    # Generate 5 exercise types (20% each for mixed)
    exercises = []
    per_type = count // 5
    exercises.extend(self._generate_fill_blank(cards[:per_type]))
    exercises.extend(self._generate_translation(cards[per_type:per_type*2]))
    # ... more types
    
    random.shuffle(exercises)
    return {"exercises": exercises, "total": len(exercises)}

2. Answer Checking with Feedback

# Location: modules/adaptive_exercise_engine.py  
def check_answer(self, exercise, user_answer):
    correct_answer = exercise['correct_answer']
    user_clean = self._normalize_answer(user_answer)
    correct_clean = self._normalize_answer(correct_answer)
    
    is_correct = user_clean == correct_clean
    
    # Generate contextual feedback
    feedback = self._generate_feedback(exercise, is_correct, user_answer)
    
    return {
        "correct": is_correct,
        "correct_answer": correct_answer,
        "feedback": feedback,
        "explanation": exercise.get('explanation', '')
    }
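
The body of _normalize_answer is not shown here. One plausible version, given as a sketch only, lowercases the answer, unifies Unicode composition, strips ASCII punctuation, and collapses whitespace while keeping Latvian diacritics significant. Every one of those choices is an assumption.

```python
# Sketch: a possible answer normalizer. This is not the engine's actual
# _normalize_answer; it only illustrates the comparison step.
import string
import unicodedata

_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(answer):
    """Casefold, compose Unicode, drop ASCII punctuation, squeeze spaces."""
    text = unicodedata.normalize("NFC", answer).casefold()
    text = text.translate(_PUNCTUATION_TABLE)
    return " ".join(text.split())


print(normalize_answer("  Es lasu  grāmatu. ") == normalize_answer("es lasu grāmatu"))
print(normalize_answer("gramata") == normalize_answer("grāmata"))
```

Keeping diacritics significant means "gramata" is marked wrong for "grāmata", which suits spelling practice; a more lenient engine could compare diacritic-stripped forms and give partial credit.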

#### Failure Points & Diagnostics

1. AnkiConnect Unavailable (HIGH RISK)
- Symptoms: "No exercises available" error
- Check: curl http://anki-headless:8765 for connectivity
- Recovery: Restart anki-headless container

2. Empty Deck (MEDIUM RISK)
- Symptoms: Exercises are generated but contain no content
- Check: Verify cards exist: AnkiConnect findCards query
- Recovery: Import vocabulary content first

3. Audio Files Missing (LOW RISK)
- Symptoms: Listening exercises show broken audio players
- Check: Mount point /mnt/data/apps/anki-media accessible
- Impact: Listening exercises fail, other types work

### 3. "Create Lesson Plan" (Weekly Setup)

#### User Journey Map

sequenceDiagram
    participant User
    participant Browser
    participant Flask
    participant PlanExtractor
    participant GPT
    participant FileSystem
    participant AnkiSync
    participant PostgreSQL
    
    User->>Browser: Input lesson content then click Create Plan
    Browser->>Flask: POST /api/plan/create
    Note over Browser, Flask: POST body with lesson text and title fields
    
    Flask->>FileSystem: Check for existing active plan
    FileSystem->>Flask: Current plan details
    Flask->>Flask: Archive current plan with carry-forward
    
    Flask->>PlanExtractor: extract_lesson_plan(text, title)
    PlanExtractor->>GPT: Parse vocabulary, grammar, exercises
    GPT->>PlanExtractor: Structured lesson plan
    PlanExtractor->>Flask: Lesson plan object
    
    Flask->>FileSystem: Save new plan as YAML
    Note over FileSystem: lesson_plans/2025_w48.yaml
    
    Flask->>AnkiSync: Auto-sync vocabulary to Anki
    AnkiSync->>AnkiConnect: Create cards with week_48 tags
    AnkiConnect->>AnkiSync: Card creation results
    
    Flask->>PostgreSQL: Initialize progress tracking
    Note over PostgreSQL: vocabulary table with spaced repetition
    
    Flask->>Browser: Plan creation success + plan_id
    Browser->>User: Redirect to new plan detail page
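
The diagram saves the plan as `lesson_plans/2025_w48.yaml`. As a rough illustration of how such a name could be derived from a date, the sketch below uses ISO week numbering. Whether the system uses ISO weeks, and whether it zero-pads the week number, is an assumption; `plan_filename` is a hypothetical helper.

```python
from datetime import date


def plan_filename(day: date) -> str:
    """Build a plan path from the ISO year and week (hypothetical helper)."""
    iso_year, iso_week, _weekday = day.isocalendar()
    # Zero-padding the week number is an assumption, not confirmed by the source
    return f"lesson_plans/{iso_year}_w{iso_week:02d}.yaml"


print(plan_filename(date(2025, 11, 24)))
```

With ISO numbering the year in the filename can differ from the calendar year near New Year: 29 December 2025 already belongs to week 1 of 2026.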

#### Lesson Plan Lifecycle

1. Archive & Carry-Forward

# Location: blueprints/plan/routes.py
def create_plan():
    # Step 1: Archive current active plan
    carry_forward = None  # stays None when there is no active plan to archive
    current_plan = get_active_plan()
    if current_plan:
        carry_forward = calculate_carry_forward(current_plan)
        archive_plan(current_plan['id'], carry_forward)
    
    # Step 2: Create new plan with carried items
    new_plan = extract_plan_from_input(data)
    if carry_forward:
        new_plan['vocabulary'].extend(carry_forward['vocabulary'])
        new_plan['writing_prompts'].extend(carry_forward['writing'])
    
    # Step 3: Save and activate
    save_plan_yaml(new_plan)
    return {"plan_id": new_plan['id'], "status": "active"}

2. Anki Integration

# Location: modules/lesson_plan_anki_sync.py
def sync_plan_to_anki(plan_id):
    plan = load_plan_yaml(plan_id)
    
    # Remove old current_week tags
    anki.remove_tags_by_pattern("current_week")
    
    # Add new cards with tags
    for vocab in plan['vocabulary']:
        card_data = {
            "deckName": "Latvian (ChatGPT)",
            "modelName": "4-Card Template v2", 
            "fields": generate_card_fields(vocab),
            "tags": [f"week_{plan['week_number']}", "current_week"]
        }
        anki.addNote(card_data)
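
For reference, the `anki` wrapper above is not shown; a raw AnkiConnect `addNote` call wraps the note in a `params.note` object alongside `action` and `version`. The sketch below only builds that request body and sends nothing. `build_add_note_request` is a hypothetical helper and the field names `Front` and `Back` are placeholders, since the fields of the "4-Card Template v2" model are not listed in this document.

```python
import json


def build_add_note_request(fields: dict, week_number: int) -> dict:
    """Build an AnkiConnect addNote request body (hypothetical helper)."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "Latvian (ChatGPT)",
                "modelName": "4-Card Template v2",
                "fields": fields,
                "tags": [f"week_{week_number}", "current_week"],
            }
        },
    }


# Placeholder field names; the real model's fields may differ
payload = build_add_note_request({"Front": "sveiki", "Back": "hello"}, 48)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The printed JSON can be posted to `http://anki-headless:8765` with any HTTP client to reproduce a single card creation by hand.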

#### Failure Points & Diagnostics

1. GPT Extraction Errors (HIGH RISK)
- Symptoms: Malformed lesson plans with missing sections
- Check: GPT API response logs for parsing errors
- Recovery: Manual plan editing via web interface

2. File System Permissions (MEDIUM RISK)
- Symptoms: "Unable to save plan" error
- Check: Write permissions on lesson_plans/ directory
- Recovery: Fix permissions and retry

3. Anki Sync Failures (LOW RISK)
- Symptoms: Plan created but cards missing in Anki
- Check: AnkiConnect availability and deck structure
- Impact: Manual card creation needed

### API Endpoint Summary

| User Action | Primary Endpoint | Secondary Calls | Database Updates |
|-------------|------------------|-----------------|------------------|
| Submit Writing | /api/practice/writing/submit | GPT-4.1, CEFR tracker | writing_scores.jsonl, learnerstate |
| Generate Test | /api/practice/test/generate | AnkiConnect queries | None (read-only) |
| Submit Test | /api/practice/test/submit | CEFR tracker | test_scores.jsonl, study_sessions |
| Create Plan | /api/plan/create | GPT extraction, Anki sync | Plan YAML, vocabulary table |
| Check Exercise | /api/plan//exercises/check | Adaptive engine | progress.yaml, learnerstate |

### Service Dependencies

graph TD
    A[User Browser] --> B[Flask Dashboard]
    B --> C[GPT-4.1 API]
    B --> D[AnkiConnect :8765]
    B --> E[PostgreSQL :5432]
    B --> F[File System]
    
    C --> G[OpenAI Services]
    D --> H[Anki Desktop]
    E --> I[Learning Database]
    F --> J[Logs & YAML Files]
    
    subgraph "Failure Impact"
        K[GPT Unavailable: No corrections/extractions]
        L[Anki Unavailable: No test generation]
        M[DB Unavailable: No progress tracking]
        N[File System: No plan persistence]
    end

### Error Propagation Patterns

Graceful Degradation Hierarchy:
1. Critical Path: User gets response even if secondary features fail
2. Shadow Operations: PostgreSQL writes fail silently, don't block user
3. Background Tasks: File system operations retry 3x before failing
4. External APIs: GPT and AnkiConnect have circuit breakers

Recovery Strategies:
- Immediate: Retry failed operations automatically (3x max)
- Manual: Admin endpoints to replay failed operations
- Background: Periodic cleanup jobs fix inconsistent state
- Data: Point-in-time backups of critical YAML and JSONL files
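
The retry limit and circuit breakers listed above can be sketched as follows. This is an illustrative pattern only, not the project's code: the `retry` and `CircuitBreaker` names, the thresholds, and the cool-down value are all assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: close the circuit and start counting again
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry(func, attempts=3, delay=0.0):
    """Call func up to `attempts` times, re-raising the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A caller would typically wrap an external dependency such as GPT or AnkiConnect in `breaker.call(...)` and local file writes in `retry(...)`, so a dead dependency fails fast instead of delaying every request.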

---

Document Status: Complete button click journey mapping
Last Updated: 2026-05-17
Next Steps: Create troubleshooting decision trees for each journey

System Troubleshooting Guide

Diagnostic Steps:
# Check API endpoints
curl -s http://localhost:5002/api/cefr | jq '.'
curl -s http://localhost:8765 -d '{"action":"version","version":6}' | jq '.'

# Check data files
ls -la /srv/latvian_learning/logs/agent/cefr_summary_latest.json
ls -la /srv/latvian_learning/logs/memory/*.jsonl

# Check service health
docker ps | grep latvian
systemctl --user status latvian_dashboard.service
Resolution Steps:
  1. CEFR Widget: Check /api/cefr endpoint and cefr_summary_latest.json file
  2. Anki Widget: Verify anki-headless container is running
  3. Practice Widget: Check JSONL file permissions
  4. Streak Widget: Verify streak.json exists and is valid JSON
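
Steps 3 and 4 come down to checking that a data file exists and parses. The sketch below shows one way to do that for both `.json` and `.jsonl` files; `check_json_file` is a hypothetical helper and its return strings are assumptions, not an existing tool in the project.

```python
import json
from pathlib import Path


def check_json_file(path: str) -> str:
    """Return "ok", "missing", or "invalid: ..." for a JSON or JSONL file."""
    file = Path(path)
    if not file.exists():
        return "missing"
    try:
        text = file.read_text(encoding="utf-8")
        if file.suffix == ".jsonl":
            # JSONL: every non-empty line must be a standalone JSON document
            for number, line in enumerate(text.splitlines(), start=1):
                if line.strip():
                    try:
                        json.loads(line)
                    except ValueError as exc:
                        return f"invalid: line {number}: {exc}"
        else:
            json.loads(text)
    except (OSError, ValueError) as exc:
        return f"invalid: {exc}"
    return "ok"
```

For example, `check_json_file("/srv/latvian_learning/logs/memory/streak.json")` distinguishes a missing streak file from a corrupt one, which map to different recovery steps.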

Quick Checks:
# Test endpoint
curl -X POST http://localhost:5002/api/practice/writing/submit \
  -H "Content-Type: application/json" \
  -d '{"text":"Es mācu latviešu valodu","prompt":"daily_life"}'

# Check GPT API key
docker exec latvian-main-site env | grep OPENAI

# Test file permissions
touch /srv/latvian_learning/logs/memory/test_write.txt
ls -la /srv/latvian_learning/logs/memory/
Common Issues:
  • Loading spinner forever: GPT API timeout - check API key and quota
  • "Invalid text" error: Check text length (min 5 chars, max 500)
  • "Score not updated": Check write permissions on logs/memory/
  • Browser console errors: Check network tab in Developer Tools (F12)

Diagnostic Commands:
# Check AnkiConnect
curl -s http://localhost:8765 -d '{"action":"version","version":6}'

# Check deck access
curl -s http://localhost:8765 -d '{
  "action": "findCards",
  "version": 6,
  "params": {"query": "deck:\"Latvian (ChatGPT)\""}
}' | jq '.result | length'

# Check audio files
ls -la /mnt/data/apps/anki-media/ | head -10
find /mnt/data/apps/anki-media -name "*.mp3" | wc -l
Resolution:
  • "No exercises available": AnkiConnect unreachable - restart container
  • Empty deck: Import vocabulary first
  • Missing audio: Run HyperTTS batch generation
  • Wrong deck name: Check deck filter in code matches actual deck

Test Commands:
# Test endpoint
curl -X POST http://localhost:5002/api/plan/create \
  -H "Content-Type: application/json" \
  -d '{"text":"Week 1: Basic greetings...","title":"Greetings"}'

# Check file permissions
ls -la /srv/latvian_learning/lesson_plans/
touch /srv/latvian_learning/lesson_plans/test.yaml

# Check extraction logs
docker logs latvian-main-site | grep -i "lesson.*extract"
Troubleshooting Path:
  1. No response: Check Flask endpoint is reachable
  2. Timeout: Check GPT API performance and quota
  3. Validation error: Check input format (min 100 chars)
  4. Save error: Fix permissions on lesson_plans/ directory

Verification Commands:
# Check score files
tail -5 /srv/latvian_learning/logs/memory/writing_scores.jsonl
tail -5 /srv/latvian_learning/logs/memory/test_scores.jsonl

# Check CEFR update
cat /srv/latvian_learning/logs/agent/cefr_summary_latest.json | jq '.components'

# Check streak
cat /srv/latvian_learning/logs/memory/streak.json

# Check database
docker exec latvian-postgres psql -U latvian_user -d latvian_db -c "SELECT COUNT(*) FROM study_sessions WHERE created_at > NOW() - INTERVAL '1 day';"
Common Causes:
  • CEFR not updating: Check practice submission succeeded first
  • Stale data: Check cefr_tracker.py background process
  • Streak not recording: Call POST /api/daily/record endpoint
  • Wrong date: Check system timezone settings

Service Recovery:
# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres

# Wait for services
sleep 10

# Check health
curl -s http://localhost:5002/health | jq '.microservices'
File System Recovery:
# Fix permissions
sudo chown -R david:david /srv/latvian_learning/logs/
sudo chmod -R 755 /srv/latvian_learning/lesson_plans/

# Reset streak counter
echo '{"current_streak": 0, "last_practice_date": null}' > \
     /srv/latvian_learning/logs/memory/streak.json

# Restore from backup if available
cp /srv/latvian_learning/backups/cefr_summary_latest.json.backup \
   /srv/latvian_learning/logs/agent/cefr_summary_latest.json
When to Escalate:
  • Multiple container failures simultaneously
  • Database completely inaccessible
  • All API endpoints returning 500 errors
  • File system corruption detected
Troubleshooting Tips
  • Always check Docker first: Most issues are container health
  • Check logs early: docker logs [container] --tail 50
  • Test endpoints directly: Use curl before debugging frontend
  • Verify file permissions: Many silent failures are permission issues
  • Check API quotas: GPT API quota exhaustion is common
Troubleshooting User Flows - With Diagrams

### System Health Check

# 1. Check all containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# 2. Check main dashboard health
curl -s http://localhost:5002/health | jq '.microservices'

# 3. Check critical services
curl -s http://localhost:8765 >/dev/null && echo "Anki: OK" || echo "Anki: FAIL"
docker exec latvian-postgres pg_isready -U latvian_user >/dev/null && echo "DB: OK" || echo "DB: FAIL"

# 4. Check disk space and permissions
df -h /srv/latvian_learning
ls -la /srv/latvian_learning/logs/

### Decision Tree

graph TD
    A["Dashboard widget not loading"] --> B{Which widget}
    B -->|CEFR Progress| C["Check api cefr endpoint"]
    B -->|Anki Statistics| D["Check AnkiConnect"]
    B -->|Practice Progress| E["Check JSONL files"]
    B -->|Daily Streak| F["Check streak json"]
    C --> G{api cefr responds}
    G -->|No| H["Check Flask container logs"]
    G -->|Yes empty| I["Check cefr summary file exists"]
    G -->|Yes stale| J["Check cefr tracker last run"]
    D --> K{AnkiConnect reachable}
    K -->|No| L["Check anki headless container"]
    K -->|Yes empty| M["Check deck name matches filter"]
    E --> N{JSONL files exist}
    N -->|No| O["User has not done practice yet"]
    N -->|Yes empty| P["Check file permissions"]
    F --> R{streak file exists}
    R -->|No| S["Initialize with default values"]
    R -->|Yes corrupt| T["Restore from backup"]

### Diagnostic Steps

CEFR Progress Widget Failure

# Test API endpoint directly
curl -s http://localhost:5002/api/cefr | jq '.'

# Check if CEFR file exists and is recent
ls -la /srv/latvian_learning/logs/agent/cefr_summary_latest.json
stat /srv/latvian_learning/logs/agent/cefr_summary_latest.json

# Check systemd service that updates CEFR
systemctl --user status latvian_dashboard.service
journalctl --user -u latvian_dashboard.service -n 50

Anki Statistics Widget Failure

# Test AnkiConnect directly
curl -s http://localhost:8765 -d '{"action":"version","version":6}' | jq '.'

# Check container status
docker ps | grep anki-headless
docker logs anki-headless --tail 20

# Test deck query
curl -s http://localhost:8765 -d '{"action":"findCards","version":6,"params":{"query":"deck:\"Latvian (ChatGPT)\""}}' | jq '.result | length'

### Decision Tree

graph TD
    A["Submit button clicked"] --> B{Button responds}
    B -->|No response| C["Check browser console errors"]
    B -->|Loading forever| D["Check writing submit endpoint"]
    B -->|Error message| E{What error}
    C --> F["JavaScript errors in frontend"]
    C --> G["Network connectivity issues"]
    D --> H{Endpoint reachable}
    H -->|No| I["Check Flask container status"]
    H -->|Yes timeout| J["Check GPT API key and quota"]
    H -->|Yes error| K["Check Flask logs"]
    E -->|Invalid text| L["Check text length validation"]
    E -->|Processing failed| M["GPT API or processing error"]
    E -->|Score not updated| N["Check file write permissions"]
    J --> O{GPT API working}
    O -->|No API key| P["Set OPENAI API KEY env var"]
    O -->|Quota exceeded| Q["Check OpenAI billing"]
    O -->|Rate limited| R["Wait and retry"]
    N --> S{JSONL file writable}
    S -->|No| T["Fix permissions on logs memory dir"]
    S -->|Yes| U["Check disk space"]

### Diagnostic Commands

Frontend Issues

# Check browser console (F12 Developer Tools)
# Look for JavaScript errors or failed network requests

# Test endpoint with curl
curl -X POST http://localhost:5002/api/practice/writing/submit \
  -H "Content-Type: application/json" \
  -d '{"text":"Es mācu latviešu valodu","prompt":"daily_life"}'

Backend Processing Issues

# Check Flask container logs
docker logs latvian-main-site --tail 50 | grep -i error

# Check GPT API connectivity
export OPENAI_API_KEY="your-key-here"
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data[] | select(.id=="gpt-4.1")'

# Check file permissions
ls -la /srv/latvian_learning/logs/memory/writing_scores.jsonl
touch /srv/latvian_learning/logs/memory/test_write.txt

GPT API Troubleshooting

# Check environment variable
docker exec latvian-main-site env | grep OPENAI

# Test with minimal request
docker exec latvian-main-site python3 -c "
import openai
import os
client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
try:
    response = client.models.list()
    print('GPT API: OK')
except Exception as e:
    print(f'GPT API Error: {e}')
"

### Decision Tree

graph TD
    A["Generate Test clicked"] --> B{Any exercises returned}
    B -->|No exercises| C["Check AnkiConnect"]
    B -->|Partial exercises| D["Check specific exercise types"]
    B -->|Malformed| E["Check exercise generator logic"]
    C --> F{AnkiConnect responds}
    F -->|No| G["Restart anki headless container"]
    F -->|Empty deck| H["Import vocabulary first"]
    F -->|Wrong deck name| I["Check deck filter in code"]
    D --> J{Which types missing}
    J -->|Listening| K["Check audio files availability"]
    J -->|Translation| L["Check card field mappings"]
    J -->|Multiple choice| M["Check distractor generation"]
    K --> N{Audio mount accessible}
    N -->|No| O["Check anki media mount"]
    N -->|Files missing| P["Run HyperTTS batch generation"]

### Diagnostic Commands

AnkiConnect Issues

# Test basic connectivity
curl -s http://localhost:8765 -d '{"action":"version","version":6}'

# Test deck access
curl -s http://localhost:8765 -d '{
  "action": "findCards",
  "version": 6,
  "params": {"query": "deck:\"Latvian (ChatGPT)\""}
}' | jq '.result | length'

# Check for cards with audio
curl -s http://localhost:8765 -d '{
  "action": "findCards", 
  "version": 6,
  "params": {"query": "deck:\"Latvian (ChatGPT)\" has:audio"}
}' | jq '.result | length'

Audio Files Check

# Check mount point
ls -la /mnt/data/apps/anki-media/ | head -10

# Count audio files
find /mnt/data/apps/anki-media -name "*.mp3" | wc -l

# Check specific card audio
ls -la /mnt/data/apps/anki-media/hypertts_* | head -5

### Decision Tree

graph TD
    A["Create Plan clicked"] --> B{Plan creation response}
    B -->|No response| C["Check Flask endpoint"]
    B -->|Validation error| D["Check input format"]
    B -->|Processing error| E["Check GPT extraction"]
    B -->|Save error| F["Check file system"]
    C --> G{Plan create endpoint reachable}
    G -->|No| H["Check Flask container"]
    G -->|Timeout| I["Check GPT API performance"]
    D --> J{What validation failed}
    J -->|Empty text| K["Require minimum content"]
    J -->|Invalid format| L["Check GPT prompt parsing"]
    E --> M{GPT extraction working}
    M -->|No API access| N["Check OpenAI credentials"]
    M -->|Poor extraction| O["Review GPT prompt quality"]
    M -->|Timeout| P["Increase timeout or retry"]
    F --> Q{File system writable}
    Q -->|Permission denied| R["Fix lesson plans permissions"]
    Q -->|Disk full| S["Clean up old files"]
    Q -->|Archive failed| T["Check current plan exists"]

### Diagnostic Commands

Plan Creation Flow

# Test endpoint directly
curl -X POST http://localhost:5002/api/plan/create \
  -H "Content-Type: application/json" \
  -d '{"text":"Week 1: Basic greetings...","title":"Greetings"}'

# Check file system permissions
ls -la /srv/latvian_learning/lesson_plans/
touch /srv/latvian_learning/lesson_plans/test.yaml

# Check current active plan
ls -la /srv/latvian_learning/lesson_plans/*.yaml | grep -v _progress

GPT Extraction Issues

# Check extraction logs
docker logs latvian-main-site | grep -i "lesson.*extract"

# Test GPT extraction manually
docker exec -it latvian-main-site python3 -c "
from agent.modules.lesson_plan_extractor import LessonPlanExtractor
extractor = LessonPlanExtractor()
result = extractor.extract_plan('Week 1: Basic greetings', 'Greetings')
print(result)
"

### Decision Tree

graph TD
    A["Practice done but progress unchanged"] --> B{Which component affected}
    B -->|CEFR score| C["Check component file updates"]
    B -->|Dashboard display| D["Check API responses"]
    B -->|Streak counter| E["Check streak recording"]
    C --> F{Relevant JSONL updated}
    F -->|No| G["Check practice submission success"]
    F -->|Yes no change| H["Check CEFR calculation logic"]
    D --> I{API returns current data}
    I -->|No| J["Check API endpoint health"]
    I -->|Stale data| K["Check cache refresh logic"]
    E --> L{Streak file updated}
    L -->|No| M["Check daily record call"]
    L -->|Wrong date| N["Check system timezone"]

### Diagnostic Commands

CEFR Progress Tracking

# Check practice score files
tail -5 /srv/latvian_learning/logs/memory/writing_scores.jsonl
tail -5 /srv/latvian_learning/logs/memory/test_scores.jsonl

# Check CEFR summary update
stat /srv/latvian_learning/logs/agent/cefr_summary_latest.json
cat /srv/latvian_learning/logs/agent/cefr_summary_latest.json | jq '.components'

# Test CEFR calculation
docker exec latvian-main-site python3 -c "
from agent.modules.cefr_tracker import CEFRTracker
tracker = CEFRTracker()
summary = tracker.get_current_summary()
print(f'Current level: {summary.get(\"level\", \"Unknown\")}')
"

Streak Recording

# Check streak file
cat /srv/latvian_learning/logs/memory/streak.json

# Test streak update
curl -X POST http://localhost:5002/api/daily/record \
  -H "Content-Type: application/json" \
  -d '{"activity":"practice"}'

# Check system date/timezone
date
timedatectl status

### Immediate Recovery Actions

1. Service Recovery

# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres

# Check service health after restart
sleep 10
curl -s http://localhost:5002/health | jq '.microservices'

2. File System Recovery

# Fix common permission issues
sudo chown -R david:david /srv/latvian_learning/logs/
sudo chmod -R 755 /srv/latvian_learning/lesson_plans/

# Clean up corrupted files (create the backup directory first)
mkdir -p /tmp/backup
mv /srv/latvian_learning/logs/memory/corrupt_file.jsonl /tmp/backup/
touch /srv/latvian_learning/logs/memory/writing_scores.jsonl

3. Data Recovery

# Restore from backups if available
cp /srv/latvian_learning/backups/cefr_summary_latest.json.backup \
   /srv/latvian_learning/logs/agent/cefr_summary_latest.json

# Reinitialize empty files with valid defaults
echo '{"current_streak": 0, "last_practice_date": null}' > \
     /srv/latvian_learning/logs/memory/streak.json

### Advanced Diagnostics

1. Full System Check

#!/bin/bash
# Complete health check script

echo "=== Container Status ==="
docker ps --format "table {{.Names}}\t{{.Status}}"

echo "=== Service Health ==="
curl -s http://localhost:5002/health | jq '.'

echo "=== Database Connectivity ==="
docker exec latvian-postgres pg_isready -U latvian_user

echo "=== File System Status ==="
df -h /srv/latvian_learning
find /srv/latvian_learning -name "*.jsonl" -exec wc -l {} \;

echo "=== Recent Errors ==="
docker logs latvian-main-site --since="1h" | grep -i error | tail -10

2. User Journey Simulation

#!/bin/bash
# Test critical user paths

# Test dashboard loading
# -f makes curl exit non-zero on HTTP 4xx/5xx, so failures are reported as failures
echo "Testing dashboard..."
curl -sf http://localhost:5002/api/data >/dev/null && echo "✓ Dashboard API" || echo "✗ Dashboard API"

# Test practice submission
echo "Testing writing practice..."
curl -sf -X POST http://localhost:5002/api/practice/writing/submit \
  -H "Content-Type: application/json" \
  -d '{"text":"Sveiki","prompt":"greetings"}' >/dev/null && \
  echo "✓ Writing Practice" || echo "✗ Writing Practice"

# Test exercise generation
echo "Testing exercise generation..."
curl -sf -X POST http://localhost:5002/api/practice/test/generate \
  -H "Content-Type: application/json" \
  -d '{"type":"mixed","count":5}' >/dev/null && \
  echo "✓ Exercise Generation" || echo "✗ Exercise Generation"

### Escalation Criteria

Immediate Escalation (System Down):
- Multiple container failures
- Database completely inaccessible
- All API endpoints returning 500 errors
- File system corruption

Standard Resolution (Service Degraded):
- Single service failures
- Partial functionality available
- Non-critical features affected
- Performance issues

Monitor and Document:
- Sporadic errors
- Single user reports
- Minor data inconsistencies
- Temporary network issues

---

Document Status: Complete troubleshooting decision trees for user interactions
Last Updated: 2026-05-17
Next Steps: Integrate with monitoring dashboard for automated diagnostics

Infrastructure & Deployment

Container inventory, image sources, GitHub repository map, and restoration procedures

Source Repository: github.com/davidgut1982/portainer - All Docker stacks and Dockerfiles for the Latvian Learning infrastructure
Complete Infrastructure Reference

### Host: latvian-learning (CT 130, 192.168.1.30)

The Latvian Learning host runs the user-facing services (TILTS dashboard, AI services, Anki integration).

### latvian-main-site (TILTS Dashboard)

Purpose: Main Latvian Learning user interface - lesson plans, practice exercises, dashboard widgets

| Field | Value |
|-------|-------|
| Image | latvian-learning-tilts-main:latest |
| Source Repo | github.com/davidgut1982/portainer → stacks/learning-stack/ |
| Dockerfile | ./latvian-learning-tilts-main/Dockerfile (in repo) |
| Build Context | /srv/latvian_learning/ (project root) |
| Port | 5002 (host) → 5000 (container) |
| Network | latvian_network |
| Volumes | /srv/latvian_learning:/srv/latvian_learning, /mnt/data/apps/anki-media:/mnt/data/apps/anki-media |
| Env Vars | POSTGRESHOST=latvian-postgres, POSTGRESDATABASE=tiltstezaurs, ANKIURL=http://anki-headless:8765, OMNIVOICEURL=http://omnivoice-lv:8000 |
| Restart | unless-stopped |
| Healthcheck | curl -f http://localhost:5000/ |

Restoration:

cd /srv/latvian_learning
docker compose build latvian-main-site
docker compose up -d --force-recreate latvian-main-site

### latvian-services-dashboard (AI Services Status)

Purpose: Health overview of all backend AI services (was confusingly named "Educational Stack")

| Field | Value |
|-------|-------|
| Image | latvian-learning-main-site:latest |
| Source Repo | github.com/davidgut1982/portainer → stacks/learning-webapp/ |
| Port | 5003 (host) → 5000 (container) |
| Network | latvian_network |
| Volumes | /srv/latvian_learning/logs:/app/logs |
| Healthcheck | curl -f http://localhost:5000/health |

All AI services use the latvian_network and share volumes from /srv/latvian_learning/.

### morph-analyzer-lv (Morphological Analysis)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/morph-analyzer-morph-analyzer-lv:latest |
| Port | 8091:8001 |
| Source | Educational Stack archive |

### udpipe-lv (POS Tagging)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/udpipe-udpipe-lv:latest |
| Port | 8092:8002 |
| Source | Educational Stack archive |

### sentence-embedder-lv (Semantic Embeddings)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/sentence-embedder-lv:latest (16.3GB) |
| Port | 8093:8003 |
| Memory Limit | 1GB (reduced from over-allocated 3GB) |

### fluency-index-lv (FAISS Index)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fluency-index-fluency-index-lv:latest |
| Port | 8094:8004 |
| Volume | fluencyindexdata:/data |

### fluency-gate-lv (Fluency Validation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fluency-gate-fluency-gate-lv:latest |
| Port | 8095:8005 |

### grammar-gate-lv (DEPRECATED)
Status: Code marked deprecated per Decision #11. Kept for backward compatibility.

### comprehensive-validation-gateway (4E Orchestrator)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/comprehensive-validation-gateway:latest |
| Port | 8097:8007 |
| Function | Routes to fluency-gate (4C) + grammar-correction (2E) in parallel |

### template-extractor-lv (Template Extraction)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/template-extractor-template-extractor-lv:latest |
| Port | 8098:8008 |

### constrained-generator-lv (GPT Generator)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/constrained-generator-constrained-generator-lv:latest |
| Port | 8099:8009 |
| Opt-in via | USECONSTRAINEDGENERATOR=true env flag |

### repair-loop-lv (Content Repair)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/repair-loop-repair-loop-lv:latest |
| Port | 8100:8010 |

### asr-transcription-lv (Whisper Speech-to-Text)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/asr-transcription-lv:latest |
| Port | 8101:8011 |
| GPU | RTX 3060 (device_ids: ['1']) |
| Memory Limit | 4GB |
| Model | Whisper Latvian (CT2 int8) |

### forced-aligner-lv (Audio Alignment)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/forced-aligner-lv:latest |
| Port | 8102:8003 |
| CPU Only | CUDA_VISIBLE_DEVICES="" (prevents VRAM leak) |

### grammar-correction-lv (2E Grammar Validation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/grammar-correction-lv:latest |
| Port | 8103:8007 |
| Function | UDPipe + rule-based + GPT repair |

### back-translator-lv (NLLB Translation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/back-translator-lv:latest (21.1GB) |
| Port | 8104:8008 |
| GPU | Required for performance |
| Memory Limit | 5GB |
| Model | NLLB-200-distilled-1.3B-ct2-int8 (CTranslate2 INT8) |

### morphological-analyzer-lv (Database-backed Morphology)
Status: Configured but rarely called. Code uses GPT-based AIRouter instead.

### vocabulary-database-lv (Vocab Lookup)
Status: Configured but unused in production code.

### vocal-isolator-lv (Demucs Vocal Separation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/vocal-isolator-lv:latest (13.2GB) |
| Port | 8106:8003 |
| GPU | Tesla P4 (GPU 0) |
| Memory Limit | 4GB |
| Volume | vocalisolatorcache:/cache (~30GB for 200 films) |
| Source | /srv/latvian_learning/tilts-system/docker/vocal-isolator-lv/ |
| Model | Demucs htdemucs (chunked, 5-min segments, 10s overlap) |
| Network | Must be on BOTH latvian_network AND docker_latvian_network |

### fake-subsai-asr-gateway (Bazarr Subtitle Gateway)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fake-subsai-asr-gateway:latest |
| Port | 9001 (host network mode) |
| Source | /srv/latvian_learning/tilts-system/docker/fake-subsai-asr-gateway/ |
| Memory Limit | 3.5GB |
| Function | Orchestrates: vocal-isolator → ASR → forced-aligner → back-translator |
| Network | network_mode: host (not on latvian_network) |

### omnivoice-lv (TTS)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5001/omnivoice-lv:latest (19.9GB) |
| Port | 8021 (host) → 8000 (container) |
| GPU | RTX 3060, ~5.8GB VRAM |
| Source | /srv/latvian_learning/tilts-system/docker/omnivoice/ |
| Network | Must be on BOTH latvian_network AND docker_latvian_network |
| Function | XTTS-derived Latvian TTS for Anki cards |

### latvian-postgres (Database)
| Field | Value |
|-------|-------|
| Image | pgvector/pgvector:pg16 |
| Port | 5433 (host) → 5432 (container) |
| Database | latvianlearning, also tiltstezaurs for TILTS |
| User | latvian_user |
| Volume | postgres_data:/var/lib/postgresql/data (1.59GB) |

### Dashboard Data Sources

The dashboard widgets are populated from these endpoints. Understanding which endpoint feeds which widget is critical when data appears inconsistent.

| Widget | Source Endpoint | Notes |
|--------|----------------|-------|
| CEFR Score / Level | /api/cefr | Has fallback logic for broken snapshots (see below) |
| Unique Words (statWords) | /api/cefr.components.vocab.unique_words | Reads from CEFR, not /api/anki directly |
| Total Cards (statCards) | /api/cefr.components.vocab.total_cards | From CEFR snapshot |
| Mature (statMature) | /api/cefr.components.maturity.mature_cards | Anki definition |
| New (statNew) | /api/cefr (Definition B - learning interval based) | Differs from /api/anki/cards |
| Reviews Today | /api/anki.reviews_today | Live AnkiConnect query |

### CEFR Endpoint Stale-Data Behavior

/api/cefr includes self-healing logic added 2026-05-17 (see kb_eb546db3d07f):

# If latest snapshot has vocab=0 (broken run), fall back to last valid history entry
if summary['components']['vocab']['unique_words'] == 0:
    # Search history for last entry with vocab > 0
    for entry in reversed(history):
        if entry['components']['vocab']['unique_words'] > 0:
            summary = entry
            break

Root cause not yet fixed: The CEFR tracker still sometimes writes broken vocab=0 snapshots. The endpoint just compensates. TODO: investigate agent/modules/cefr_tracker.py to prevent writing broken snapshots.
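
The fallback can be exercised in isolation with a small self-contained version. This is a sketch rather than the endpoint's actual code: `pick_valid_summary` is a hypothetical name, the sample snapshots are invented, and unlike the excerpt above it also treats a missing `unique_words` key as a broken snapshot.

```python
def pick_valid_summary(summary: dict, history: list) -> dict:
    """Return summary, or the newest history entry with vocab > 0 if it is broken."""

    def unique_words(entry: dict) -> int:
        # A missing key counts as 0 (an assumption beyond the original check)
        return entry.get("components", {}).get("vocab", {}).get("unique_words", 0)

    if unique_words(summary) > 0:
        return summary
    for entry in reversed(history):
        if unique_words(entry) > 0:
            return entry
    # No valid entry anywhere: return the broken snapshot unchanged
    return summary


# Invented sample data for illustration only
history = [
    {"components": {"vocab": {"unique_words": 1200}}},
    {"components": {"vocab": {"unique_words": 0}}},
]
broken = {"components": {"vocab": {"unique_words": 0}}}
print(pick_valid_summary(broken, history)["components"]["vocab"]["unique_words"])
```

The same `unique_words(...) > 0` test could serve as a guard in the tracker before a snapshot is written, which is the direction the TODO points at.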

### Anki Card Classification Discrepancy

New vs Learning split differs between endpoints (cosmetic, totals reconcile):

| Endpoint | New | Learning | Mature | Total |
|----------|-----|----------|--------|-------|
| /api/anki/cards (Definition A - reps=0) | 4,376 | 349 | 94 | 4,819 |
| /api/cefr (Definition B - interval based) | 4,182 | 543 | 94 | 4,819 |

See kb_fa56af09911e for full explanation. Both are valid Anki concepts; pick endpoint based on use case.
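
A quick arithmetic check of the table confirms that both definitions reconcile to the same total and shows how many cards move between New and Learning. The numbers are copied from the table; the variable names are illustrative.

```python
definition_a = {"new": 4376, "learning": 349, "mature": 94}  # /api/anki/cards
definition_b = {"new": 4182, "learning": 543, "mature": 94}  # /api/cefr

total_a = sum(definition_a.values())
total_b = sum(definition_b.values())
# Cards counted as New under A but as Learning under B
shifted = definition_a["new"] - definition_b["new"]

print(total_a, total_b, shifted)  # 4819 4819 194
```

The same 194 cards appear as the Learning difference (543 - 349), so the discrepancy is purely a reclassification.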

### Recently Added Stub Endpoints

- /api/queue/status - returns {"pending": 0, ...} (stub added 2026-05-17 to silence /tools console errors)
- /api/plan//anki/status - fixed 2026-05-17 (path resolution bug in container)

Container: omnivoice-lv at port 8021 (host) / 8000 (internal)
Service docs: Available at http://192.168.1.30:8021/docs
KB Entry: kb_a4fb069b755e

### Available Voices (5 total, 4 Latvian-safe)

| Voice ID | Gender | Description | Safe for Latvian? |
|----------|--------|-------------|-------------------|
| voice2_conversational | Female | Conversational Latvian (default) | ✅ Yes |
| voice4_conversational_clean | Female | Clean Latvian | ✅ Yes |
| voice1_anki | Female | Anki vocab style (3s) | ✅ Yes |
| voice5_male_clean | Male | Clean Latvian | ✅ Yes |
| ~~voice3_male~~ | Male | ENGLISH audio - XTTS leftover | ❌ NO - excluded from form |

### Critical Warning

voice3_male was carried over from the XTTS migration and contains English reference audio. Using it for Latvian text causes "accent bleed" (English-sounding Latvian). Never map Latvian voices to voice3_male.

### Legacy Voice Mappings (Narakeet → OmniVoice)

For backward compatibility, these aliases are mapped:

| Legacy Name | Maps To | Notes |
|-------------|---------|-------|
| inese, betty, female | voice2_conversational | Female Latvian ✓ |
| arturs, john, male | voice5_maleclean | Male Latvian ✓ (FIXED 2026-05-18 from broken voice3_male) |

### Quick TTS Endpoint

POST /api/tts/generate
{
  "text": "Sveiki, kā jums klājas?",
  "voice": "voice2_conversational",   # or any of the 4 Latvian voices
  "speed": "normal",                   # "slow", "normal", "with_pauses"
  "mode": "single"                     # "single" or "sentences"
}

Response:

{
  "success": true,
  "mode": "single",
  "audio_urls": ["/api/tts/audio/20260518_xxx.wav"]
}
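
A client can validate the request body before sending it. The sketch below builds the JSON for POST /api/tts/generate from the fields shown above and rejects the English-only voice. The helper name `build_tts_request` and the exact spelling of the blocked voice ID are assumptions; the speed and mode values are the ones listed in the request example.

```python
import json

ALLOWED_SPEEDS = {"slow", "normal", "with_pauses"}
ALLOWED_MODES = {"single", "sentences"}
# English reference audio. Both spellings are blocked because the exact
# identifier (with or without underscore) is an assumption here.
BLOCKED_VOICES = {"voice3_male", "voice3male"}


def build_tts_request(text: str, voice: str = "voice2_conversational",
                      speed: str = "normal", mode: str = "single") -> str:
    """Return the JSON body for POST /api/tts/generate, or raise ValueError."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice in BLOCKED_VOICES:
        raise ValueError(f"{voice} contains English reference audio")
    if speed not in ALLOWED_SPEEDS:
        raise ValueError(f"unknown speed: {speed}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode}")
    payload = {"text": text, "voice": voice, "speed": speed, "mode": mode}
    # ensure_ascii=False keeps Latvian diacritics readable in the body.
    return json.dumps(payload, ensure_ascii=False)


body = build_tts_request("Sveiki, kā jums klājas?")
print(body)
```

Failing early on a blocked voice avoids generating audio with accent bleed and then having to track down why it sounds wrong.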

### Adding New Voices

1. Add WAV to /app/speakers/ inside omnivoice-lv container
2. Add transcript to /app/speakers/transcripts.yaml (verified by ASR)
3. Test: curl -X POST http://localhost:8021/tts_to_audio/
4. Add to form template quicktts.html
5. Update kb_a4fb069b755e

### How TTS Flows Through System

[User] → Quick TTS form (/tts)
       → POST /api/tts/generate (tts_bp.py)
       → resolves voice via XTTS_VOICE_MAP
       → POST http://omnivoice-lv:8000/tts_to_audio/
       → uses /app/speakers/.wav as reference
       → XTTS model on RTX 3060 generates WAV
       → saved to /workspace/media/tts/
       → served at /api/tts/audio/

### Device Ordering (Verified 2026-05-17)

Default: PCI_BUS_ID order (NOT FASTEST_FIRST)

nvidia-smi index | name              | PCI bus
0                | Tesla P4          | 04:00.0
1                | NVIDIA RTX 3060   | 08:00.0

Inside containers: cuda:0 = Tesla P4, cuda:1 = RTX 3060

### Per-Service GPU Assignment

| Service | GPU | VRAM Used | Rationale |
|---------|-----|-----------|-----------|
| omnivoice-lv | RTX 3060 | 5,272 MiB | Largest model, needs RTX 3060's 12 GB |
| asr-transcription-lv | RTX 3060 | 1,928 MiB | Whisper benefits from sm_86 compute |
| polycr-paddleocr-1 | RTX 3060 | 264 MiB | OCR service, was already there |
| back-translator-lv | Tesla P4 | 1,885 MiB (3.5 GB declared in GPU supervisor) | Rebalanced 2026-05-17 to use P4 |
| vocal-isolator-lv | Tesla P4 | 344 MiB (idle) | Always pinned to P4 (per KB) |
| forced-aligner-lv | CPU only | N/A | Per KB - prevents VRAM leak |

### Total VRAM Usage

Tesla P4:   3,149 MiB / 7,680 MiB (41% - actively used)
RTX 3060:   7,538 MiB / 12,288 MiB (61% - safe headroom)
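
The percentages can be reproduced from the MiB figures. This is a worked check using the numbers above; the helper name is only for illustration.

```python
def vram_percent(used_mib: int, total_mib: int) -> int:
    """Whole-number percentage of VRAM in use."""
    return round(used_mib / total_mib * 100)


p4 = vram_percent(3149, 7680)      # Tesla P4
rtx = vram_percent(7538, 12288)    # RTX 3060
rtx_free_mib = 12288 - 7538        # headroom left on the RTX 3060

print(p4, rtx, rtx_free_mib)  # 41 61 4750
```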

### How to Pin a Service to a Specific GPU

service-name:
  environment:
    NVIDIA_VISIBLE_DEVICES: "0"   # "0"=P4, "1"=RTX 3060
    CUDA_VISIBLE_DEVICES: "0"      # Container sees only one GPU, becomes cuda:0
  runtime: nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']      # Matches NVIDIA_VISIBLE_DEVICES
            capabilities: [compute, utility]
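
The index remapping that pinning causes can be modeled in a few lines. This sketch assumes the container runtime exposes only the host indices listed in NVIDIA_VISIBLE_DEVICES and that the exposed GPUs are renumbered from 0 in ascending host order; the function name `container_view` is hypothetical.

```python
# Host GPUs in nvidia-smi index order (PCI bus order), from the table above.
HOST_GPUS = {0: "Tesla P4", 1: "NVIDIA RTX 3060"}


def container_view(nvidia_visible_devices: str) -> dict[str, str]:
    """Map in-container device names (cuda:N) to host GPU names.

    Assumption: exposed GPUs are renumbered from 0 in ascending host order.
    """
    host_ids = sorted(
        int(i) for i in nvidia_visible_devices.split(",") if i.strip()
    )
    return {f"cuda:{n}": HOST_GPUS[i] for n, i in enumerate(host_ids)}


print(container_view("1"))    # only the RTX 3060 is visible, as cuda:0
print(container_view("0,1"))  # both visible: cuda:0 = P4, cuda:1 = RTX 3060
```

With a single GPU exposed, application code should target cuda:0 regardless of which host GPU was pinned.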

KB Entry: kb_fca78bc57c59 (full details on GPU ordering + rebalance)

### Status: ✅ DEPLOYED 2026-05-17 - Real Anki Working with 5,004 cards / 41 decks

Verification:

curl -s -X POST http://localhost:8765 -H "Content-Type: application/json" -d '{"action":"deckNames","version":6}'
# Returns 41 real decks including "Latvian (ChatGPT)::Vocab & Sentences"

curl -s https://latvian.shifting-ground.link/api/anki
# Returns: {"available":true,"total_cards":4819,"total_notes":1715,...}

### Public URL Routing (THE REAL PICTURE)

[Anki Desktop on user devices]
        ↓ syncs via
[sync.shifting-ground.link]  ← THE REAL PRODUCTION URL ✅ WORKING
        ↓ pfSense HAProxy routes directly to
[192.168.1.30:27701] anki-sync container
        ↓ writes to
[/mnt/data/apps/anki-sync/data/david/collection.anki2]

[TILTS Dashboard latvian.shifting-ground.link]
        ↓ uses internal Docker network
[anki-headless:8765] AnkiConnect API ✅ WORKING
        ↓ reads from
[Anki collection synced from anki-sync]

### Active URLs Reference

| URL | Status | Purpose | Used By |
|-----|--------|---------|---------|
| sync.shifting-ground.link | ✅ WORKING (PRIMARY) | Anki Desktop sync server | User's Anki Desktop, iPhone, etc. |
| latvian.shifting-ground.link | ✅ Working | Latvian Learning Dashboard | Browser users |
| latvian.shifting-ground.link/api/anki/* | ✅ Working | Real Anki data via internal Docker | Dashboard frontend |
| anki.shifting-ground.link | ⚠️ 503 (cosmetic) | Anki Web UI (KasmVNC) | Not used in normal workflow |
| ankiconnect.shifting-ground.link | ⚠️ 503 (cosmetic) | External AnkiConnect API | Not used in normal workflow |

### Real Sync Activity Proof

The anki-sync container actively serves user david's Anki Desktop 25.09.2:

INFO request{uri="/msync/uploadChanges" ip="192.168.1.1" uid="david" client="25.09.2,3890e12c,linux"}: finished httpstatus=200
INFO request{uri="/msync/mediaSanity" ip="192.168.1.1" uid="david"}: finished httpstatus=200

Files being uploaded include Latvian vocabulary card images:
vēlvienreiz.png, Čehija.png, četri.png, ģimene.png, Šveice.png, žurnāls.png, etc.

The source IP 192.168.1.1 (pfSense) confirms the traffic path: External → pfSense HAProxy → anki-sync:27701.
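
Log lines in this shape can be parsed to confirm the source IP and status programmatically. The regular expressions below are a sketch written against the two sample lines above, not a documented log format; they accept both `httpstatus` and `http_status` because the exact field spelling is uncertain.

```python
import re
from typing import Optional

LINE_RE = re.compile(
    r'request\{(?P<fields>[^}]*)\}: finished http_?status=(?P<status>\d+)'
)
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sync_line(line: str) -> Optional[dict]:
    """Extract key="value" fields and the HTTP status from one log line."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = dict(FIELD_RE.findall(match.group("fields")))
    record["status"] = int(match.group("status"))
    return record


sample = (
    'INFO request{uri="/msync/uploadChanges" ip="192.168.1.1" uid="david" '
    'client="25.09.2,3890e12c,linux"}: finished httpstatus=200'
)
record = parse_sync_line(sample)
print(record["ip"], record["uid"], record["status"])
```

Filtering parsed records on `ip == "192.168.1.1"` isolates traffic that arrived through pfSense.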

### NPMplus Proxy Hosts (for non-critical Anki URLs)

These NPMplus entries exist but their pfSense backends are unhealthy. Only matters if you need direct browser access to Anki Web UI - not required for normal usage.

| Subdomain | Config File | Upstream | pfSense Status |
|-----------|-------------|----------|----------------|
| anki.shifting-ground.link | /data/nginx/proxy_host/13.conf | 192.168.1.30:3000 | ⚠️ Backend down |
| ankiconnect.shifting-ground.link | /data/nginx/proxy_host/16.conf | 192.168.1.30:8765 | ⚠️ Backend down |

The fact that these URLs return 503 does NOT impact:
- Anki Desktop sync (uses sync.shifting-ground.link instead)
- Dashboard Anki data display (uses internal Docker network)
- Test/quiz generation in TILTS (uses internal AnkiConnect)

These URLs would only be needed for:
- VNC-browsing Anki Desktop GUI through web (rare)
- External tools calling AnkiConnect from outside the LAN (rare)

### Source of Truth: github.com/davidgut1982/portainer/stacks/anki/

Local clones (verified):
- /home/david/portainer/stacks/anki/docker-compose.yml
- /mnt/data/apps/portainer-export/stacks/anki/docker-compose.yml

The Stack Has TWO Services (NOT just one!):

### Service 1: anki-headless (Anki Desktop + AnkiConnect API)

| Field | Value |
|-------|-------|
| Image | registry.shifting-ground.link/anki-headless:latest |
| (Mirror) | 192.168.1.4:5000/anki-headless:latest (same SHA: 3e5bae7d9f74) |
| Image Type | linuxserver/webtop + custom-cont-init.d scripts |
| User | 0:0 (root - required for s6-overlay) |
| Ports | 8765:8765 (AnkiConnect API), 3000:3000 (KasmVNC web UI) |
| Networks | anki_network, frontend_net |
| Restart | unless-stopped |

Environment Variables (REQUIRED):

ANKI_API_KEY=
DISPLAY=:99
ANKICONNECT_BIND_HOST=0.0.0.0   # CRITICAL - default is localhost only!
ANKICONNECT_PORT=8765
PUID=1005
PGID=136
TZ=America/Chicago

Volume Mounts (REQUIRED):

volumes:
  - /mnt/data/apps/anki:/config                                                          # Main Anki config + addons + media
  - /mnt/data/apps/anki/custom-cont-init.d:/custom-cont-init.d                          # THE SECRET SAUCE - init scripts
  - /mnt/data/apps/anki/nginx:/var/lib/nginx                                            # Nginx runtime
  - /mnt/data/apps/anki/nginx/logs:/var/log/nginx                                       # Nginx logs
  - /mnt/data/apps/anki-media:/config/.local/share/Anki2/User 1/collection.media        # Card media

### The Custom Init Scripts (CRITICAL for it to work!)

Location: /mnt/data/apps/anki/custom-cont-init.d/

00-anki-setup.sh - Creates Anki user directory:

#!/command/with-contenv bash
set -e
mkdir -p /config/.local/share/Anki2/"User 1"
chown -R abc:abc /config

05-enable-ankiconnect.sh - Enables AnkiConnect addon programmatically:

#!/command/with-contenv bash
set -e

CFG=/config/.local/share/Anki2/addons21.json
mkdir -p /config/.local/share/Anki2

if [ ! -f "$CFG" ]; then
  cat >"$CFG" <

Without these init scripts, the webtop image starts but AnkiConnect addon is never enabled.

### Service 2: anki-sync (Anki Sync Server) - ✅ DEPLOYED 2026-05-17

| Field | Value |
|-------|-------|
| Image | ghcr.io/luckyturtledev/anki:latest (27MB Rust binary, NOT Python Anki) |
| Container | anki-sync |
| Port | 27701:8080 |
| Networks | latvian_network (in our setup) / frontend_net (per portainer compose) |
| Restart | unless-stopped |
| Env | TZ=America/Chicago, SYNC_USER1=david: |
| Volumes | /mnt/data/apps/anki-sync/{config,data,logs} |

Status: Running on port 27701, listening at 0.0.0.0:8080 inside container.

Verification:

docker logs anki-sync | tail -5
# INFO listening addr=0.0.0.0:8080

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:27701/
# Returns 404 (NORMAL - sync server only has /sync/* endpoints)

KB Entry: kb_36b2f93e8edd

This is the Anki Sync Server. It:
- Receives sync requests from Anki Desktop (in the container or on personal devices)
- Stores the master collection.anki2 at /mnt/data/apps/anki-sync/data/david/collection.anki2 (the file was modified on the day of verification, confirming that real Anki sync is happening)
- Provides an AnkiWeb-protocol-compatible sync API

The Flow:

[Personal device/iPhone Anki] β†’ syncs to β†’ anki-sync:27701 β†’ writes /mnt/data/apps/anki-sync/data/david/
[Container anki-headless]    β†’ syncs to β†’ anki-sync:27701 β†’ keeps in sync
[TILTS dashboard]            β†’ queries β†’ anki-headless:8765 (AnkiConnect API) β†’ reads real cards

### Deployment

Real docker-compose.yml (from /home/david/portainer/stacks/anki/docker-compose.yml):

version: "3.9"
services:
  anki-headless:
    image: registry.shifting-ground.link/anki-headless:latest
    container_name: anki-headless
    user: "0:0"
    environment:
      - ANKI_API_KEY=${ANKI_API_KEY}
      - DISPLAY=:99
      - ANKICONNECT_BIND_HOST=0.0.0.0
      - ANKICONNECT_PORT=8765
      - PUID=1005
      - PGID=136
    networks:
      - anki_network
      - frontend_net
    ports:
      - "8765:8765"
      - "3000:3000"
    volumes:
      - ${CONFIGURATION_FILES}/anki:/config
      - ${CONFIGURATION_FILES}/anki/custom-cont-init.d:/custom-cont-init.d
      - ${CONFIGURATION_FILES}/anki/nginx:/var/lib/nginx
      - ${CONFIGURATION_FILES}/anki/nginx/logs:/var/log/nginx
      - /mnt/data/apps/anki-media:/config/.local/share/Anki2/User 1/collection.media
    restart: unless-stopped

  anki-sync:
    image: ghcr.io/luckyturtledev/anki:latest
    container_name: anki-sync
    restart: unless-stopped
    ports:
      - "27701:8080"
    volumes:
      - ${CONFIGURATION_FILES}/anki-sync/config:/config
      - ${CONFIGURATION_FILES}/anki-sync/data:/data
      - ${CONFIGURATION_FILES}/anki-sync/logs:/logs
    environment:
      - "TZ=America/Chicago"
      - "SYNC_USER1=david:${ANKI_SYNC_PASSWORD}"
    networks:
      - frontend_net

networks:
  anki_network:
    driver: bridge
  frontend_net:
    driver: bridge

Required .env file:

ANKI_API_KEY=
ANKI_SYNC_PASSWORD=
CONFIGURATION_FILES=/mnt/data/apps

Deploy command:

cd /home/david/portainer/stacks/anki
# Create .env with real credentials first
docker compose up -d
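
Before running the deploy command, the .env file can be checked for empty values. This is a minimal sketch: the three key names come from the required .env listing above, while the function name and the sample text are illustrative.

```python
REQUIRED_KEYS = ("ANKI_API_KEY", "ANKI_SYNC_PASSWORD", "CONFIGURATION_FILES")


def missing_env_keys(env_text: str) -> list[str]:
    """Return required keys that are absent or empty in .env-style text."""
    values = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = (
    "ANKI_API_KEY=\n"
    "ANKI_SYNC_PASSWORD=example-password\n"
    "CONFIGURATION_FILES=/mnt/data/apps\n"
)
print(missing_env_keys(sample))  # ['ANKI_API_KEY']
```

An empty CONFIGURATION_FILES is the most damaging case, because every `${CONFIGURATION_FILES}/...` volume path in the compose file would then resolve relative to the filesystem root.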

### Verification (REAL Anki, not mock)

# AnkiConnect API responding
curl -s -X POST http://localhost:8765 \
  -H "Content-Type: application/json" \
  -d '{"action":"version","version":6}'
# Expected: {"result":6,"error":null}

# Get real deck names
curl -s -X POST http://localhost:8765 \
  -d '{"action":"deckNames","version":6}'
# Expected: array of real deck names (e.g., "Latvian (ChatGPT)::Vocab & Sentences")

# Anki Sync Server health
curl -s http://localhost:27701/
# Expected: a reply from the sync server (HTTP 404 at the root path is normal)

# Container status
docker ps --filter "name=anki" --format "{{.Names}}|{{.Status}}"
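
The same AnkiConnect checks can be scripted. The sketch below mirrors the curl calls above using only the standard library; the helper names are hypothetical, and `invoke` needs a reachable AnkiConnect instance, so only the encode and decode helpers are exercised in the example.

```python
import json
import urllib.request


def anki_request(action: str, **params) -> bytes:
    """Encode an AnkiConnect request body (API version 6)."""
    body = {"action": action, "version": 6}
    if params:
        body["params"] = params
    return json.dumps(body).encode("utf-8")


def anki_result(raw: bytes):
    """Return the `result` field, raising if AnkiConnect reported an error."""
    reply = json.loads(raw)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


def invoke(action: str, url: str = "http://localhost:8765", **params):
    """Send one request; requires a running AnkiConnect at `url`."""
    request = urllib.request.Request(
        url,
        data=anki_request(action, **params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return anki_result(response.read())


# Offline check using the expected reply shown above.
print(anki_result(b'{"result":6,"error":null}'))  # 6
```

Against a live instance, `len(invoke("deckNames"))` gives a deck count to compare with the 41 decks reported earlier.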

### Common Issues

| Issue | Cause | Fix |
|-------|-------|-----|
| Port 8765 not listening | Custom init scripts missing | Mount /mnt/data/apps/anki/custom-cont-init.d |
| AnkiConnect bind to localhost only | Missing env var | Set ANKICONNECT_BIND_HOST=0.0.0.0 |
| Permission denied on nginx | Wrong user | Must use user: "0:0" (root) |
| Webtop loops on s6-init | seccomp blocks setgroups | Add security_opt: - seccomp:unconfined + privileged: true if in LXC |
| Collection not syncing | Sync server password mismatch | Check SYNC_USER1=david:password matches Anki Desktop sync config |

### Primary Repository: github.com/davidgut1982/portainer

Structure:

portainer/
├── stacks/                           # Portainer-deployable stacks
│   ├── anki/                         # Anki sync server + media
│   ├── arr-stack/                    # Media management
│   ├── learning-stack/               # Main TILTS infrastructure
│   ├── learning-webapp/              # Latvian Flask web app source
│   ├── mcp-dashboard/                # MCP dashboard
│   ├── npmplus-setup/                # Reverse proxy
│   ├── plex-stack/                   # Plex media
│   ├── utilities-stack/              # Registry, npm, etc.
│   └── xtts/                         # XTTS training infrastructure
├── educational-stack-full.yml        # All 16 AI services (582 lines)
├── srv-compose-files/                # Service compose files
└── docker-infrastructure-temp/       # Alternative configs

### Project Source Code

Latvian Learning (this repo): /srv/latvian_learning/
- tilts-system/ - Main TILTS application (Python/Flask)
- agent/dashboard/ - Dashboard blueprints, templates, modules
- docker/ - Per-service Dockerfiles for AI pipeline
- latvian-learning-tilts-main/ - TILTS frontend image source
- latvian-learning-webapp/ - Services dashboard image source
- docs/user-journeys/ - This documentation
- docs/diagrams/ - Architecture diagrams

### latvian_network (primary)
- Subnet: 172.18.0.0/16
- All TILTS services use this
- External=true in compose files

### dockerlatviannetwork (legacy)
- Subnet: 172.22.0.0/16
- Some services (omnivoice, vocal-isolator) historically on this
- CRITICAL: Services on this MUST also be on latvian_network for dashboard to reach them

### anki_network
- For anki-headless service
- Also requires latvian_network connection

### "Service X is down" → Standard Recovery

# 1. Check status
docker ps -a | grep 

# 2. Check logs for last 50 lines
docker logs  --tail 50

# 3. Try restart
docker restart 

# 4. If restart fails, recreate from compose:
docker compose stop 
docker compose rm -f 
docker compose up -d 

# 5. If image is missing/corrupt, pull from registry
docker pull 192.168.1.4:5000/:latest

### "Can't reach service from another container" → Network Issue

# Check what networks the service is on
docker inspect  --format '{{range $name, $net := .NetworkSettings.Networks}}{{$name}}: {{$net.IPAddress}}{{"\n"}}{{end}}'

# Connect to additional network if needed
docker network connect latvian_network 

### "Container starts but service unresponsive" → Common Causes

1. Missing volume mount: Check docker inspect --format '{{.Mounts}}'
2. Wrong network: Service on wrong subnet (see Networks above)
3. GPU not allocated: Need deploy.resources.reservations.devices in compose
4. CUDA mismatch: Containers built for CUDA 12.1, host runs 13.x (forward compat OK)
5. CPU pinning wrong: Host has sparse cores (0,3,6,9,12,15) not contiguous (0-3)

### Full System Recovery

# 1. Pull all images from registry
docker compose pull

# 2. Recreate all containers
docker compose down
docker compose up -d

# 3. Wait for healthchecks
sleep 60
docker ps --filter health=unhealthy

# 4. Verify dashboards
curl -s -o /dev/null -w "TILTS: %{http_code}\n" http://localhost:5002/
curl -s -o /dev/null -w "Services Dashboard: %{http_code}\n" http://localhost:5003/

Host: docker-registry (CT 104)

List all images:

curl -s http://192.168.1.4:5000/v2/_catalog

Get tags for specific image:

curl -s http://192.168.1.4:5000/v2//tags/list

Registry build host: Some images built on 192.168.1.4 (registry host) and pushed locally.
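
The registry endpoints follow the Docker Registry HTTP API V2 layout, so the URLs can be built from an image name. The helper names below are hypothetical; the registry address and the `anki-headless` image name are taken from this document.

```python
from urllib.parse import quote

REGISTRY = "http://192.168.1.4:5000"


def catalog_url(registry: str = REGISTRY) -> str:
    """URL that lists all repositories in the registry."""
    return f"{registry}/v2/_catalog"


def tags_url(image: str, registry: str = REGISTRY) -> str:
    """URL that lists the tags of one repository."""
    # Repository names may contain "/" (namespaces), so keep it unescaped.
    return f"{registry}/v2/{quote(image, safe='/')}/tags/list"


def image_ref(image: str, tag: str = "latest", registry: str = REGISTRY) -> str:
    """Reference usable with `docker pull` (no URL scheme)."""
    host = registry.split("://", 1)[-1]
    return f"{host}/{image}:{tag}"


print(catalog_url())
print(tags_url("anki-headless"))
print(image_ref("anki-headless"))
```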

| Issue | Status | Notes |
|-------|--------|-------|
| anki-headless using mock | 🔴 BROKEN | Real image needs investigation - registry image is KasmVNC/GUI based |
| 5 diagrams render tiny | ✅ FIXED | Added explicit width/height to SVGs |
| Shadow dir permission | ✅ FIXED | Created /srv/latvian_learning/workspace/media/shadowing with 777 |
| Writing 500 error | ✅ FIXED | chmod 777 on logs/memory dir |
| Grammar JS broken | ✅ FIXED | Closed unterminated <script> tag |
| Missing /api/anki/* routes | ✅ FIXED | Added 5 new endpoints |
| Test gen slow | ⚠️ WORKAROUND | 21s/5q, 54s/10q - linear scaling |
| GPU CUDA mismatch | ✅ FIXED | Added runtime: nvidia to compose |

Deployed: 2026-05-17 - KB Entry: kb_3c2d2a671885

### What's Monitored

| Layer | Tool | Catches |
|-------|------|---------|
| Container health | cadvisor + ContainerDown alert | Crashed/stopped containers |
| System resources | node-exporter | Disk full, RAM, CPU spikes |
| GPU usage | nvidia-gpu-exporter | VRAM exhaustion, idle GPUs |
| Public URLs | blackbox-exporter | 503s, timeouts, SSL issues |
| Application data | latvian-exporter (custom) | Mock servers, stale CEFR, zero cards |

### Key Custom Metrics (latvian-exporter)

Available at http://192.168.1.8:9116/metrics:
- latvian_ankitotalcards - real card count (4819 = healthy, <100 = MOCK)
- latvian_ankimocksuspected - 1 = ALERT, mock detected
- latvian_cefrsnapshotvalid - 0 = broken CEFR run
- latvian_ankidecks_count - 41 decks confirmed
- + 12 more metrics
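
The mock check can be reproduced from the raw /metrics text. This sketch parses label-free samples from the Prometheus text exposition format and applies the "<100 = MOCK" rule from the list above. The metric name used in the example is an assumed spelling (the rendered names in this document have lost some underscores), so it is passed in explicitly rather than hard-coded.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse label-free samples from Prometheus text exposition format."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip blanks, comments, and labelled series
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def looks_like_mock(samples: dict[str, float], metric: str,
                    floor: float = 100) -> bool:
    """True when the card count is below the floor or the metric is missing."""
    return samples.get(metric, 0.0) < floor


# Assumed metric spelling; check the exporter output for the real name.
CARD_METRIC = "latvian_anki_total_cards"

healthy = parse_metrics("# HELP total cards\nlatvian_anki_total_cards 4819\n")
print(looks_like_mock(healthy, CARD_METRIC))  # False
```

Treating a missing metric as suspicious is a deliberate choice in this sketch: an exporter that stops reporting the count should not read as healthy.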

### Alert Rules → Actions

| Alert | Trigger | Auto-Action |
|-------|---------|-------------|
| ContainerDown | Container missing 5+ min | Restart container |
| AnkiConnectNoRealData | Direct probe fails | Restart anki-headless |
| SyncServerDown | sync.shifting-ground.link fails | Restart anki-sync |
| AnkiCardCountSuspiciouslyLow | Cards < 100 | Alert only (mock danger) |
| CEFRStaleSnapshot | vocab=0 but Anki OK | Trigger CEFR recalc |
| AnkiSyncStale | No sync 24h | Restart anki-sync |
| PublicURLDown | Any public URL down | Alert only (pfSense issue) |

### Grafana Dashboards (http://192.168.1.8:3001)

1. Latvian Learning Health - Single-pane status view
2. Anki Sync Activity - Card growth, daily reviews trend
3. GPU Per-Service - Per-process VRAM, P4 vs RTX 3060

### How to Test Monitoring

# Verify all metrics
ssh root@192.168.1.8 'curl -s http://localhost:9116/metrics | grep latvian_'

# Force trigger an alert (test mode)
ssh root@192.168.1.8 'curl -X POST http://localhost:9093/api/v2/alerts -d "[{...}]"'

# View dashboards
open http://192.168.1.8:3001

### 2026-05-18 - Quick TTS / OmniVoice Voice Fix
The user found that the Quick TTS form was using old Narakeet voice names (Betty/John), which mapped to the broken voice3_male (English audio). KB: kb_a4fb069b755e
- Fixed XTTS_VOICE_MAP: john/arturs/male → voice5_maleclean (was voice3_male English)
- Updated Quick TTS form to show all 4 Latvian OmniVoice voices with native names
- Added warning about voice3_male being English-only

### 2026-05-17 - Audit Blind Spots Found by User
User found 2 bugs the audits missed (upload + TTS). KB: kb_4f47cd82433b
- Fixed: ffmpeg missing in Dockerfile (audio upload broken)
- Documented: TTS API actually works, just had scary supervisor warnings

### 2026-05-17 - Monitoring Stack Deployed (5 phases)
Comprehensive Prometheus/Grafana/Alertmanager monitoring with auto-remediation. KB: kb_3c2d2a671885
- Blackbox exporter for URL monitoring
- Custom latvian-exporter for app metrics
- 12 alert rules with moderate auto-remediation
- 3 Grafana dashboards

### 2026-05-17 - Comprehensive System Audit
Found 7 issues, fixed 7. KB: kb_ebe48d69afea
- Fixed: /api/cefr returning stale broken data (showed A1- instead of A2-)
- Fixed: Dashboard statWords "--" placeholder (auto-resolved by CEFR fix)
- Fixed: /api/queue/status 404 (added stub endpoint)
- Fixed: /api/plan//anki/status 404 (path resolution bug)
- Fixed: Test generation no progress UI (added spinner + status message)
- Fixed: /progress/weekly contradictory text (improved error handling)
- Documented: Anki new/learning split discrepancy (cosmetic only)

### 2026-05-17 - GPU Rebalance
back-translator moved to Tesla P4. KB: kb_fca78bc57c59
- P4: 4.6% → 41% utilization (now actively used)
- RTX 3060: 84% → 61% utilization (safer headroom)

### 2026-05-17 - Real Anki Restoration
Replaced mock Python server with real Anki + AnkiConnect. KB: kb_d80f5d3c855e
- 5,004 real cards across 41 decks
- Used registry image with custom-cont-init.d scripts

### 2026-05-17 - Anki Sync Server Deployed
Added anki-sync container for sync.shifting-ground.link. KB: kb_36b2f93e8edd
- ghcr.io/luckyturtledev/anki:latest, port 27701
- Active sync confirmed from user's Anki Desktop 25.09.2

| Issue | Impact | Workaround |
|-------|--------|-----------|
| /diagrams console "Unexpected token '<'" | Cosmetic console error | None needed |
| anki.shifting-ground.link 503 | Direct web UI access blocked | Use sync.shifting-ground.link for sync (works) |
| ankiconnect.shifting-ground.link 503 | External AnkiConnect blocked | Dashboard uses internal Docker network (works) |
| CEFR tracker writes vocab=0 snapshots | Fixed by /api/cefr fallback | Long-term: fix cefr_tracker.py |
| New vs Learning split varies | Cosmetic only - totals match | Use endpoint that matches your need |

Decision tree:
1. Identify which service is broken (from error message or health check)
2. Look up that service in the inventory above
3. Check the source repo and image name
4. Try "Standard Recovery" procedure
5. If still broken, check Known Issues
6. If novel issue, check container logs and document the fix here
7. Add to Audit History above so future agents have context

System Flow Diagrams

Live architecture and data-flow diagrams rendered from docs/diagrams/system-flows.md

1. Overall System Architecture
graph TB
    User["User (browser)"] --> nginx["nginx\nlatvian.shifting-ground.link"]
    nginx --> main["latvian-main-site\n:5002 (gunicorn)"]
    nginx --> healthproxy["health-monitor-proxy/*\n(reverse proxy)"]
    healthproxy --> healthmon["system-health-monitor\n:5004"]

    main --> postgres[("latvian-postgres\n:5433\ntilts_tezaurs / learning.*")]
    main --> tts["omnivoice-lv\n:8000 XTTS"]
    main --> asr["asr-transcription-lv\n:8011 Whisper"]
    main --> aligner["forced-aligner-lv\n:8102 MMS"]
    main --> gateway["comprehensive-validation-gateway\n:8097 / internal :8007"]
    main --> bt["back-translator-lv\n:8104 NLLB-200 CT2 INT8"]
    main --> repair["repair-loop-lv\n:8100 / :8010"]
    main --> openrouter["OpenRouter\nhttps://openrouter.ai/api/v1"]

    openrouter --> tutor["TUTOR_MODEL\ndeepseek-chat-v3.1\nor gemini-2.5-flash"]
    openrouter --> quiz["QUIZ_MODEL\ngpt-4o-mini"]
    openrouter --> repairmodel["REPAIR_MODEL\ngpt-4o-mini"]

    gateway --> fluencygate["fluency-gate-lv\n:8095 / :8005"]
    gateway --> grammargate["grammar-gate-lv\n:8096 / :8006"]
    fluencygate --> fluencyindex["fluency-index-lv\n:8094 / :8004\nFAISS LVTB"]
    fluencyindex --> embedder["sentence-embedder-lv\n:8093 / :8003"]
    grammargate --> udpipe["udpipe-lv\n:8092 / :8002"]

    repair --> repairmodel
    repair --> gateway

    subgraph Monitoring ["Monitoring Tier (192.168.1.8 bastion)"]
        prom["Prometheus\n:9090"]
        graf["Grafana\n:3001"]
        loki["Loki\n:3100"]
        am["Alertmanager\n:9093"]
    end

    main -->|"/metrics scrape 30s"| prom
    prom --> am
    prom --> graf
    loki --> graf

    style postgres fill:#b8d4e8
    style openrouter fill:#ffe4b5
    style gateway fill:#d4edda
    style repair fill:#fff3cd
    style Monitoring fill:#f0f4ff
2. AI Tutor Chat Flow (Text + Voice)
flowchart TD
    subgraph Input ["Input - choose one"]
        TextIn["User types Latvian text\n(Chat mode)"]
        VoiceIn["User records audio\n(Voice Mode)"]
    end

    subgraph VoicePipeline ["Voice Pipeline (voice-input only)"]
        ASR["asr-transcription-lv:8011\nWhisper (lv language)"]
        Align["forced-aligner-lv:8102\nPer-word {word, start, end, score}"]
        ColoredWords["Colored word display\ngreen/yellow/red per score"]
    end

    subgraph ChatPipeline ["Chat Pipeline (both paths)"]
        UserValidate["Validate user text\ncomprehensive-validation-gateway:8097"]
        UserBadge["Teal badge on user message\nquality_score + assessment"]
        LLM["OpenRouter → TUTOR_MODEL\n(deepseek-chat-v3.1)"]
        AIValidate["Validate AI response\ncomprehensive-validation-gateway:8097"]
        RepairCheck{"quality_score < 0.70?"}
        Repair["repair-loop-lv:8100\ngpt-4o-mini rewrites text"]
        ReRepair["Re-validate repaired text"]
        BackTransAI["back-translator-lv:8104\nLV → EN translation"]
        BackTransUser["back-translator-lv:8104\nLV → EN (user text)"]
        TTSOut["omnivoice-lv:8000\nXTTS → audio URL"]
    end

    subgraph Output ["Output"]
        SaveDB["INSERT learning.messages\n(tilts_tezaurs)"]
        UI["Browser renders:\n- AI message + green/amber badge\n- English back-translation\n- Audio playback button\n- User teal badge + EN translation"]
    end

    TextIn --> UserValidate
    VoiceIn --> ASR --> Align --> ColoredWords
    ColoredWords --> UserValidate

    UserValidate --> UserBadge
    UserBadge --> LLM

    LLM --> AIValidate
    AIValidate --> RepairCheck
    RepairCheck -- Yes --> Repair --> ReRepair --> BackTransAI
    RepairCheck -- No --> BackTransAI
    BackTransAI --> BackTransUser
    BackTransUser --> TTSOut
    TTSOut --> SaveDB --> UI
3. Pronunciation Checker Flow
flowchart TD
    subgraph TextSource ["Reference Text - choose source"]
        VocabSrc["GET /api/pronunciation/sample?source=vocab\nRandom word from learning.vocabulary\n(example_lv preferred)"]
        TranscriptSrc["GET /api/pronunciation/sample?source=transcript\nRandom Latvian sentence from study_sessions\n(heuristic Latvian filter)"]
        QueueSrc["GET /api/pronunciation/sample?source=queue\nRandom item from SRS review queue"]
    end

    subgraph RecordFlow ["Recording & Alignment"]
        Browser["Browser MediaRecorder\nWebM/Opus audio"]
        Upload["POST /api/pronunciation/align\nmultipart: audio (WebM) + text (str)"]
        FFmpeg["ffmpeg\nWebM -> WAV (16kHz mono)"]
        Aligner["forced-aligner-lv:8102/align\nMMS forced alignment"]
        Words["Per-word result:\n{word, start, end, score 0-1}"]
    end

    subgraph Display ["Display & Playback"]
        ColorCode["JS colors each word span:\ngreen >= 0.75\nyellow >= 0.45\nred < 0.45"]
        Playback["Playback buttons:\n- Your recording\n- Hear correct pronunciation (TTS)\n- Click any word -> word TTS"]
        TTS["GET /api/pronunciation/tts?text=...\nomnivoice-lv:8000 XTTS"]
    end

    VocabSrc --> Browser
    TranscriptSrc --> Browser
    QueueSrc --> Browser

    Browser --> Upload --> FFmpeg --> Aligner --> Words --> ColorCode
    ColorCode --> Playback
    Playback -- "reference audio" --> TTS
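
The color thresholds in the Display step can be expressed as a small function. The diagram places this logic in browser JavaScript; the Python below is only a sketch of the same rule, and the sample words and scores are made up.

```python
def score_color(score: float) -> str:
    """Map an alignment score (0-1) to the display color from the diagram."""
    if score >= 0.75:
        return "green"
    if score >= 0.45:
        return "yellow"
    return "red"


# Illustrative per-word results in the aligner's {word, start, end, score} shape.
words = [
    {"word": "Sveiki", "start": 0.00, "end": 0.52, "score": 0.91},
    {"word": "kā", "start": 0.60, "end": 0.78, "score": 0.45},
    {"word": "klājas", "start": 0.85, "end": 1.40, "score": 0.30},
]
colored = [(w["word"], score_color(w["score"])) for w in words]
print(colored)
```

Checking the higher threshold first means each boundary value (0.75 and 0.45) falls into the better band, matching the ">=" wording in the diagram.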
4. Validation Pipeline Flow
flowchart TD
    Input["Latvian text\n(user message or AI response)"]

    subgraph Gateway ["comprehensive-validation-gateway:8097"]
        FluencyCall["fluency-gate-lv:8095\nFAISS LVTB nearest-neighbor\nWeight: 60%"]
        GrammarCall["grammar-gate-lv:8096\nUDPipe morphological rules\nWeight: 40%"]
        Combine["quality_score =\nfluency x 0.60 + grammar x 0.40"]
    end

    subgraph FluencyDetail ["Fluency Gate internals"]
        FIndex["fluency-index-lv:8094\nFAISS index over LVTB corpus"]
        Embedder["sentence-embedder-lv:8093\nparaphrase-multilingual vectors"]
    end

    subgraph GrammarDetail ["Grammar Gate internals"]
        UDPipe["udpipe-lv:8092\nMorphological parsing"]
    end

    Assess{"quality_score threshold"}
    GoodLabel["overall_assessment: good\n(>= 0.85)"]
    AcceptLabel["overall_assessment: acceptable\n(>= 0.70)"]
    PoorLabel["overall_assessment: poor\n(>= 0.55)"]
    InvalidLabel["overall_assessment: invalid\n(< 0.55)"]

    RepairCheck{"quality_score < 0.70?"}
    Repair["repair-loop-lv:8100\ngpt-4o-mini via OpenRouter\nPOST /repair"]
    ReValidate["Re-validate repaired text\n(best-effort - use original if still low)"]

    BackTrans["back-translator-lv:8104\nNLLB-200-distilled-1.3B-ct2-int8\nEN↔LV (Tesla P4 GPU, CTranslate2 INT8)"]
    ENDisplay["English shown to user\nas comprehension aid"]

    Save["Save to learning.messages\n(tilts_tezaurs, schema learning)"]

    Input --> FluencyCall
    Input --> GrammarCall
    FluencyCall --> FIndex --> Embedder
    GrammarCall --> UDPipe
    FluencyCall --> Combine
    GrammarCall --> Combine
    Combine --> Assess
    Assess --> GoodLabel
    Assess --> AcceptLabel
    Assess --> PoorLabel
    Assess --> InvalidLabel

    Combine --> RepairCheck
    RepairCheck -- Yes --> Repair --> ReValidate --> BackTrans
    RepairCheck -- No --> BackTrans

    BackTrans --> ENDisplay
    BackTrans --> Save

    style GoodLabel fill:#90EE90
    style AcceptLabel fill:#d4edda
    style PoorLabel fill:#fff3cd
    style InvalidLabel fill:#f8d7da
    style Repair fill:#ffe4b5
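
The scoring rule in this diagram can be worked through numerically. The weights, thresholds, and the repair cutoff are taken from the diagram; the function names and the sample inputs are illustrative, not the gateway's actual code.

```python
def quality_score(fluency: float, grammar: float) -> float:
    """Weighted combination: 60% fluency, 40% grammar."""
    return fluency * 0.60 + grammar * 0.40


def assessment(score: float) -> str:
    """Label for overall_assessment, using the thresholds in the diagram."""
    if score >= 0.85:
        return "good"
    if score >= 0.70:
        return "acceptable"
    if score >= 0.55:
        return "poor"
    return "invalid"


def needs_repair(score: float) -> bool:
    """Text below 0.70 is sent to repair-loop-lv."""
    return score < 0.70


score = quality_score(fluency=0.60, grammar=0.70)
print(round(score, 2), assessment(score), needs_repair(score))  # 0.64 poor True
```

Because the repair cutoff equals the "acceptable" threshold, every text labeled poor or invalid goes through the repair loop, and nothing labeled acceptable or good does.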
5. Observability Stack
graph TB
    tilts["latvian-main-site\n:5002\n/metrics"] -->|scrape 30s| prom["Prometheus\n192.168.1.8:9090"]
    cadvisor["cAdvisor\n:8082"] -->|scrape| prom
    nodeexp["node-exporter\n:9100"] -->|scrape| prom
    latvexp["latvian-exporter\n:9116"] -->|scrape| prom
    bb["blackbox-exporter\n:9115"] -->|probe| prom
    prom -->|alerts| am["Alertmanager\n:9093"]
    am -->|webhook| rem["remediation-agent\n:8888"]
    prom -->|datasource| graf["Grafana\n192.168.1.8:3001"]
    promtail["promtail\nDocker SD"] -->|logs| loki["Loki\n:3100"]
    loki -->|datasource| graf

    style prom fill:#f5a623,color:#000
    style graf fill:#5c9bd6,color:#fff
    style loki fill:#6dc066,color:#fff
    style am fill:#e87040,color:#fff
6. Anki Sync Architecture
graph LR
    headless["anki-headless\n:8765 AnkiConnect\n(content manager)"]
    sync["anki-sync\n:27701\n(sync server)"]
    laptop["User Laptop Anki\n(study client)"]

    headless -->|"creates cards, adds/removes tags\nalways UPLOADS on conflict\n(aqt/sync.py patched)"| sync
    laptop -->|"uploads study progress\n(reviews, intervals, ease)"| sync
    sync -->|"downloads content changes"| laptop

    batch["AudioBatchWorker\nSQLite: workspace/batch_tracking/media_batches.db\n(NOT PostgreSQL)"]
    omni["omnivoice-lv:8000\nXTTS TTS\ncheck model_loaded - HTTP 200 ≠ ready"]
    media["/mnt/data/apps/anki/\n.local/share/Anki2/User 1/collection.media/\n{note_id}_lemma.mp3\n{note_id}_example.mp3"]

    batch -->|"POST /tts_to_audio/"| omni
    omni --> media
    headless -->|"enqueues audio jobs\nvia AnkiConnect"| batch
    media -->|"syncs with collection"| sync

    style headless fill:#d4edda
    style sync fill:#b8d4e8
    style laptop fill:#fff3cd
    style omni fill:#ffe4b5
7. Lesson Synthesis Intelligence
graph TD
    transcript["Lesson Transcript"] --> rawvocab["Raw Vocabulary Extract"]
    rawvocab --> intervalcheck["Check Anki Intervals\nvia AnkiConnect :8765"]
    intervalcheck --> split{"sr_interval?"}
    split -->|"> 21d (mastered)"| excluded["EXCLUDED\nAnki long-term schedule"]
    split -->|"<= 21d (learning)"| srs["SRS Review Pool"]
    srs --> curation["GPT Curation\n8-12 review + 3-5 new"]
    newwords["New session words"] --> curation
    curation --> plan["Lesson Plan\n15 curated words"]
    plan --> tags["current_week tag applied\nFiltered deck rebuilds"]
    plan --> anki["Anki: _manage_active_window()\nOld cards suspended"]
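
The interval split at the center of this flow is a simple partition. The 21-day boundary comes from the diagram; the function name, the input shape (word mapped to interval in days), and the sample words are assumptions for illustration.

```python
MASTERED_AFTER_DAYS = 21  # boundary from the diagram


def split_by_interval(intervals: dict[str, int]) -> tuple[list[str], list[str]]:
    """Split words into (review_pool, excluded) by Anki interval in days."""
    review_pool = sorted(
        word for word, days in intervals.items() if days <= MASTERED_AFTER_DAYS
    )
    excluded = sorted(
        word for word, days in intervals.items() if days > MASTERED_AFTER_DAYS
    )
    return review_pool, excluded


# Illustrative intervals; real values would come from AnkiConnect.
sample = {"māja": 3, "ūdens": 45, "grāmata": 21}
review_pool, excluded = split_by_interval(sample)
print(review_pool, excluded)
```

A word at exactly 21 days stays in the review pool, since the diagram uses "<= 21d" for learning and "> 21d" for mastered.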