System Documentation
Comprehensive visualization of architecture, dependencies, and user interaction flows
About Architecture Diagrams
System Architecture Overview
These diagrams provide a comprehensive view of the Latvian Learning Platform's architecture:
- System Overview: High-level architecture and main components
- Service Dependencies: How services interact and depend on each other
- Network Topology: Physical and logical network layout
- Data Flow: How data moves through the system
- Failure Points Analysis: Critical points and resilience mechanisms
- Upload Process Flow: Complete upload and processing workflow
How TILTS Works - Complete Breakdown
Master Overview
The existing documentation showed what the system does but not how users interact with it. These documents provide complete traceability from:
- User Action → Button Click → API Endpoint → Service Chain → Database Update → UI Feedback
This enables systematic troubleshooting when users report "it's not working" by providing a complete audit trail.
### 1. [Recording Upload Flow](01-recording-upload-flow.md)
Purpose: Complete audio lesson processing pipeline
User Action: Upload audio file for lesson generation
System Scope: 17+ microservices, GPU transcription, AI processing
Key Flows:
- File upload → Temporary storage → Job queue
- Whisper transcription (GPU) → AI processing chain
- Lesson assembly → AnkiConnect export → CEFR update
Critical Dependencies:
- GPU access in LXC containers
- Database connectivity across services
- Volume mounts for model files
### 2. [Learning Exercise Flow](02-learning-exercise-flow.md)
Purpose: Interactive practice and study sessions
User Action: Complete writing/test/grammar exercises
System Scope: Dashboard frontend, Flask backend, PostgreSQL
Key Flows:
- Exercise generation from AnkiConnect data
- GPT-powered corrections and feedback
- Progress tracking and CEFR calculation
### 3. [Content Processing Pipeline](03-content-processing-pipeline.md)
Purpose: Joplin note ingestion and vocabulary processing
User Action: Add vocabulary notes in Joplin
System Scope: Background daemon, GPT enhancement, Anki sync
Key Flows:
- Joplin polling → Classification → Enhancement → Deduplication → Anki import
- Support for structured vocab, chat logs, and image extraction
### 4. [Dashboard Widget Data Flows](04-dashboard-widget-data-flows.md) (NEW)
Purpose: Map data sources for each dashboard widget
User Action: View dashboard to check progress
System Scope: Real-time APIs, cached files, external services
Key Widgets:
- CEFR Progress: 7-component calculation from multiple sources
- Anki Statistics: Real-time AnkiConnect queries
- Practice Progress: JSONL file aggregation
- Daily Streak: Persistent counter with date tracking
Data Sources Mapped:
```
CEFR Widget → /api/cefr → cefr_summary_latest.json + PostgreSQL + AnkiConnect
Anki Widget → /api/anki → AnkiConnect direct queries
Progress Widget → /api/data → Multiple JSONL files
Streak Widget → /api/daily/streak → streak.json
```
### 5. [Button Click Journeys](05-button-click-journeys.md) (NEW)
Purpose: Complete system flows from button clicks to data persistence
User Action: Submit corrections, generate tests, create lesson plans
System Scope: End-to-end traceability with failure points
Critical Journeys Mapped:
#### "Submit for Correction" Flow
```
User Text Input → POST /api/practice/writing/submit →
WritingProcessor → GPT-4.1 API → Corrections →
MemoryStore → writing_scores.jsonl →
CEFRTracker → cefr_summary_latest.json →
PostgreSQL shadow-write → UI Update
```
#### "Generate Test" Flow
```
Test Options → POST /api/practice/test/generate →
ExerciseGenerator → AnkiConnect queries →
AdaptiveEngine → Exercise array →
Audio file resolution → UI display
```
#### "Create Lesson Plan" Flow
```
Lesson Text → POST /api/plan/create →
Archive current plan → GPT extraction →
YAML file save → AnkiConnect sync →
PostgreSQL progress init → UI redirect
```
### 6. [Troubleshooting User Flows](06-troubleshooting-user-flows.md) (NEW)
Purpose: Systematic diagnostics for user interaction failures
User Action: "It's not working" → Root cause analysis
System Scope: Decision trees, diagnostic commands, recovery procedures
Troubleshooting Patterns:
#### Widget Not Loading
```mermaid
flowchart TD
A[User Report] --> B[Identify Widget]
B --> C[Check API Endpoint]
C --> D[Test Service Health]
D --> E[Check Data Sources]
E --> F[Identify Root Cause]
F --> G[Apply Fix]
```
#### Practice Submission Fails
```mermaid
flowchart TD
A[Button Click] --> B[Frontend Check]
B --> C[API Health]
C --> D[GPT API Status]
D --> E[File Permissions]
E --> F[Database Connectivity]
F --> G[Recovery Action]
```
#### Progress Not Updating
```mermaid
flowchart TD
A[Completed Action] --> B[Check File Updates]
B --> C[Verify API Responses]
C --> D[Test Calculations]
D --> E[Check Cache Refresh]
E --> F[Fix Data Flow]
```
### Frontend → Backend Mapping
| UI Component | Route | API Endpoint | Backend Service | Data Store |
|--------------|-------|--------------|----------------|------------|
| Dashboard CEFR Card | / | /api/cefr | CEFRTracker | cefr_summary_latest.json |
| Writing Practice | /practice | /api/practice/writing/submit | WritingProcessor + GPT | writing_scores.jsonl |
| Test Generator | /practice | /api/practice/test/generate | ExerciseGenerator | AnkiConnect |
| Lesson Plans | /plan | /api/plan/create | LessonPlanExtractor + GPT | YAML files |
| Progress Stats | /progress | /api/data | Multiple sources | JSONL + PostgreSQL |
| Vocabulary Lookup | /morphology | /api/morphology/analyze | MorphAnalyzer | Database |
### Service Interaction Patterns
#### Real-Time Services (Low Latency)
- AnkiConnect: Direct deck queries for test generation
- Dashboard APIs: Cached file reads for fast display
- Health Checks: Socket connections for service status
#### Processing Services (High Latency)
- GPT APIs: 2-10 second response times for corrections/extractions
- Audio Processing: 30-120 seconds for transcription
- Lesson Pipeline: 5-15 minutes for complete processing
#### Background Services (Asynchronous)
- Joplin Daemon: 20-second polling for new notes
- CEFR Calculator: Triggered by practice submissions
- File Cleanup: Daily maintenance jobs
### Data Flow Patterns
#### User-Triggered Updates
```
User Action → API Call → Service Processing →
Database Update → File Update → Cache Refresh →
UI Update (via page reload or AJAX)
```
#### Background Updates
```
Scheduled Job → Data Collection →
Processing → File/Database Write →
Next UI Load Shows New Data
```
#### External Dependencies
```
User Request → Internal Processing →
External API (GPT/AnkiConnect) →
Response Processing → Data Storage →
User Feedback
```
### 1. Symptom Identification
- Which widget/feature is affected?
- What was the user trying to accomplish?
- What error message or behavior was observed?
### 2. Flow Isolation
- Identify the specific user journey involved
- Locate the primary API endpoint
- Check the service chain dependencies
### 3. Component Testing
- Test frontend JavaScript (browser console)
- Test API endpoint directly (curl)
- Test service dependencies (health checks)
- Test data stores (file existence, database queries)
### 4. Root Cause Analysis
- Follow the data flow backward from failure point
- Check logs at each service boundary
- Verify external API availability
- Confirm configuration and permissions
### 5. Recovery Actions
- Immediate: Restart failed services
- Short-term: Fix configuration or permissions
- Long-term: Address systemic issues
### Existing Architecture Documents
- System Architecture Diagrams: Show service relationships
- Service Interactions: Define API contracts
- Database Schema: Data storage structure
### New User Journey Documents
- Complete Flow Tracing: User action to system response
- Troubleshooting Guidance: Failure diagnosis and recovery
- Performance Monitoring: Critical path identification
### Combined Value
- Developers: Understand impact of code changes on user experience
- Operations: Diagnose user-reported issues systematically
- Users: Self-service troubleshooting for common issues
### Emergency Diagnostics
```bash
# Check all critical services
curl -s http://localhost:5002/health | jq '.microservices.down'
# Test user critical paths
curl -s http://localhost:5002/api/cefr >/dev/null && echo "Dashboard: OK"
curl -s http://localhost:8765 >/dev/null && echo "Anki: OK"
docker exec latvian-postgres pg_isready && echo "DB: OK"
```
### Common Issues
1. Dashboard Empty: Check cefr_summary_latest.json existence
2. Anki Unavailable: Restart anki-headless container
3. Practice Fails: Check GPT API key and quota
4. Tests Empty: Verify AnkiConnect deck access
5. Plans Don't Save: Check lesson_plans/ permissions
### Recovery Commands
```bash
# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres
# Fix file permissions
sudo chown -R david:david /srv/latvian_learning/logs/
# Reset corrupted state
rm /srv/latvian_learning/logs/memory/corrupt_file.jsonl
systemctl --user restart latvian_dashboard.service
```
---
Document Status: Complete user interaction flow documentation suite
Last Updated: 2026-05-17
Purpose: Bridge the gap between system architecture and user experience troubleshooting
Next Steps: Integrate with monitoring system for automated flow validation
Detailed Processing Pipelines
Recording Upload Flow
### Phase 1: Initial Access & Upload
#### Step 1.1: User Navigation
- User Action: Access https://latvian.shifting-ground.link/recording
- System Response: Nginx proxy routes to 192.168.1.11:5002 (latvian-main-site container)
- Frontend Load: Flask serves upload interface with drag-and-drop (200MB limit)
- Dependencies:
- Nginx proxy configuration
- Docker container latvian-main-site running on port 5002
- Flask application routing
Diagnostic Commands:
```bash
# Check domain resolution and proxy
curl -I https://latvian.shifting-ground.link/recording
# Check direct container access
curl -I http://192.168.1.11:5002/recording
# Verify container status
docker ps | grep latvian-main-site
```
#### Step 1.2: File Selection & Validation
- User Action: Drag/drop or select audio file
- Frontend Validation:
- File size check (≤200MB)
- File type validation (audio formats)
- JavaScript client-side checks
- UI Feedback: Progress indicators, file info display
Potential Failure Points:
- File size exceeding 200MB limit
- Unsupported audio format
- JavaScript errors preventing validation
#### Step 1.3: Upload Initiation
- User Action: Click "Upload & Process" button
- Frontend: Creates multipart/form-data POST request to /recording/upload
- Network Path: User browser → Nginx → Docker bridge → latvian-main-site:5000
### Phase 2: Backend Processing Initiation
#### Step 2.1: Flask Route Handling
- Endpoint: POST /recording/upload
- Process:
1. Flask receives multipart form data
2. Validates file on server side
3. Extracts audio file from form
4. Generates unique jobid (timestamp-based)
File Locations:
- Temp Storage: /srv/latvian_learning/tempuploads/
- Processing: /srv/latvian_learning/workspace/jobs/
#### Step 2.2: AudioIngestor Invocation
- Module: AudioIngestor (inferred from architecture analysis)
- Operations:
1. Creates job directory structure
2. Moves file from temp to processing workspace
3. Generates metadata.json with job details
4. Audio format normalization to standard WAV
Directory Structure Created:
```
/srv/latvian_learning/workspace/jobs/pending//
├── raw_audio.wav # Normalized audio input
├── metadata.json # Job metadata and timestamps
└── status.json # Processing stage tracker
```
#### Step 2.3: Queue Placement
- Action: Job moved to pending queue
- Location: /srv/latvian_learning/workspace/jobs/pending/
- Status: status.json initialized with "stage": "transcribing"
- Trigger: Background lessonagent.py picks up pending jobs
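The job setup in Steps 2.2 and 2.3 can be sketched in Python as follows. This is a minimal illustration, not the actual AudioIngestor code: the function name `create_pending_job`, the ID format, and the `metadata.json` keys are hypothetical. Only the file names and the initial `"stage": "transcribing"` value come from the steps above, and audio normalization is left out.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def create_pending_job(workspace: Path, upload: Path, now: datetime) -> Path:
    """Move an uploaded file into a new pending job directory (hypothetical layout)."""
    job_id = now.strftime("%Y%m%dT%H%M%S")  # assumed timestamp-based ID format
    job_dir = workspace / "jobs" / "pending" / job_id
    job_dir.mkdir(parents=True, exist_ok=False)

    # The real pipeline normalizes to WAV first; this sketch only moves the file.
    shutil.move(str(upload), str(job_dir / "raw_audio.wav"))

    # Metadata keys below are illustrative assumptions.
    metadata = {
        "job_id": job_id,
        "created_at": now.isoformat(),
        "source_name": upload.name,
    }
    (job_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    (job_dir / "status.json").write_text(json.dumps({"stage": "transcribing"}))
    return job_dir
```

Passing `now` in explicitly keeps the function deterministic, which makes the ID scheme easy to check in isolation.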
### Phase 3: AI Processing Pipeline
#### Step 3.1: Transcription Stage
- Processor: lessonagent.py → WhisperProcessor
- Service Called: asr-transcription-lv:8101 (GPU-dependent)
- Input: raw_audio.wav
- Output: transcript.json
- Technology: Whisper model with CUDA acceleration
Service Communication:
```
# Internal container communication
latvian-main-site → asr-transcription-lv:8101
# Network: latvian_network (172.18.0.0/16)
# Protocol: HTTP POST with audio data
```
Critical Dependencies:
- GPU device access (device_ids: ['1'])
- NVIDIA Docker runtime
- Whisper model files in /srv/latvian_xtts/models/whisper-lv-ct2
- CUDA environment variables
Diagnostic Commands:
```bash
# Check GPU access in container
docker exec asr-transcription-lv nvidia-smi
# Test service health
curl http://192.168.1.11:8101/health
# Check model files
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/whisper-lv-ct2
```
#### Step 3.2: Segmentation Stage
- Processor: LessonSegmenter module
- Input: transcript.json
- Outputs:
- transcript_norm.json (normalized transcript)
- dialogue_lines.json (segmented dialogue)
- Processing: Text normalization, dialogue segmentation
#### Step 3.3: Content Generation Pipeline
Service Chain (Sequential Processing):
```mermaid
graph TD
A[transcript_json] --> B[UDPipe_Analysis_8092]
B --> C[Sentence_Embeddings_8093]
C --> D[Fluency_Scoring_8094]
D --> E[Validation_Gateway_8097]
E --> F[Template_Extraction_8098]
F --> G[Constrained_Generation_8099]
G --> H[Repair_Loop_8100]
H --> I[lesson_json_final_output]
```
Detailed Service Interactions:
| Stage | Service | Port | Input | Output | Purpose |
|-------|---------|------|-------|--------|---------|
| Morphology | udpipe-lv | 8092 | Text segments | Linguistic analysis | Word structure analysis |
| Embedding | sentence-embedder-lv | 8093 | Analyzed text | Vector embeddings | Semantic understanding |
| Fluency | fluency-index-lv | 8094 | Embeddings | Fluency scores | Difficulty assessment |
| Validation | comprehensive-validation-gateway | 8097 | Scored content | Validated content | Quality control |
| Extraction | template-extractor-lv | 8098 | Validated text | Language patterns | Educational content |
| Generation | constrained-generator-lv | 8099 | Patterns | Generated exercises | AI content creation |
| Refinement | repair-loop-lv | 8100 | Generated content | Polished content | Quality improvement |
Database Dependencies:
```
# Multiple services require database access
morphological-analyzer-lv → latvian-postgres:5432/tilts_tezaurs
vocabulary-database-lv → latvian-postgres:5432/tilts_tezaurs
template-extractor-lv → latvian-postgres:5432/tilts_tezaurs
latvian-main-site → latvian-postgres:5432 (multiple databases)
```
### Phase 4: Export & Integration
#### Step 4.1: Lesson Assembly
- Stage: assembling
- Input: All processed content from previous stages
- Output: lesson.json (complete lesson package)
- Process: Consolidates all AI-generated content into structured lesson
#### Step 4.2: Anki Export Preparation
- Module: IntelligentCardGenerator
- Input: lesson.json
- Output: ankicards.json
- Enhancement: GPT-powered enrichment (phonetic, examples, grammar notes)
#### Step 4.3: AnkiConnect Integration
- Service: anki-headless:8765
- Process:
1. Connect to AnkiConnect API
2. Deduplication check against existing cards
3. Card creation with 4-Card Template v2 structure
4. CEFR tracking updates
Anki Card Structure:
- Deck: Latvian (ChatGPT) > Vocab & Sentences
- Template: 4-Card Template v2
- Fields: latvian, english, phonetic, gender, plural, examplelv, exampleen, notes, audio-latvian, image, morphology
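As a rough illustration of the card-creation step, the sketch below builds an AnkiConnect `addNote` request for the deck and template named above. It is an assumption-laden example rather than the project's IntelligentCardGenerator code: the helper name `build_add_note_request` is hypothetical, the deck name assumes Anki's `::` subdeck separator in place of the ` > ` shown above, and only two of the listed fields are filled in.

```python
import json

ANKI_CONNECT_URL = "http://anki-headless:8765"  # service address from Step 4.3
DECK_NAME = "Latvian (ChatGPT)::Vocab & Sentences"  # assumes "::" as subdeck separator
MODEL_NAME = "4-Card Template v2"


def build_add_note_request(fields: dict, tags: list) -> dict:
    """Build an AnkiConnect addNote payload that asks Anki to reject duplicates."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": DECK_NAME,
                "modelName": MODEL_NAME,
                "fields": fields,
                "options": {"allowDuplicate": False},
                "tags": tags,
            }
        },
    }


payload = build_add_note_request(
    {"latvian": "veikals", "english": "store"},  # remaining fields omitted for brevity
    ["week_48::vocab"],
)
body = json.dumps(payload)  # this JSON string would be POSTed to ANKI_CONNECT_URL
```

Setting `allowDuplicate` to false delegates part of the deduplication check to Anki itself; any additional pre-check against existing cards would sit in front of this call.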
### Phase 5: Job Completion
#### Step 5.1: Status Updates
- Action: status.json updated to "stage": "done"
- Location: Job moved to /srv/latvian_learning/workspace/jobs/done/
- Logs: Processing logs saved for diagnostic purposes
#### Step 5.2: User Notification
- UI Update: Progress indicator shows completion
- Dashboard: New vocabulary appears in CEFR tracking dashboard
- Anki: Cards available for study in Anki application
```
User Audio Upload
↓
Flask Temp Storage → AudioIngestor → Pending Queue
↓
Whisper Transcription (GPU) → transcript.json
↓
Segmentation → dialogue_lines.json + transcript_norm.json
↓
AI Processing Chain (17 services) → Enhanced Content
↓
Lesson Assembly → lesson.json
↓
AnkiConnect Export → Flashcards in Anki
↓
CEFR Tracking Update → Dashboard Statistics
```
### 1. GPU Access Issues (HIGH RISK)
Services Affected: asr-transcription-lv, back-translator-lv
Problem: LXC may not properly expose GPU devices
Diagnostic Steps:
```bash
# Check GPU visibility in containers
docker exec asr-transcription-lv nvidia-smi
docker exec back-translator-lv nvidia-smi
# Check NVIDIA Docker runtime
docker info | grep -i nvidia
# Verify device mappings
docker inspect asr-transcription-lv | grep -A5 DeviceRequests
```
### 2. Database Connectivity (HIGH RISK)
Services Affected: Multiple AI services requiring database access
Problem: Connection strings or network routing issues
Diagnostic Steps:
```bash
# Test database connectivity from dependent services
docker exec morphological-analyzer-lv nc -z latvian-postgres 5432
docker exec vocabulary-database-lv nc -z latvian-postgres 5432
docker exec template-extractor-lv nc -z latvian-postgres 5432
# Check database container
docker exec latvian-postgres pg_isready -U postgres
```
### 3. Volume Mount Failures (MEDIUM RISK)
Affected: Model files, workspace directories
Problem: File permissions or mount point changes
Diagnostic Steps:
```bash
# Check workspace permissions
docker exec latvian-main-site ls -la /srv/latvian_learning/workspace/
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/
# Verify volume mounts
docker inspect latvian-postgres | grep -A10 Mounts
docker inspect asr-transcription-lv | grep -A10 Mounts
```
### 4. Service Communication (MEDIUM RISK)
Problem: Internal Docker network routing issues
Diagnostic Steps:
```bash
# Test internal service connectivity
docker exec latvian-main-site wget -q -O- http://asr-transcription-lv:8101/health
docker exec latvian-main-site wget -q -O- http://comprehensive-validation-gateway:8097/health
# Check Docker network
docker network inspect latvian_network
```
### Immediate Recovery Steps
1. Check container health: docker ps --format "table {{.Names}}\t{{.Status}}"
2. Restart failed services: docker-compose restart
3. Check logs: docker logs
4. Test service endpoints: Use curl to verify service health endpoints
### Advanced Diagnostics
1. Network debugging: Use docker exec to test internal connectivity
2. GPU troubleshooting: Verify NVIDIA runtime and device access
3. Database verification: Check PostgreSQL connection and schema
4. Volume inspection: Verify file permissions and mount points
### Key Metrics to Track
- Upload success rate: Percentage of successful file uploads
- Processing time: End-to-end time from upload to Anki export
- Service health: Individual microservice availability
- GPU utilization: Monitor GPU memory and compute usage
- Database performance: Query response times and connection counts
### Monitoring Endpoints
- Main Health: http://192.168.1.11:5002/health
- Services Status: http://192.168.1.11:5002/services/status
- Individual Services: http://192.168.1.11:
---
Document Status: Complete user journey mapping for recording upload flow
Last Updated: 2026-05-16
Next Steps: Validate each step against current system state post-migration
Learning Exercise Flow
### Phase 1: Educational Dashboard Access
#### Step 1.1: Dashboard Navigation
- Entry Point: https://latvian.shifting-ground.link or http://192.168.1.11:5002
- Landing: Main educational dashboard
- Navigation Options:
- Dashboard (CEFR tracking)
- Lesson Plans (weekly structure)
- Stats (Anki integration)
- Practice (interactive exercises)
- Shadowing (audio practice)
Service Chain:
```
User Browser → Nginx Proxy → latvian-main-site:5000 (Flask)
```
Template System:
- Base: Flask with Jinja2 templating
- Navigation: Centralized via templates/partials/navbar.html
- Styling: Mobile-responsive CSS with breakpoints (768px, 375px)
#### Step 1.2: CEFR Progress Display
- Data Source: logs/agent/cefr_summary_latest.json
- Components Tracked:
- Vocabulary (35%) - Unique words in Anki
- Maturity (20%) - Card intervals ≥21 days
- Quality (15%) - Retention + consistency + streak
- Writing (12%) - GPT correction scores
- Tests (8%) - Quiz accuracy
- Grammar (8%) - Grammar exercise scores
- AI Assessment (2%) - Weekly GPT evaluation
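The component weights and the score bands listed under Score Calculation can be combined as in the sketch below. The source does not say how the seven components are turned into a single number, so the scaling here is an assumption: each component is taken as a value in [0, 1] and the weighted sum is stretched onto the 0 to 2 range that the bands cover. The names `WEIGHTS`, `BANDS`, `weighted_score`, and `level_for` are illustrative, and band edges are treated as exclusive upper bounds.

```python
WEIGHTS = {
    "vocabulary": 0.35,
    "maturity": 0.20,
    "quality": 0.15,
    "writing": 0.12,
    "tests": 0.08,
    "grammar": 0.08,
    "ai_assessment": 0.02,
}

# Upper bound of each band from the score table; edge handling is an assumption.
BANDS = [(0.5, "A1-"), (0.8, "A1"), (1.0, "A1+"), (1.3, "A2-"), (1.7, "A2"), (2.0, "A2+")]


def weighted_score(components: dict, scale: float = 2.0) -> float:
    """Combine per-component values in [0, 1] into one score on an assumed 0-2 scale."""
    return scale * sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)


def level_for(score: float) -> str:
    """Map a score to its band label, treating each upper bound as exclusive."""
    for upper, label in BANDS:
        if score < upper:
            return label
    return BANDS[-1][1]  # scores at or above the top edge stay in the highest band
```

A learner with only the vocabulary component maxed out would land at roughly 0.7 under this scaling, which falls in the A1 band.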
Score Calculation:
```
A1-: 0.0-0.5
A1: 0.5-0.8
A1+: 0.8-1.0
A2-: 1.0-1.3
A2: 1.3-1.7
A2+: 1.7-2.0
```
### Phase 2: Lesson Plan System
#### Step 2.1: Weekly Lesson Access
- Route: /plan or /plan/
- Data Source: lesson_plans/*.yaml files
- Structure:
- Vocabulary (~20 words with focus forms)
- Grammar topic with rules and examples
- Writing prompts with requirements
- Exercise tracking
Lesson Plan Structure:
```yaml
meta:
  id: 2025_w48
  title: "Locative Case & Places"
  week_number: 48
  due_date: "2025-12-06"
vocabulary:
  - latvian: "veikals"
    english: "store"
    gender: "m"
    focus_forms:
      - form: "veikalā"
        case: "locative"
        usage: "Es esmu veikalā."
grammar:
  topic: "Locative Case"
  rules:
    - pattern: "-a → -ā"
      examples: [{base: "māja", inflected: "mājā"}]
```
#### Step 2.2: Lesson Lifecycle Management
- Active Lesson: One lesson active at a time
- Archival: Creating new lesson archives current one
- Carry-Forward: Unmastered vocab (max 10) and incomplete writing (max 2) transfer to new lesson
- Priority Algorithm:
```
priority = (1 - combined_accuracy) * 100 + min(total_attempts, 20)
         + min(anki_lapses * 10, 30)   # Anki failure penalty
         + recent_failure_boost(25)    # Failed within 7 days
```
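The priority formula can be written as a small Python function. This is a sketch under one interpretive assumption: `recent_failure_boost(25)` is read as a flat 25-point bonus applied when the item was failed within the last 7 days. The function names and the item keys used in `select_carry_forward` are hypothetical.

```python
def carry_forward_priority(combined_accuracy: float, total_attempts: int,
                           anki_lapses: int, failed_recently: bool) -> float:
    """Score an item for carry-forward; a higher value means more urgent."""
    priority = (1 - combined_accuracy) * 100 + min(total_attempts, 20)
    priority += min(anki_lapses * 10, 30)      # Anki failure penalty, capped at 30
    priority += 25 if failed_recently else 0   # assumed flat boost for a recent failure
    return priority


def select_carry_forward(items: list, limit: int = 10) -> list:
    """Return the `limit` highest-priority items (10 matches the vocab cap above)."""
    ranked = sorted(
        items,
        key=lambda it: carry_forward_priority(
            it["combined_accuracy"], it["total_attempts"],
            it["anki_lapses"], it["failed_recently"],
        ),
        reverse=True,
    )
    return ranked[:limit]
```

For example, an item with 50% accuracy, 10 attempts, 2 lapses, and a recent failure scores 50 + 10 + 20 + 25 = 105.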
#### Step 2.3: Anki Integration
- **Sync Endpoint:** `/api/plan//anki/sync`
- **Tags Applied:** `week_XX::vocab`, `week_XX::grammar`, `current_week`
- **Image Generation:** DALL-E icons for cards without images
- **Filtered Deck:** `tag:current_week` shows current week's vocabulary
### Phase 3: Interactive Practice System
#### Step 3.1: Writing Practice Flow
**Route:** `/practice/writing`
**User Journey:**
1. User selects writing prompt (configurable or lesson-based)
2. Writes Latvian text in text area (minimum word count enforced)
3. Submits for GPT correction via `/api/practice/writing/submit`
4. Receives instant feedback with error categorization
5. Views corrected text with inline comparisons
6. Score recorded to CEFR writing component
**GPT Processing:**
- **Model:** gpt-4.1 (tutor model slot)
- **Error Categories:** Spelling, grammar, case, gender, word order
- **Output:** Corrected text + detailed feedback + score (0-100)
**Data Persistence:**
- **File:** `logs/memory/writing_scores.jsonl`
- **Format:** `{"timestamp": "...", "score": 85, "errors": 3, "prompt": "..."}`
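A minimal sketch of how records in this JSONL format could be appended and aggregated, assuming one JSON object per line as shown above. The helper names `append_score` and `mean_score` are illustrative and not taken from the codebase.

```python
import json
from pathlib import Path


def append_score(path: Path, record: dict) -> None:
    """Append one JSON object per line, creating the file if needed."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def mean_score(path: Path) -> float:
    """Average the "score" field over all non-blank lines; 0.0 for an empty file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    scores = [json.loads(line)["score"] for line in lines if line.strip()]
    return sum(scores) / len(scores) if scores else 0.0
```

Appending rather than rewriting keeps each submission an independent write, so a failed write affects at most the last line of the file.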
#### Step 3.2: Test Practice Flow
**Route:** `/practice/test`
**Exercise Types:**
1. **Fill-in-blank:** Complete sentences with missing words
2. **Translation LV→EN:** Translate Latvian to English
3. **Translation EN→LV:** Translate English to Latvian
4. **Multiple Choice:** Select correct answer from options
5. **Listening:** Audio comprehension with HyperTTS
**Question Generation Process:**
```
User selects strategy → API call to /api/practice/test/generate
↓
AnkiConnect queries cards → Strategy filtering:
- weak: Low success rate cards
- recent: Recently added cards
- random: Random selection
- mixed: Equal distribution of all types
↓
GPT generates questions → Format validation → Return to frontend
```
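The strategy filtering step might look roughly like the following. This is a sketch only: the card fields `success_rate` and `added` are hypothetical stand-ins for whatever AnkiConnect data the generator actually uses, and the `mixed` strategy is left out because its exact split is not specified here.

```python
import random


def select_cards(cards: list, strategy: str, count: int, rng: random.Random) -> list:
    """Pick up to `count` cards for a test according to the chosen strategy."""
    if strategy == "weak":
        # Lowest success rate first.
        return sorted(cards, key=lambda c: c["success_rate"])[:count]
    if strategy == "recent":
        # Most recently added first.
        return sorted(cards, key=lambda c: c["added"], reverse=True)[:count]
    if strategy == "random":
        return rng.sample(cards, min(count, len(cards)))
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")
```

Passing the random generator in as a parameter keeps the `random` strategy reproducible when a seeded `random.Random` is supplied.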
**Audio Integration:**
- **Source:** Anki HyperTTS audio files
- **Verification:** Audio files must exist in Docker container
- **Path:** `/api/anki/audio/`
- **Format:** MP3 files generated by HyperTTS
#### Step 3.3: Grammar Practice Flow
**Route:** `/practice/grammar`
**Topic Sources:**
1. **A1 Grammar Bank** (18 topics) - Always available
2. **Lesson Plan Topics** - From weekly lesson plans
**A1 Grammar Bank Topics:**
- Personal Pronouns, Verb "būt", Noun Genders
- Cases: Nominative, Accusative, Locative, Dative
- Tenses: Present, Past, Future
- Adjective Agreement, Possessive/Demonstrative Pronouns
- Question Words, Negation, Numbers 1-100
- Prepositions, Reflexive Verbs
**Exercise Types:**
1. **Fill-in-blank:** Grammar pattern completion
2. **Multiple Choice:** Select correct grammatical form
3. **Transformation:** Convert between tenses/cases
4. **Error Correction:** Identify and fix grammatical errors
5. **Conjugation:** Verb form exercises
**Scoring & Tracking:**
- **Individual answers:** Immediate feedback with explanations
- **Final score:** Recorded to CEFR grammar component
- **Data:** `logs/memory/grammar_scores.jsonl`
- **Mastery tracking:** Per-topic progress tracking
### Phase 4: Shadowing Practice System
#### Step 4.1: Shadowing Interface
**Route:** `/shadow`
**Features:**
- **Speed Control:** 0.8x (slow), 1.0x (normal), 1.1x (fast)
- **Composite Audio:** Progressive practice tracks
- **Dual Display:** Latvian screenplay + English translation
- **Voice Variety:** Betty (female) and John (male) Latvian voices
#### Step 4.2: Audio Generation Pipeline
**TTS Service:** Narakeet API
**Voice Configuration:**
```yaml
narakeet:
  voices:
    female: "betty"   # Latvian female voice
    male: "john"      # Latvian male voice
  speeds:
    slow: 0.8
    normal: 1.0
    fast: 1.1
```
**Audio Processing:**
1. User creates/selects dialogue
2. Text split into lines with speaker assignment
3. Narakeet API calls for each line/speed combination
4. Audio files cached with hash-based naming
5. Composite tracks generated (slow → normal → fast)
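Hash-based cache naming, as mentioned in step 4, can be illustrated with a short sketch. The exact inputs and hash function used by the system are not documented here, so the choice of SHA-256 over voice, speed, and text, and the name `audio_cache_name`, are assumptions.

```python
import hashlib


def audio_cache_name(text: str, voice: str, speed: float) -> str:
    """Derive a stable file name from the inputs that determine the audio."""
    key = f"{voice}|{speed:.2f}|{text}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16] + ".mp3"
```

Because the name depends only on the inputs, a repeated request for the same line, voice, and speed maps to the same file and can skip the Narakeet call.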
**File Structure:**
```
workspace/media/shadowing/
├── line1slow.mp3 # Individual line audio
├── line1normal.mp3
├── line1fast.mp3
├── compositeslow.mp3 # Combined track for speed
├── compositenormal.mp3
└── compositefast.mp3
```
#### Step 4.3: Dialogue Management
**API Endpoints:**
- `GET /api/shadow/list` - List all dialogues
- `POST /api/shadow/dialogue` - Create new dialogue
- `PATCH /api/shadow/dialogue/` - Rename dialogue
- `DELETE /api/shadow/dialogue/` - Delete dialogue
- `POST /api/shadow/dialogue//generate` - Regenerate audio
**Dialogue Structure:**
```json
{
  "id": "dialogueuuid",
  "title": "At the Store",
  "lines": [
    {"speaker": "A", "textlv": "Labdien!", "texten": "Hello!"},
    {"speaker": "B", "textlv": "Sveiki!", "texten": "Hi!"}
  ],
  "createdat": "2026-05-16T10:30:00Z"
}
```
### Phase 5: Progress Tracking & Analytics
#### Step 5.1: Daily Suggestions System
**Endpoint:** `/api/daily/suggestions`
**Recommendation Engine:**
- **Streak tracking:** Persistent storage in `logs/memory/streak.json`
- **Weak cards identification:** Based on Anki success rates
- **Balanced practice:** Writing, vocabulary, grammar rotation
- **Adaptive goals:** Based on streak length and performance
**Streak Calculation:**
```json
{
  "currentstreak": 7,
  "lastactivity": "2026-05-16",
  "longeststreak": 15,
  "totaldays": 45,
  "activities": ["writing", "vocabulary", "grammar"]
}
```
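A possible update rule for the streak state is sketched below. The rules themselves are assumptions, since the source shows only the stored fields: activity on the day after `lastactivity` extends the streak, a longer gap resets it to 1, and a second activity on the same day changes nothing. Key names are used exactly as they appear in the JSON above.

```python
from datetime import date


def update_streak(state: dict, today: date) -> dict:
    """Return a new streak state after recording activity on `today`."""
    last = date.fromisoformat(state["lastactivity"])
    gap = (today - last).days
    if gap == 0:
        return dict(state)  # already counted today
    current = state["currentstreak"] + 1 if gap == 1 else 1
    return {
        **state,
        "currentstreak": current,
        "lastactivity": today.isoformat(),
        "longeststreak": max(state["longeststreak"], current),
        "totaldays": state["totaldays"] + 1,
    }
```

Returning a new dictionary instead of mutating the input makes it straightforward to write the result back to `streak.json` only after the update succeeds.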
#### Step 5.2: Comprehensive Statistics
Endpoint: /api/stats/full
Data Sources:
- Anki stats: Card counts, retention, intervals
- Practice history: Writing/test/grammar scores
- Learning suggestions: Weak areas identification
- CEFR progression: Historical tracking
Integrated Display:
- Dashboard charts: CEFR component visualization
- Progress metrics: Study consistency, retention rates
- Performance trends: Improvement over time
- Recommendations: Personalized study suggestions
### Core Service Interactions
```mermaid
flowchart TD
A[User Interface Flask] --> B["Practice APIs /api/practice/"]
B --> C[GPT Processing OpenAI API]
C --> D["AnkiConnect anki-headless:8765"]
D --> E[Data Persistence File System]
E --> F[CEFR Calculations and Dashboard Updates]
```
### Database Dependencies
- AnkiConnect: Card management, statistics
- File System: Progress tracking, scores, lesson plans
- GPT API: Content generation, correction, scoring
### Critical Integration Points
1. Anki Integration:
- Health Check: /api/anki/status verifies AnkiConnect
- Card Queries: Real-time card data for exercise generation
- Audio Access: Serving HyperTTS files via Flask
2. GPT Processing:
- Model Selection: ai_router.py manages model assignments
- Rate Limiting: Built-in retry and backoff mechanisms
- Quality Control: Response validation and error handling
3. Audio System:
- Narakeet API: External TTS service with caching
- File Management: Hash-based naming prevents duplication
- Streaming: Direct file serving for audio playback
### High-Risk Areas
1. AnkiConnect Unavailable:
- Impact: No exercise generation, progress tracking disabled
- Recovery: Check anki-headless container, restart if needed
- Diagnostic: curl http://192.168.1.11:8765/version
2. GPT API Limits:
- Impact: Writing corrections, exercise generation fail
- Recovery: Retry with exponential backoff
- Monitoring: Track API usage and rate limits
3. Audio File Corruption:
- Impact: Listening exercises, shadowing practice affected
- Recovery: Regenerate audio through Narakeet API
- Prevention: Hash verification, backup strategies
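The "retry with exponential backoff" recovery for GPT API limits can be sketched generically as follows. This is an illustrative helper, not the retry logic in `ai_router.py`: the name `call_with_backoff`, the default attempt count, and the delay values are assumptions.

```python
import time


def call_with_backoff(func, max_attempts: int = 4, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting behaviour can be replaced in tests; in production code the exception filter would normally be narrowed to rate-limit and transient network errors.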
### Monitoring & Alerting
Health Endpoints:
- Main Dashboard: http://192.168.1.11:5002/health
- AnkiConnect: http://192.168.1.11:8765/version
- Service Status: http://192.168.1.11:5002/services/status
Key Metrics:
- Exercise completion rate: Percentage of started exercises completed
- Audio playback success: Audio file availability and playback
- API response times: GPT and Narakeet response latencies
- Data persistence: File write success rates
---
Document Status: Complete learning exercise flow mapping
Last Updated: 2026-05-16
Next Steps: Validate exercise functionality against current system state
Content Processing Pipeline
### Processing Stages Overview
```
Raw Audio → Transcription → Linguistic Analysis → Content Generation → Quality Control → Educational Output
```
### Service Topology
```mermaid
flowchart TD
A[Raw Audio Input] --> B["ASR Transcription :8101 GPU"]
B --> C[Text Normalization and Segmentation]
C --> D{Parallel Processing}
D --> E["UDPipe Analysis :8092"]
D --> F["Sentence Embedder :8093"]
D --> G["Morphological Analyzer :8087"]
D --> H["Vocabulary Database :8088"]
E & F & G & H --> I{Fluency Assessment}
I --> J["Fluency Index :8094"]
I --> K["Fluency Gate :8095"]
J & K --> L{Validation and Quality Control}
L --> M["Grammar Gate :8096"]
L --> N["Grammar Correction :8103"]
L --> O["Comprehensive Validation Gateway :8097"]
M & N & O --> P{Content Generation}
P --> Q["Template Extractor :8098"]
P --> R["Constrained Generator :8099"]
P --> S["Repair Loop :8100"]
Q & R & S --> T{Export and Integration}
T --> U[Anki Cards Generation]
T --> V[Lesson Plan Integration]
T --> W[CEFR Progress Updates]
```
### Stage 1: Audio Transcription
#### Service: ASR Transcription (Port 8101)
Technology: OpenAI Whisper with CUDA acceleration
Container: asr-transcription-lv
GPU Requirements: Device ID 1, Tesla P4 or equivalent
Processing Flow:
1. Input Validation:
- Audio format verification (WAV, MP3, M4A, etc.)
- Duration limits (typically 30+ minutes supported)
- File size validation (up to several GB)
2. Audio Preprocessing:
- Format normalization to 16kHz WAV
- Noise reduction (optional)
- Volume normalization
3. Whisper Inference:
- Model: whisper-lv-ct2 (Latvian-optimized)
- Location: /srv/latvian_xtts/models/whisper-lv-ct2
- Output: Timestamped transcript with confidence scores
Output Format (transcript.json):
```json
{
  "language": "lv",
  "duration": 1247.5,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.2,
      "text": "Sveiki, šodien mēs runāsim par...",
      "avg_logprob": -0.15,
      "no_speech_prob": 0.001
    }
  ]
}
```
Critical Dependencies:
- GPU Access: NVIDIA runtime, CUDA libraries
- Model Files: Pre-trained Whisper Latvian model
- Memory: ~4-8GB GPU memory for inference
- Network: HTTP API endpoint for audio upload
Diagnostic Commands:
```bash
# Check GPU access
docker exec asr-transcription-lv nvidia-smi
# Test service health
curl -X POST http://192.168.1.11:8101/transcribe \
  -F "audio=@test.wav"
# Check model loading
docker exec asr-transcription-lv ls -la /srv/latvian_xtts/models/whisper-lv-ct2
```
### Stage 2: Text Normalization & Segmentation
#### Module: LessonSegmenter
Purpose: Convert raw transcript into structured dialogue lines
Input: transcript.json
Outputs: transcript_norm.json, dialogue_lines.json
Processing Steps:
1. Text Normalization:
- Remove filler words, stutters
- Standardize punctuation
- Fix common transcription errors
- Apply Latvian-specific text cleaning rules
2. Dialogue Segmentation:
- Identify speaker boundaries
- Split into logical sentence units
- Preserve semantic coherence
- Extract meaningful dialogue exchanges
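The normalization step above can be pictured as a small text-cleaning function. This is only a sketch: the filler list, the stutter rule, and the function name are hypothetical, since the actual LessonSegmenter rules are not shown in this document.

```python
import re

# Hypothetical filler list; the real Latvian-specific rules are not documented here.
FILLERS = {"nu", "ēē", "mmm", "hmm"}

def normalize_line(text):
    """Collapse stutters, drop filler words, and tidy whitespace and punctuation."""
    # Collapse immediate word repetitions such as "es es" into a single "es".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Drop tokens that are filler words once surrounding punctuation is ignored.
    words = [w for w in text.split() if w.strip(",.!?").lower() not in FILLERS]
    text = " ".join(words)
    # Remove spaces before punctuation, then collapse repeated punctuation marks.
    text = re.sub(r"\s+([,.!?])", r"\1", text)
    text = re.sub(r"([,.!?])\1+", r"\1", text)
    return text.strip()

print(normalize_line("Nu, es es gribu  kafiju !!"))
```

A rule this simple would also collapse intentional repetitions, so a production version would likely need exceptions.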
Output Structure:
{
  "lines": [
    {
      "id": 1,
      "start_time": 0.0,
      "end_time": 3.2,
      "speaker": "A",
      "text": "Labdien! Kā jums klājas?",
      "confidence": 0.95
    }
  ],
  "metadata": {
    "total_lines": 45,
    "speakers_detected": 2,
    "avg_line_duration": 2.8
  }
}
### Stage 3: Parallel Linguistic Analysis
#### Service: UDPipe (Port 8092)
Purpose: Universal Dependencies parsing for Latvian
Technology: UDPipe neural pipeline
Database Dependency: latvian-postgres:5432/tilts_tezaurs
Processing Capabilities:
- Tokenization: Word and sentence boundaries
- Part-of-Speech Tagging: Grammatical categories
- Lemmatization: Base word forms
- Dependency Parsing: Syntactic relationships
- Morphological Analysis: Case, gender, number, tense
Output Format:
{
  "tokens": [
    {
      "id": 1,
      "form": "Labdien",
      "lemma": "labdien",
      "upos": "INTJ",
      "feats": "Polarity=Pos",
      "head": 0,
      "deprel": "root"
    }
  ]
}
#### Service: Sentence Embedder (Port 8093)
Purpose: Generate semantic vector representations
Technology: Sentence-transformers, multilingual models
Processing:
1. Text Encoding: Convert text to dense vectors
2. Semantic Analysis: Capture meaning and context
3. Similarity Scoring: Enable content comparison
4. Clustering: Group similar content
Output: 384 or 768-dimensional embeddings per sentence
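Similarity scoring over such embeddings is commonly done with cosine similarity. The sketch below uses toy 4-dimensional vectors in place of real 384- or 768-dimensional embeddings; whether the Sentence Embedder uses exactly this metric is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Identical directions score close to 1, orthogonal directions score 0.
print(cosine_similarity([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]))
print(cosine_similarity([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))
```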
#### Service: Morphological Analyzer (Port 8087)
Purpose: Advanced morphological analysis
Database Connection: Direct PostgreSQL access for dictionary lookup
Analysis Components:
- Word Structure: Prefix + root + suffix breakdown
- Inflection Patterns: Declension and conjugation rules
- Derivational Morphology: Word formation patterns
- Compound Analysis: Multi-part word decomposition
#### Service: Vocabulary Database (Port 8088)
Purpose: Comprehensive dictionary lookup and validation
Database: tilts_tezaurs - Latvian vocabulary database
Capabilities:
- Word Validation: Check against authoritative dictionary
- Definition Lookup: Retrieve meanings and usage examples
- Frequency Analysis: Word commonality scoring
- CEFR Classification: Difficulty level assignment
### Stage 4: Fluency Assessment
#### Service: Fluency Index (Port 8094)
Purpose: Comprehensive fluency scoring for learning content
Assessment Criteria:
1. Vocabulary Complexity: Word difficulty distribution
2. Syntactic Complexity: Sentence structure analysis
3. Semantic Coherence: Logical flow and clarity
4. Phonological Difficulty: Pronunciation challenges
5. Cultural Context: Idiomatic expressions, cultural references
Scoring Algorithm:
fluency_score = (
    vocabulary_score * 0.3 +
    syntax_score * 0.25 +
    coherence_score * 0.2 +
    phonology_score * 0.15 +
    cultural_score * 0.1
)
Output Range: 0.0 (A1 beginner) to 2.0 (A2+ advanced)
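Because the five weights sum to 1.0, the combined score stays in the same range as its inputs, so a 0.0 to 2.0 output suggests each component is itself scored on 0.0 to 2.0. A small worked sketch under that assumption follows; the dictionary keys and sample values are illustrative.

```python
WEIGHTS = {
    "vocabulary": 0.30,
    "syntax": 0.25,
    "coherence": 0.20,
    "phonology": 0.15,
    "cultural": 0.10,
}

def fluency_score(components):
    """Weighted sum of component scores; each is assumed to lie in [0.0, 2.0]."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

example = {
    "vocabulary": 1.0,
    "syntax": 0.8,
    "coherence": 1.2,
    "phonology": 0.6,
    "cultural": 0.4,
}
# 0.30 + 0.20 + 0.24 + 0.09 + 0.04
print(round(fluency_score(example), 2))
```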
#### Service: Fluency Gate (Port 8095)
Purpose: Learning-appropriate content filtering
Technology: FAISS similarity search with LVTB (Latvian Treebank)
Gating Process:
1. Similarity Search: Compare against known CEFR-level content
2. Difficulty Assessment: Calculate objective difficulty metrics
3. Learning Suitability: Filter for A1-A2 appropriateness
4. Adaptation Recommendations: Suggest simplifications if needed
### Stage 5: Quality Control & Validation
#### Service: Grammar Gate (Port 8096)
Purpose: Grammatical correctness validation
Technology: UDPipe + rule-based grammar checking
Validation Checks:
- Case Agreement: Subject-object case relationships
- Gender Concordance: Adjective-noun agreement
- Verb Conjugation: Proper tense and person forms
- Word Order: Standard Latvian syntax patterns
#### Service: Grammar Correction (Port 8103)
Purpose: Automated grammar error detection and correction
Correction Capabilities:
- Error Detection: Identify grammatical mistakes
- Repair Suggestions: Provide correction options
- Confidence Scoring: Rate correction certainty
- Learning Integration: Generate educational explanations
#### Service: Comprehensive Validation Gateway (Port 8097)
Purpose: Orchestrate parallel fluency and grammar validation
Processing Flow:
Input Content
↓
Parallel Processing:
├── Fluency Gate (8095) - FAISS similarity scoring
└── Grammar Gate (8096) - UDPipe rule validation
↓
Results Aggregation:
├── Combined quality score
├── Content appropriateness rating
└── Improvement recommendations
↓
Validated Output
### Stage 6: Content Generation & Enhancement
#### Service: Template Extractor (Port 8098)
Purpose: Extract linguistic patterns and templates
Database Dependency: PostgreSQL for pattern storage
Extraction Process:
1. Pattern Recognition: Identify recurring language structures
2. Template Generation: Create reusable sentence templates
3. Variation Detection: Find pattern variations and exceptions
4. Educational Tagging: Mark patterns for learning focus
Template Format:
{
  "pattern": "Es [verb] [object]",
  "examples": ["Es lasu grāmatu", "Es dzeru kafiju"],
  "difficulty": "A1",
  "frequency": 0.85
}
#### Service: Constrained Generator (Port 8099)
Purpose: AI-powered content generation with grammatical constraints
Technology: GPT-4.1-mini with custom prompting
Generation Capabilities:
- Exercise Creation: Generate fill-in-the-blank exercises
- Example Sentences: Create usage examples for vocabulary
- Dialogue Extensions: Expand conversation scenarios
- Cultural Adaptations: Localize content for Latvian context
Constraints Applied:
- CEFR Level: A1-A2 vocabulary and grammar only
- Cultural Appropriateness: Latvia-specific contexts
- Grammatical Accuracy: Validated sentence structures
- Learning Objectives: Aligned with educational goals
#### Service: Repair Loop (Port 8100)
Purpose: Iterative content quality improvement
Repair Process:
1. Quality Assessment: Evaluate generated content
2. Error Detection: Identify linguistic or pedagogical issues
3. Iterative Improvement: Apply corrections and refinements
4. Validation Cycles: Multiple passes until quality threshold met
5. Final Approval: Human-readable quality report
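The repair process above is essentially an assess-and-fix loop with a quality threshold and a bounded number of passes. A minimal sketch follows; the `assess` and `repair` callables, the threshold, and the pass limit are all hypothetical stand-ins for whatever the Repair Loop service actually uses.

```python
def repair_until_ok(content, assess, repair, threshold=0.8, max_passes=3):
    """Assess content and repair it until the score meets the threshold or passes run out."""
    history = []
    for attempt in range(1, max_passes + 1):
        score, issues = assess(content)
        history.append({"pass": attempt, "score": score, "issues": issues})
        if score >= threshold:
            return {"content": content, "approved": True, "history": history}
        content = repair(content, issues)
    # Passes exhausted: return the last repaired version, flagged as not approved.
    return {"content": content, "approved": False, "history": history}

# Toy stand-ins: the only "issue" is a double space, and the repair removes it.
def toy_assess(text):
    issues = ["double space"] if "  " in text else []
    return (1.0 if not issues else 0.5), issues

def toy_repair(text, issues):
    return text.replace("  ", " ")

report = repair_until_ok("Es lasu  grāmatu.", toy_assess, toy_repair)
print(report["approved"], len(report["history"]))
```

The `history` list plays the role of the quality report in step 5, recording what each pass found.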
### Stage 7: Educational Content Assembly
#### Module: IntelligentCardGenerator
Purpose: Transform processed content into structured educational materials
Card Generation Process:
1. Content Extraction: Pull key vocabulary and phrases
2. GPT Enhancement: Add phonetic transcriptions, examples, grammar notes
3. Morphological Enrichment: Add word structure analysis
4. Image Integration: Generate DALL-E prompts for visual learning
5. Audio Preparation: Prepare for TTS integration
Card Types Generated:
1. Vocabulary Cards: Core word learning
2. Sentence Cards: Contextual usage
3. Pattern Cards: Grammar structure practice
4. Dialogue Cards: Conversation scenarios
#### Anki Integration Pipeline
Service: AnkiConnect (anki-headless:8765)
Export Process:
1. Card Structure: Format for 4-Card Template v2
2. Deduplication: Check against existing Anki database
3. Tag Management: Apply lesson and week tags
4. Image Handling: Generate or retrieve visual content
5. Audio References: Link to HyperTTS audio generation
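Steps 1 to 3 above map naturally onto a single AnkiConnect `addNote` request, where `options.allowDuplicate` set to false asks Anki to reject duplicates. The sketch below only builds the request body and sends nothing; the deck and model names are the ones used in this document, while the helper name, tag values, and reduced field set are illustrative assumptions.

```python
import json

def build_add_note_request(fields, lesson_tag, week_tag):
    """Build an AnkiConnect addNote request body (version 6 envelope)."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "Latvian (ChatGPT)::Vocab & Sentences",
                "modelName": "4-Card Template v2",
                "fields": fields,
                # Ask Anki to refuse notes that duplicate an existing one.
                "options": {"allowDuplicate": False},
                "tags": [lesson_tag, week_tag],
            }
        },
    }

request_body = build_add_note_request(
    {"latvian": "grāmata", "english": "book"},
    lesson_tag="lesson_12",  # hypothetical tag names
    week_tag="week_48",
)
print(json.dumps(request_body, ensure_ascii=False, indent=2))
```

The resulting JSON would be POSTed to the AnkiConnect endpoint (anki-headless:8765 in this deployment).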
Final Card Structure:
{
  "deckName": "Latvian (ChatGPT)::Vocab & Sentences",
  "modelName": "4-Card Template v2",
  "fields": {
    "latvian": "grāmata",
    "english": "book",
    "phonetic": "ˈgrɑː.mɑ.tɑ",
    "gender": "f",
    "plural": "grāmatas",
    "example_lv": "Es lasu jaunu grāmatu.",
    "example_en": "I am reading a new book.",
    "notes": "Feminine noun, 4th declension",
    "morphology": "grāmat + a • Related: grāmatvedība, grāmatnieks",
    "image_prompt": "Simple illustration of an open book with Latvian text visible"
  }
}
### File System Organization
/srv/latvian_learning/workspace/jobs//
├── raw_audio.wav # Original audio input
├── transcript.json # Whisper output
├── transcript_norm.json # Normalized transcript
├── dialogue_lines.json # Segmented content
├── linguistic_analysis/ # UDPipe outputs
├── fluency_scores.json # Fluency assessment
├── validation_results.json # Quality control outcomes
├── generated_content/ # AI-generated materials
├── lesson.json # Final lesson package
├── anki_cards.json # Anki-ready cards
└── status.json # Processing state
### Database Dependencies
PostgreSQL Connections:
- tilts_tezaurs: Vocabulary and morphological data
- lessontracking: Educational progress data
- userpreferences: Personalization settings
Critical Tables:
- vocabulary: Word definitions, CEFR levels, frequency
- morphology: Word structure, inflection patterns
- templates: Language pattern library
- learningprogress: User advancement tracking
### Processing Times (Typical)
| Stage | Duration | Bottleneck |
|-------|----------|------------|
| Audio Transcription | 2-5 minutes | GPU availability |
| Linguistic Analysis | 30-60 seconds | Database queries |
| Fluency Assessment | 15-30 seconds | Vector computations |
| Content Generation | 1-2 minutes | GPT API calls |
| Anki Export | 10-20 seconds | Network I/O |
Total Pipeline: 5-10 minutes for 30-minute audio
### Optimization Strategies
1. GPU Scheduling: Queue management for transcription jobs
2. Database Connection Pooling: Reduce connection overhead
3. Caching: Store intermediate results for similar content
4. Parallel Processing: Concurrent linguistic analysis
5. Batch Operations: Group similar processing tasks
### Failure Points & Recovery
1. GPU Memory Exhaustion:
```bash
# Monitor GPU usage
docker exec asr-transcription-lv nvidia-smi
# Restart service if needed
docker-compose restart asr-transcription-lv
```
2. Database Connection Failures:
```bash
# Test connectivity
docker exec morphological-analyzer-lv nc -z latvian-postgres 5432
# Check PostgreSQL status
docker exec latvian-postgres pg_isready
```
3. Service Communication Timeouts:
```bash
# Test internal network
docker exec latvian-main-site wget -qO- http://udpipe-lv:8092/health
# Check service logs
docker logs comprehensive-validation-gateway --tail 50
```
4. Content Quality Failures:
- Automatic Retry: Repair loop attempts correction
- Manual Review: Flag for human oversight
- Fallback Content: Use simpler alternatives
### Monitoring & Alerting
Health Check Endpoints:
- Individual services: http://192.168.1.11:5002/services/status
- Pipeline status: /workspace/jobs/ directories
- Processing queue: Monitor
Key Metrics:
- Transcription accuracy: WER (Word Error Rate)
- Processing throughput: Jobs per hour
- Service availability: Uptime percentage
- Quality scores: Average fluency and grammar ratings
- Resource utilization: GPU, CPU, memory usage
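Word Error Rate is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a standard dynamic-programming implementation of that definition; how the platform actually computes or collects WER is not described here, so treat it as an illustration only.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the processed ref prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)

# One word of four is missing from the hypothesis, so one edit over four words.
print(word_error_rate("es lasu jaunu grāmatu", "es lasu grāmatu"))
```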
---
Document Status: Complete content processing pipeline mapping
Last Updated: 2026-05-16
Next Steps: Validate service interactions and optimize bottlenecks post-migration
Dashboard Widget Data Flows
Data Sources and Update Mechanisms
Each dashboard widget pulls data from specific sources through defined API endpoints. This section maps the complete data flow for each widget from source to display.
CEFR Progress Widget
Location: Top-left card on dashboard
Update Frequency: On page load + manual refresh
Data Source: logs/agent/cefr_summary_latest.json
Components: Vocabulary (35%), Maturity (20%), Quality (15%), Writing (12%), Tests (8%), Grammar (8%)
Primary endpoint: GET /api/cefr
Anki Statistics Widget
Location: Top-right card on dashboard
Update Frequency: On page load
Data Source: AnkiConnect API at anki-headless:8765
Displays: Total cards, Mature cards, Reviews today, Deck health
Primary endpoint: GET /api/anki
Practice Progress Widget
Location: Bottom section of dashboard
Update Frequency: After practice session completion
Data Sources:
- Writing: logs/memory/writing_scores.jsonl
- Tests: logs/memory/test_scores.jsonl
- Grammar: logs/memory/grammar_scores.jsonl
Daily Streak Widget
Location: Small card showing streak counter
Update Frequency: Once per day, on first practice
Data Source: logs/memory/streak.json
Tracks: Current streak, longest streak, total days
Primary endpoint: GET /api/daily/streak
Recent Activity Widget
Location: Right sidebar or bottom section
Update Frequency: Real-time after activities
Data Sources:
- Sessions: PostgreSQL study_sessions table
- Lessons: workspace/jobs/done/
- Achievements: logs/memory/achievements.jsonl
Widget Performance Characteristics
Loading priority:
- CEFR progress (critical path)
- Anki statistics (async)
- Recent activity (lazy)
Caching:
- CEFR: JSON file cache
- Anki: Real-time queries
- Practice: Append-only logs
Error handling:
- Graceful degradation
- Retry with exponential backoff
- Fallback to cached data
Critical User Action Journeys
Complete system flows from button click to database updates
Submit for Correction (Writing)
- User Action: Types Latvian text & clicks "Submit"
- API Call: POST /api/practice/writing/submit
- Processing: GPT-4.1 correction analysis
- Storage: Append to writing_scores.jsonl
- Updates: CEFR component + streak tracking
- Response: Corrections + updated score
Generate Test (Practice Quiz)
- User Action: Selects test type & count
- API Call: POST /api/practice/test/generate
- Selection: Query weak/recent cards from Anki
- Generation: Create 5 exercise types (20% each)
- Display: First question with audio player
- Submission: POST per-question answers
Create Lesson Plan
- User Action: Input lesson content & title
- API Call: POST /api/plan/create
- Processing: GPT extracts plan structure
- Archive: Current plan + carry-forward items
- Storage: Save as YAML in lesson_plans/
- Sync: Auto-sync vocabulary to Anki
Check Exercise Answer
- User Action: Submits exercise answer
- API Call: POST /api/practice/test/check
- Validation: Normalize & compare answer
- Feedback: Generate contextual feedback
- Update: Record result in memory store
- Display: Show correctness + explanation
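The normalize-and-compare validation step listed above could look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, and the real normalization rules (for example, whether missing diacritics are tolerated) are not documented here.

```python
import string
import unicodedata

def normalize_answer(text):
    """Apply Unicode NFC, casefold, drop ASCII punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).casefold()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def is_correct(user_answer, correct_answer):
    """Compare two answers after normalization."""
    return normalize_answer(user_answer) == normalize_answer(correct_answer)

# Case, outer whitespace, and the final full stop are ignored.
print(is_correct("  Es lasu grāmatu. ", "es lasu grāmatu"))
# Diacritics are kept, so a missing macron counts as a mistake in this sketch.
print(is_correct("Es lasu gramatu", "Es lasu grāmatu"))
```

Keeping diacritics significant is a deliberate choice here, since vowel length changes meaning in Latvian; a more lenient checker could report it as a near miss instead.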
Service Dependencies Map
All user actions flow through Flask → optionally GPT-4.1 API → optionally AnkiConnect → optionally PostgreSQL → File System for persistence
Dashboard Widget Data Flows - Detailed
graph TD
A[User Browser] --> B[Flask Dashboard Route /]
B --> C[dashboard.html Template]
C --> D[JavaScript Widget Loaders]
D --> E[API Endpoints]
E --> F[Data Sources]
subgraph "Widget Types"
G[CEFR Progress]
H[Anki Statistics]
I[Practice Progress]
J[Daily Streak]
K[Recent Activity]
end
subgraph "Data Sources"
L[PostgreSQL Database]
M[AnkiConnect API]
N[JSONL Log Files]
O[Memory Store Files]
end
CEFR Progress Widget
### Widget Display
- Location: Top-left card on dashboard
- Elements: Current level (A1, A1+, A2), numeric score, progress bar, component breakdown
- Update Frequency: On page load + manual refresh
### Data Flow
sequenceDiagram
participant Browser
participant Dashboard
participant API
participant CEFR
participant DB
participant Anki
participant Files
Browser->>Dashboard: GET /
Dashboard->>Browser: dashboard.html + skeleton
Browser->>API: GET /api/cefr
API->>CEFR: get_cefr_summary()
CEFR->>Files: Read logs/agent/cefr_summary_latest.json
CEFR->>API: Return cached CEFR data
API->>DB: Query study_sessions for reviews_today
alt DB Available
DB->>API: Return review count
else DB Unavailable
API->>Anki: AnkiConnect getNumCardsReviewedToday
Anki->>API: Return review count
end
API->>Anki: Check card stats if cached=0
Anki->>API: findCards queries for deck stats
API->>Browser: Complete CEFR summary JSON
Browser->>Browser: Render progress bar + components
### Data Sources
| Component | Weight | Primary Source | Fallback | Update Method |
|-----------|--------|---------------|----------|---------------|
| Vocabulary | 35% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Maturity | 20% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Quality | 15% | cefr_summary_latest.json | None | Updated by cefr_tracker.py |
| Writing | 12% | memorystore/writing_scores.jsonl | Empty | Updated by practice submissions |
| Tests | 8% | memorystore/test_scores.jsonl | Empty | Updated by practice submissions |
| Grammar | 8% | memorystore/grammar_scores.jsonl | Empty | Updated by practice submissions |
| Reviews Today | N/A | PostgreSQL study_sessions | AnkiConnect | Real-time |
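The six listed weights add up to 98%, not 100%, so any combined score has to either renormalise or accept a maximum below 1.0. How cefr_tracker.py handles this is not shown in this document; the sketch below simply assumes a weighted average renormalised over the components that are present.

```python
WEIGHTS = {
    "vocabulary": 0.35,
    "maturity": 0.20,
    "quality": 0.15,
    "writing": 0.12,
    "tests": 0.08,
    "grammar": 0.08,
}

def composite_score(component_scores):
    """Weighted average over the components present, renormalised by their total weight."""
    used = {name: w for name, w in WEIGHTS.items() if name in component_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(component_scores[name] * w for name, w in used.items())
    return weighted / total_weight

# Sum of the documented weights.
print(round(sum(WEIGHTS.values()), 2))
# Only two components available, so the result is renormalised over 0.55.
print(round(composite_score({"vocabulary": 0.9, "maturity": 0.8}), 4))
```

Renormalising means that empty practice logs (the "Empty" fallback in the table) do not drag the overall score down, which may or may not match the real tracker.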
### API Endpoints
- Primary: GET /api/cefr
- History: GET /api/cefr/history
- Response Format:
{
  "level": "A1+",
  "score": 0.85,
  "components": {
    "vocabulary": {"score": 0.9, "weight": 0.35},
    "maturity": {"score": 0.8, "weight": 0.20}
  },
  "anki": {
    "reviews_today": 25,
    "total_cards": 450,
    "mature_cards": 120
  }
}
### Troubleshooting
1. Widget shows "Loading...": Check /api/cefr endpoint
2. Reviews today = 0: Check AnkiConnect health at anki-headless:8765
3. Components missing: Check file existence: logs/agent/cefr_summary_latest.json
4. Stale data: Check cefr_tracker.py last run in systemd logs
4. Stale data: Check cefr_tracker.py last run in systemd logs
Anki Statistics Widget
### Widget Display
- Location: Top-right card on dashboard
- Elements: Total cards, mature cards, reviews today, deck health
- Update Frequency: On page load
### Data Flow
sequenceDiagram
participant Browser
participant API
participant Anki
Browser->>API: GET /api/anki
API->>Anki: findNotes deck:"Latvian (ChatGPT)"
Anki->>API: note_ids array
API->>Anki: findCards deck:"Latvian (ChatGPT)"
Anki->>API: card_ids array
API->>Anki: getNumCardsReviewedToday
Anki->>API: reviews_today count
API->>Browser: Combined statistics
### Data Source Details
- Service: AnkiConnect API at anki-headless:8765
- Deck Filter: deck:"Latvian (ChatGPT)"
- Real-time: Direct API queries (no caching)
### API Response
{
  "total_notes": 150,
  "total_cards": 450,
  "reviews_today": 25,
  "available": true
}
### Troubleshooting
1. Widget shows "Anki Unavailable": Check container docker ps | grep anki-headless
2. Zero cards despite content: Check deck name exactly matches filter
3. Timeout errors: Check network connectivity between containers
Practice Progress Widget
### Widget Display
- Location: Bottom section of dashboard
- Elements: Writing accuracy, test scores, grammar progress
- Update Frequency: After practice session completion
### Data Flow
sequenceDiagram
participant Browser
participant Practice
participant Memory
participant Dashboard
Browser->>Practice: Submit practice (writing/test/grammar)
Practice->>Memory: Append to scores.jsonl
Practice->>Browser: Success response
Note over Browser: User navigates back to dashboard
Browser->>Dashboard: GET /api/data
Dashboard->>Memory: Read recent scores from JSONL
Memory->>Dashboard: Return last N scores
Dashboard->>Browser: Practice statistics
### Data Sources
| Practice Type | File Location | Update Trigger | Display Metric |
|---------------|---------------|----------------|----------------|
| Writing | logs/memory/writing_scores.jsonl | POST /api/practice/writing/submit | Average correction score |
| Tests | logs/memory/test_scores.jsonl | POST /api/practice/test/submit | Average accuracy |
| Grammar | logs/memory/grammar_scores.jsonl | POST /api/grammar/submit | Average score |
### File Format Example
{"timestamp": "2026-05-17T10:30:00", "score": 85.5, "corrections": 3}
{"timestamp": "2026-05-17T11:45:00", "score": 92.1, "corrections": 1}
### Troubleshooting
1. No practice data: Check JSONL file existence and permissions
2. Stale scores: Verify practice submission endpoints work
3. Score calculation errors: Check JSON formatting in JSONL files
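The third troubleshooting item matters because one malformed line can break a naive JSONL reader. A tolerant reader for the score format in the File Format Example might look like this; the function name and the `last_n` default are assumptions, not the dashboard's actual code.

```python
import json

def average_recent_scores(lines, last_n=10):
    """Average the 'score' field of the last N valid JSONL records; skip malformed lines."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a corrupt line should not break the whole widget
        if isinstance(record, dict) and isinstance(record.get("score"), (int, float)):
            scores.append(float(record["score"]))
    recent = scores[-last_n:]
    return sum(recent) / len(recent) if recent else None

sample_lines = [
    '{"timestamp": "2026-05-17T10:30:00", "score": 85.5, "corrections": 3}',
    '{"timestamp": "2026-05-17T11:45:00", "score": 92.1, "corrections": 1}',
    '{"timestamp": "2026-05-17T12:00:00", "score": ',  # simulated truncated write
]
print(average_recent_scores(sample_lines))
```

With a real file, `lines` can simply be the open file object, for example `open("logs/memory/writing_scores.jsonl", encoding="utf-8")`.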
Daily Streak Widget
### Widget Display
- Location: Small card showing streak counter
- Elements: Current streak days, streak emoji/icon
- Update Frequency: Once per day, updated on first practice
### Data Flow
sequenceDiagram
participant Browser
participant API
participant Memory
participant File
Browser->>API: GET /api/daily/streak
API->>Memory: get_memory_store()
Memory->>File: Read logs/memory/streak.json
File->>Memory: Current streak data
Memory->>API: Streak info
API->>Browser: {"current_streak": 7, "last_practice": "2026-05-17"}
### Data Source
- File: logs/memory/streak.json
- Format:
{
  "current_streak": 7,
  "last_practice_date": "2026-05-17",
  "longest_streak": 12,
  "total_days": 45
}
### Update Triggers
- Record Practice: POST to /api/daily/record
- Automatic: First practice session each day extends streak
- Break: Missing a day resets streak to 0
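The update rules above can be expressed as a pure function over the streak.json fields. This is a sketch, assuming that a practice after a missed day starts a new streak at 1 and that repeat sessions on the same day change nothing; the function name is hypothetical.

```python
from datetime import date, timedelta

def record_practice(state, today):
    """Return an updated copy of the streak state after a practice session on `today`."""
    last = state.get("last_practice_date")
    last_day = date.fromisoformat(last) if last else None
    new_state = dict(state)
    if last_day == today:
        return new_state  # already counted today
    if last_day == today - timedelta(days=1):
        new_state["current_streak"] = state.get("current_streak", 0) + 1
    else:
        # A missed day reset the streak; today's practice starts a new one.
        new_state["current_streak"] = 1
    new_state["longest_streak"] = max(
        state.get("longest_streak", 0), new_state["current_streak"]
    )
    new_state["total_days"] = state.get("total_days", 0) + 1
    new_state["last_practice_date"] = today.isoformat()
    return new_state

state = {
    "current_streak": 7,
    "last_practice_date": "2026-05-17",
    "longest_streak": 12,
    "total_days": 45,
}
print(record_practice(state, date(2026, 5, 18)))  # consecutive day
print(record_practice(state, date(2026, 5, 20)))  # one day was missed
```

Passing `today` in explicitly keeps the function testable and makes the timezone issue from the troubleshooting list visible at the call site.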
### Troubleshooting
1. Streak not updating: Check POST /api/daily/record endpoint
2. Incorrect dates: Check system timezone settings
3. File corruption: Restore from backup or reset streak
Recent Activity Widget
### Widget Display
- Location: Right sidebar or bottom section
- Elements: Recent lesson completions, practice sessions, achievements
- Update Frequency: Real-time after activities
### Data Flow
sequenceDiagram
participant Browser
participant API
participant DB
participant Jobs
Browser->>API: GET /api/data (includes recent activity)
API->>DB: Query study_sessions recent
DB->>API: Recent session data
API->>Jobs: Check workspace/jobs/done/
Jobs->>API: Recent lesson completions
API->>Browser: Combined activity feed
### Data Sources
| Activity Type | Source | Query/Path |
|---------------|--------|------------|
| Practice Sessions | PostgreSQL | study_sessions WHERE started_at > now() - interval '7 days' |
| Lesson Processing | File System | workspace/jobs/done/*/metadata.json |
| Achievements | Memory Store | logs/memory/achievements.jsonl |
### Auto-Refresh Elements
- CEFR Progress: Updates after any practice submission
- Anki Statistics: Static (requires page reload)
- Streak Counter: Updates after daily practice recording
- Recent Activity: Updates immediately after activities
### Manual Refresh
- Full Dashboard: F5 or browser refresh
- Individual Widgets: Click refresh icon where available
- API Polling: Some widgets poll every 30s when tab is active
### Widget Loading Priority
1. Critical Path: CEFR progress (blocks above-fold content)
2. Async Loading: Anki statistics, practice progress
3. Lazy Loading: Recent activity (only when scrolled into view)
### Caching Strategy
- CEFR Data: Cached in JSON file, updated by background process
- Anki Data: No caching (real-time queries)
- Practice Scores: File-based append-only logs
- Streak Data: Single JSON file (fast read)
### Error Handling
- Graceful Degradation: Missing data shows placeholder instead of error
- Retry Logic: Failed API calls retry 3x with exponential backoff
- Fallback Data: Use cached/stale data when services unavailable
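The retry and fallback behaviour above can be sketched as a small wrapper. The sketch reads "retry 3x" as three attempts in total, which is an assumption, and the helper name, delays, and injected `sleep` parameter are illustrative rather than taken from the dashboard code.

```python
import time

def call_with_retry(func, attempts=3, base_delay=0.5, fallback=None, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n and retry, then return the fallback."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, ...
    return fallback  # for example cached or placeholder data

# Simulated service that fails twice and then recovers.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unavailable")
    return {"available": True}

delays = []
result = call_with_retry(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Returning the fallback instead of raising corresponds to the graceful-degradation rule: the widget shows cached or placeholder data rather than an error.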
---
Document Status: Complete data flow mapping for all dashboard widgets
Last Updated: 2026-05-17
Next Steps: Create button click journey diagrams
Button Click Journeys
### 1. "Submit for Correction" (Writing Practice)
#### User Journey Map
sequenceDiagram
participant User
participant Browser
participant Flask
participant WritingProcessor
participant GPT
participant MemoryStore
participant PostgreSQL
participant CEFRTracker
participant Dashboard
User->>Browser: Type Latvian text and click Submit for Correction
Browser->>Flask: POST /api/practice/writing/submit
Note over Browser, Flask: POST body contains text and prompt fields
Flask->>WritingProcessor: process_correction(text, prompt)
WritingProcessor->>GPT: GPT-4.1 correction analysis
GPT->>WritingProcessor: Corrections + score + feedback
WritingProcessor->>Flask: Structured correction result
Flask->>MemoryStore: Append to writing_scores.jsonl
Note over MemoryStore: Appends timestamp, score, and corrections count
Flask->>PostgreSQL: Shadow-write per-lemma errors to learner_state
Note over PostgreSQL: Per-word correct/incorrect records for spaced repetition
Flask->>CEFRTracker: update_component('writing', score)
CEFRTracker->>MemoryStore: Update cefr_summary_latest.json
Flask->>Browser: Correction results + updated CEFR
Browser->>Dashboard: Update writing component score
Dashboard->>User: Display corrections with highlighted errors
#### Detailed Flow Steps
1. Frontend Interaction
// Location: templates/practice.html
function submitWriting() {
  const text = document.getElementById('writingText').value;
  const prompt = document.getElementById('promptSelect').value;
  fetch('/api/practice/writing/submit', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({text: text, prompt: prompt})
  })
    .then(response => response.json())
    .then(data => displayCorrections(data));
}
2. Backend Processing
# Location: blueprints/practice_bp.py:271
@practice_bp.route('/api/practice/writing/submit', methods=['POST'])
def api_practice_writing_submit():
    data = request.get_json()
    text = data.get('text', '').strip()
    prompt_key = data.get('prompt', 'daily_life')
    # Step 1: GPT Processing
    processor = get_writing_processor()
    result = processor.process_correction(text, prompt_key)
    # Step 2: Memory Store Update
    memory = get_memory_store()
    memory.log_writing_score(result.get('score', 0.5))
    # Step 3: CEFR Component Update
    cefr = get_cefr_tracker()
    cefr_result = cefr.update_component('writing', result.get('score'))
    # Step 4: Shadow Write to PostgreSQL (per-lemma errors)
    _shadow_write_writing(result.get('corrections', []), plan_id=None)
    return jsonify(result)
3. Data Persistence Points
| Step | Location | Data Format | Purpose |
|------|----------|-------------|---------|
| Memory Store | logs/memory/writing_scores.jsonl | {"timestamp": "...", "score": 85.5} | CEFR calculation input |
| CEFR Update | logs/agent/cefr_summary_latest.json | Component scores with weights | Dashboard display |
| PostgreSQL | learner_state table | Per-lemma correct/incorrect records | Spaced repetition |
| Daily Suggester | logs/memory/streak.json | Practice activity recording | Streak tracking |
#### Failure Points & Diagnostics
1. GPT API Timeout (HIGH RISK)
- Symptoms: "Processing..." spinner never completes
- Check: curl -X POST localhost:5002/api/practice/writing/submit with test data
- Logs: Flask container logs for OpenAI API errors
2. File Write Permissions (MEDIUM RISK)
- Symptoms: No score updates in CEFR dashboard
- Check: File permissions on logs/memory/writing_scores.jsonl
- Recovery: Fix permissions and retry submission
3. PostgreSQL Connection (LOW RISK - Shadow Write)
- Symptoms: Warning in logs but submission still works
- Check: Container connectivity to latvian-postgres:5432
- Impact: Spaced repetition data missing
### 2. "Generate Test" (Practice Quiz)
#### User Journey Map
sequenceDiagram
participant User
participant Browser
participant Flask
participant ExerciseGenerator
participant AnkiConnect
participant PostgreSQL
participant AdaptiveEngine
User->>Browser: Select test type and count then click Generate Test
Browser->>Flask: POST /api/practice/test/generate
Note over Browser, Flask: POST body with type mixed, count 10, strategy weak
Flask->>ExerciseGenerator: generate_mixed_test(count, strategy)
ExerciseGenerator->>AnkiConnect: Query weak cards from deck
AnkiConnect->>ExerciseGenerator: Card data with intervals
ExerciseGenerator->>AdaptiveEngine: Create varied exercise formats
AdaptiveEngine->>Flask: Exercise array with audio references
Flask->>Browser: Returns exercises array and total count
Browser->>User: Display first question with audio player
User->>Browser: Submit answer for each question
Browser->>Flask: POST per question check endpoint
Flask->>AdaptiveEngine: check_answer(exercise, user_answer)
AdaptiveEngine->>Flask: Returns correct status and feedback
User->>Browser: Click Submit Test after all answers
Browser->>Flask: POST /api/practice/test/submit
Flask->>PostgreSQL: Record study_session with scores
Flask->>CEFRTracker: update_component('tests', final_score)
Flask->>Browser: Final score + updated CEFR
#### Exercise Generation Process
1. Vocabulary Selection
# Location: modules/exercise_generator.py
def generate_mixed_test(self, count, strategy, plan_id=None):
    # Strategy determines card selection
    if strategy == 'weak':
        # Query cards with low intervals or high fail rates
        cards = self.anki_connect.get_struggling_cards(limit=count*2)
    elif strategy == 'recent':
        # Query recently added cards
        cards = self.anki_connect.get_recent_cards(days=7, limit=count*2)
    else:  # random
        # Random sample from entire deck
        cards = self.anki_connect.get_random_cards(limit=count*2)
    # Generate 5 exercise types (20% each for mixed)
    exercises = []
    per_type = count // 5
    exercises.extend(self._generate_fill_blank(cards[:per_type]))
    exercises.extend(self._generate_translation(cards[per_type:per_type*2]))
    # ... more types
    random.shuffle(exercises)
    return {"exercises": exercises, "total": len(exercises)}
2. Answer Checking with Feedback
# Location: modules/adaptive_exercise_engine.py
def check_answer(self, exercise, user_answer):
    correct_answer = exercise['correct_answer']
    user_clean = self._normalize_answer(user_answer)
    correct_clean = self._normalize_answer(correct_answer)
    is_correct = user_clean == correct_clean
    # Generate contextual feedback
    feedback = self._generate_feedback(exercise, is_correct, user_answer)
    return {
        "correct": is_correct,
        "correct_answer": correct_answer,
        "feedback": feedback,
        "explanation": exercise.get('explanation', '')
    }
#### Failure Points & Diagnostics
1. AnkiConnect Unavailable (HIGH RISK)
- Symptoms: "No exercises available" error
- Check: curl http://anki-headless:8765 for connectivity
- Recovery: Restart anki-headless container
2. Empty Deck (MEDIUM RISK)
- Symptoms: Generated exercises but no content
- Check: Verify cards exist: AnkiConnect findCards query
- Recovery: Import vocabulary content first
3. Audio Files Missing (LOW RISK)
- Symptoms: Listening exercises show broken audio players
- Check: Mount point /mnt/data/apps/anki-media accessible
- Impact: Listening exercises fail, other types work
### 3. "Create Lesson Plan" (Weekly Setup)
#### User Journey Map
sequenceDiagram
participant User
participant Browser
participant Flask
participant PlanExtractor
participant GPT
participant FileSystem
participant AnkiSync
participant PostgreSQL
User->>Browser: Input lesson content then click Create Plan
Browser->>Flask: POST /api/plan/create
Note over Browser, Flask: POST body with lesson text and title fields
Flask->>FileSystem: Check for existing active plan
FileSystem->>Flask: Current plan details
Flask->>Flask: Archive current plan with carry-forward
Flask->>PlanExtractor: extract_lesson_plan(text, title)
PlanExtractor->>GPT: Parse vocabulary, grammar, exercises
GPT->>PlanExtractor: Structured lesson plan
PlanExtractor->>Flask: Lesson plan object
Flask->>FileSystem: Save new plan as YAML
Note over FileSystem: lesson_plans/2025_w48.yaml
Flask->>AnkiSync: Auto-sync vocabulary to Anki
AnkiSync->>AnkiConnect: Create cards with week_48 tags
AnkiConnect->>AnkiSync: Card creation results
Flask->>PostgreSQL: Initialize progress tracking
Note over PostgreSQL: vocabulary table with spaced repetition
Flask->>Browser: Plan creation success + plan_id
Browser->>User: Redirect to new plan detail page
#### Lesson Plan Lifecycle
1. Archive & Carry-Forward
# Location: blueprints/plan/routes.py
def create_plan():
    data = request.get_json()
    # Step 1: Archive current active plan
    carry_forward = None  # stays None when there is no active plan to archive
    current_plan = get_active_plan()
    if current_plan:
        carry_forward = calculate_carry_forward(current_plan)
        archive_plan(current_plan['id'], carry_forward)
    # Step 2: Create new plan with carried items
    new_plan = extract_plan_from_input(data)
    if carry_forward:
        new_plan['vocabulary'].extend(carry_forward['vocabulary'])
        new_plan['writing_prompts'].extend(carry_forward['writing'])
    # Step 3: Save and activate
    save_plan_yaml(new_plan)
    return {"plan_id": new_plan['id'], "status": "active"}
2. Anki Integration
# Location: modules/lesson_plan_anki_sync.py
def sync_plan_to_anki(plan_id):
    plan = load_plan_yaml(plan_id)
    # Remove old current_week tags
    anki.remove_tags_by_pattern("current_week")
    # Add new cards with tags
    for vocab in plan['vocabulary']:
        card_data = {
            "deckName": "Latvian (ChatGPT)",
            "modelName": "4-Card Template v2",
            "fields": generate_card_fields(vocab),
            "tags": [f"week_{plan['week_number']}", "current_week"]
        }
        anki.addNote(card_data)
#### Failure Points & Diagnostics
1. GPT Extraction Errors (HIGH RISK)
- Symptoms: Malformed lesson plans with missing sections
- Check: GPT API response logs for parsing errors
- Recovery: Manual plan editing via web interface
2. File System Permissions (MEDIUM RISK)
- Symptoms: "Unable to save plan" error
- Check: Write permissions on lesson_plans/ directory
- Recovery: Fix permissions and retry
3. Anki Sync Failures (LOW RISK)
- Symptoms: Plan created but cards missing in Anki
- Check: AnkiConnect availability and deck structure
- Impact: Manual card creation needed
### API Endpoint Summary
| User Action | Primary Endpoint | Secondary Calls | Database Updates |
|-------------|------------------|-----------------|------------------|
| Submit Writing | /api/practice/writing/submit | GPT-4.1, CEFR tracker | writing_scores.jsonl, learnerstate |
| Generate Test | /api/practice/test/generate | AnkiConnect queries | None (read-only) |
| Submit Test | /api/practice/test/submit | CEFR tracker | test_scores.jsonl, study_sessions |
| Create Plan | /api/plan/create | GPT extraction, Anki sync | Plan YAML, vocabulary table |
| Check Exercise | /api/plan/ | Adaptive engine | , learnerstate |
### Service Dependencies
graph TD
A[User Browser] --> B[Flask Dashboard]
B --> C[GPT-4.1 API]
B --> D[AnkiConnect :8765]
B --> E[PostgreSQL :5432]
B --> F[File System]
C --> G[OpenAI Services]
D --> H[Anki Desktop]
E --> I[Learning Database]
F --> J[Logs & YAML Files]
subgraph "Failure Impact"
K[GPT Unavailable: No corrections/extractions]
L[Anki Unavailable: No test generation]
M[DB Unavailable: No progress tracking]
N[File System: No plan persistence]
end
### Error Propagation Patterns
Graceful Degradation Hierarchy:
1. Critical Path: User gets response even if secondary features fail
2. Shadow Operations: PostgreSQL writes fail silently, don't block user
3. Background Tasks: File system operations retry 3x before failing
4. External APIs: GPT and AnkiConnect have circuit breakers
Recovery Strategies:
- Immediate: Retry failed operations automatically (3x max)
- Manual: Admin endpoints to replay failed operations
- Background: Periodic cleanup jobs fix inconsistent state
- Data: Point-in-time backups of critical YAML and JSONL files
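The retry and circuit-breaker behaviour listed above can be sketched as follows. This is a minimal illustration rather than the platform's actual implementation: the names `retry`, `CircuitBreaker`, and `CircuitOpenError`, along with the threshold and cooldown values, are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def retry(func, attempts=3, delay_s=0.0):
    """Call func up to `attempts` times (3x max by default); re-raise the last error."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s)
    raise last_error
```

In this sketch a background file operation would be wrapped in `retry(...)`, while calls to GPT or AnkiConnect would go through a shared `CircuitBreaker` so that a dead dependency is skipped quickly instead of timing out on every request.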
---
Document Status: Complete button click journey mapping
Last Updated: 2026-05-17
Next Steps: Create troubleshooting decision trees for each journey
System Troubleshooting Guide
Diagnostic Steps:
# Check API endpoints
curl -s http://localhost:5002/api/cefr | jq '.'
curl -s http://localhost:8765 -d '{"action":"version","version":6}' | jq '.'
# Check data files
ls -la /srv/latvian_learning/logs/agent/cefr_summary_latest.json
ls -la /srv/latvian_learning/logs/memory/*.jsonl
# Check service health
docker ps | grep latvian
systemctl --user status latvian_dashboard.service
Resolution Steps:
- CEFR Widget: Check the /api/cefr endpoint and the cefr_summary_latest.json file
- Anki Widget: Verify anki-headless container is running
- Practice Widget: Check JSONL file permissions
- Streak Widget: Verify streak.json exists and is valid JSON
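The streak check ("exists and is valid JSON") can be expressed as a small loader that falls back to defaults. This is only a sketch: the function name `load_streak` is hypothetical, and the default values mirror the reset command used elsewhere in this guide.

```python
import json
from pathlib import Path

# Defaults taken from the streak reset command in this guide.
DEFAULT_STREAK = {"current_streak": 0, "last_practice_date": None}


def load_streak(path):
    """Return the parsed streak file, or the defaults if it is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULT_STREAK)
    # Treat a well-formed file with the wrong shape as corrupt too.
    if not isinstance(data, dict) or "current_streak" not in data:
        return dict(DEFAULT_STREAK)
    return data
```

Calling `load_streak("/srv/latvian_learning/logs/memory/streak.json")` on the host would show whether the widget has usable data without going through the dashboard.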
Quick Checks:
# Test endpoint
curl -X POST http://localhost:5002/api/practice/writing/submit \
-H "Content-Type: application/json" \
-d '{"text":"Es mācu latviešu valodu","prompt":"daily_life"}'
# Check GPT API key
docker exec latvian-main-site env | grep OPENAI
# Test file permissions
touch /srv/latvian_learning/logs/memory/test_write.txt
ls -la /srv/latvian_learning/logs/memory/
Common Issues:
- Loading spinner forever: GPT API timeout - check API key and quota
- "Invalid text" error: Check text length (min 5 chars, max 500)
- "Score not updated": Check write permissions on logs/memory/
- Browser console errors: Check network tab in Developer Tools (F12)
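To reproduce the "Invalid text" check locally, the length rule above (min 5, max 500 characters) can be sketched like this. The function name, the error wording, and the choice to strip surrounding whitespace before counting are assumptions; the real endpoint may count characters differently.

```python
MIN_CHARS = 5    # lower bound quoted in the issue list above
MAX_CHARS = 500  # upper bound quoted in the issue list above


def validate_submission_text(text):
    """Return (ok, message) for a writing submission; hypothetical helper."""
    stripped = text.strip()  # assumption: surrounding whitespace does not count
    if len(stripped) < MIN_CHARS:
        return False, f"Invalid text: need at least {MIN_CHARS} characters"
    if len(stripped) > MAX_CHARS:
        return False, f"Invalid text: at most {MAX_CHARS} characters allowed"
    return True, "ok"
```

Running a user's reported text through a check like this quickly separates validation failures from GPT or file-permission problems.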
Diagnostic Commands:
# Check AnkiConnect
curl -s http://localhost:8765 -d '{"action":"version","version":6}'
# Check deck access
curl -s http://localhost:8765 -d '{
"action": "findCards",
"version": 6,
"params": {"query": "deck:\"Latvian (ChatGPT)\""}
}' | jq '.result | length'
# Check audio files
ls -la /mnt/data/apps/anki-media/ | head -10
find /mnt/data/apps/anki-media -name "*.mp3" | wc -l
Resolution:
- "No exercises available": AnkiConnect unreachable - restart container
- Empty deck: Import vocabulary first
- Missing audio: Run HyperTTS batch generation
- Wrong deck name: Check deck filter in code matches actual deck
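The same AnkiConnect checks can be scripted. The sketch below mirrors the curl commands above (action name, `version: 6`, and the `findCards` query); the helper names `build_request`, `invoke`, and `deck_card_count` are hypothetical, and the URL assumes AnkiConnect is reachable on localhost:8765 as in those commands.

```python
import json
import urllib.request

ANKI_URL = "http://localhost:8765"  # same host/port as the curl checks


def build_request(action, **params):
    """Encode an AnkiConnect request body (protocol version 6)."""
    payload = {"action": action, "version": 6}
    if params:
        payload["params"] = params
    return json.dumps(payload).encode("utf-8")


def invoke(action, **params):
    """Send one request and return its result, raising on a reported error."""
    req = urllib.request.Request(
        ANKI_URL,
        data=build_request(action, **params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        body = json.load(resp)
    if body.get("error") is not None:
        raise RuntimeError(body["error"])
    return body["result"]


def deck_card_count(deck="Latvian (ChatGPT)"):
    """Number of cards in the deck; 0 suggests an empty or misnamed deck."""
    return len(invoke("findCards", query=f'deck:"{deck}"'))
```

A connection error from `invoke("version")` points at the container, while a zero from `deck_card_count()` points at the deck name or a missing import.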
Test Commands:
# Test endpoint
curl -X POST http://localhost:5002/api/plan/create \
-H "Content-Type: application/json" \
-d '{"text":"Week 1: Basic greetings...","title":"Greetings"}'
# Check file permissions
ls -la /srv/latvian_learning/lesson_plans/
touch /srv/latvian_learning/lesson_plans/test.yaml
# Check extraction logs
docker logs latvian-main-site | grep -i "lesson.*extract"
Troubleshooting Path:
- No response: Check Flask endpoint is reachable
- Timeout: Check GPT API performance and quota
- Validation error: Check input format (min 100 chars)
- Save error: Fix permissions on lesson_plans/ directory
Verification Commands:
# Check score files
tail -5 /srv/latvian_learning/logs/memory/writing_scores.jsonl
tail -5 /srv/latvian_learning/logs/memory/test_scores.jsonl
# Check CEFR update
cat /srv/latvian_learning/logs/agent/cefr_summary_latest.json | jq '.components'
# Check streak
cat /srv/latvian_learning/logs/memory/streak.json
# Check database
docker exec latvian-postgres psql -U latvian_user -d latvian_db -c "SELECT COUNT(*) FROM study_sessions WHERE created_at > NOW() - INTERVAL '1 day';"
Common Causes:
- CEFR not updating: Check practice submission succeeded first
- Stale data: Check cefr_tracker.py background process
- Streak not recording: Call POST /api/daily/record endpoint
- Wrong date: Check system timezone settings
Service Recovery:
# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres
# Wait for services
sleep 10
# Check health
curl -s http://localhost:5002/health | jq '.microservices'
File System Recovery:
# Fix permissions
sudo chown -R david:david /srv/latvian_learning/logs/
sudo chmod -R 755 /srv/latvian_learning/lesson_plans/
# Reset streak counter
echo '{"current_streak": 0, "last_practice_date": null}' > \
/srv/latvian_learning/logs/memory/streak.json
# Restore from backup if available
cp /srv/latvian_learning/backups/cefr_summary_latest.json.backup \
/srv/latvian_learning/logs/agent/cefr_summary_latest.json
When to Escalate:
- Multiple container failures simultaneously
- Database completely inaccessible
- All API endpoints returning 500 errors
- File system corruption detected
Troubleshooting Tips
- Always check Docker first: Most issues are container health
- Check logs early: docker logs [container] --tail 50
- Test endpoints directly: Use curl before debugging frontend
- Verify file permissions: Many silent failures are permission issues
- Check API quotas: GPT API quota exhaustion is common
Troubleshooting User Flows - With Diagrams
### System Health Check
# 1. Check all containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# 2. Check main dashboard health
curl -s http://localhost:5002/health | jq '.microservices'
# 3. Check critical services
curl -s http://localhost:8765 >/dev/null && echo "Anki: OK" || echo "Anki: FAIL"
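# PostgreSQL does not speak HTTP, so probe it with pg_isready rather than curl
docker exec latvian-postgres pg_isready -U latvian_user >/dev/null && echo "DB: OK" || echo "DB: FAIL"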
# 4. Check disk space and permissions
df -h /srv/latvian_learning
ls -la /srv/latvian_learning/logs/
### Decision Tree
graph TD
A["Dashboard widget not loading"] --> B{Which widget}
B -->|CEFR Progress| C["Check api cefr endpoint"]
B -->|Anki Statistics| D["Check AnkiConnect"]
B -->|Practice Progress| E["Check JSONL files"]
B -->|Daily Streak| F["Check streak json"]
C --> G{api cefr responds}
G -->|No| H["Check Flask container logs"]
G -->|Yes empty| I["Check cefr summary file exists"]
G -->|Yes stale| J["Check cefr tracker last run"]
D --> K{AnkiConnect reachable}
K -->|No| L["Check anki headless container"]
K -->|Yes empty| M["Check deck name matches filter"]
E --> N{JSONL files exist}
N -->|No| O["User has not done practice yet"]
N -->|Yes empty| P["Check file permissions"]
F --> R{streak file exists}
R -->|No| S["Initialize with default values"]
R -->|Yes corrupt| T["Restore from backup"]
### Diagnostic Steps
CEFR Progress Widget Failure
# Test API endpoint directly
curl -s http://localhost:5002/api/cefr | jq '.'
# Check if CEFR file exists and is recent
ls -la /srv/latvian_learning/logs/agent/cefr_summary_latest.json
stat /srv/latvian_learning/logs/agent/cefr_summary_latest.json
# Check systemd service that updates CEFR
systemctl --user status latvian_dashboard.service
journalctl --user -u latvian_dashboard.service -n 50
Anki Statistics Widget Failure
# Test AnkiConnect directly
curl -s http://localhost:8765 -d '{"action":"version","version":6}' | jq '.'
# Check container status
docker ps | grep anki-headless
docker logs anki-headless --tail 20
# Test deck query
curl -s http://localhost:8765 -d '{"action":"findCards","version":6,"params":{"query":"deck:\"Latvian (ChatGPT)\""}}' | jq '.result | length'
### Decision Tree
graph TD
A["Submit button clicked"] --> B{Button responds}
B -->|No response| C["Check browser console errors"]
B -->|Loading forever| D["Check writing submit endpoint"]
B -->|Error message| E{What error}
C --> F["JavaScript errors in frontend"]
C --> G["Network connectivity issues"]
D --> H{Endpoint reachable}
H -->|No| I["Check Flask container status"]
H -->|Yes timeout| J["Check GPT API key and quota"]
H -->|Yes error| K["Check Flask logs"]
E -->|Invalid text| L["Check text length validation"]
E -->|Processing failed| M["GPT API or processing error"]
E -->|Score not updated| N["Check file write permissions"]
J --> O{GPT API working}
O -->|No API key| P["Set OPENAI API KEY env var"]
O -->|Quota exceeded| Q["Check OpenAI billing"]
O -->|Rate limited| R["Wait and retry"]
N --> S{JSONL file writable}
S -->|No| T["Fix permissions on logs memory dir"]
S -->|Yes| U["Check disk space"]
### Diagnostic Commands
Frontend Issues
# Check browser console (F12 Developer Tools)
# Look for JavaScript errors or failed network requests
# Test endpoint with curl
curl -X POST http://localhost:5002/api/practice/writing/submit \
-H "Content-Type: application/json" \
-d '{"text":"Es mācu latviešu valodu","prompt":"daily_life"}'
Backend Processing Issues
# Check Flask container logs
docker logs latvian-main-site --tail 50 | grep -i error
# Check GPT API connectivity
export OPENAI_API_KEY="your-key-here"
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data[] | select(.id=="gpt-4.1")'
# Check file permissions
ls -la /srv/latvian_learning/logs/memory/writing_scores.jsonl
touch /srv/latvian_learning/logs/memory/test_write.txt
GPT API Troubleshooting
# Check environment variable
docker exec latvian-main-site env | grep OPENAI
# Test with minimal request
docker exec latvian-main-site python3 -c "
import openai
import os
client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
try:
    response = client.models.list()
    print('GPT API: OK')
except Exception as e:
    print(f'GPT API Error: {e}')
"
### Decision Tree
graph TD
A["Generate Test clicked"] --> B{Any exercises returned}
B -->|No exercises| C["Check AnkiConnect"]
B -->|Partial exercises| D["Check specific exercise types"]
B -->|Malformed| E["Check exercise generator logic"]
C --> F{AnkiConnect responds}
F -->|No| G["Restart anki headless container"]
F -->|Empty deck| H["Import vocabulary first"]
F -->|Wrong deck name| I["Check deck filter in code"]
D --> J{Which types missing}
J -->|Listening| K["Check audio files availability"]
J -->|Translation| L["Check card field mappings"]
J -->|Multiple choice| M["Check distractor generation"]
K --> N{Audio mount accessible}
N -->|No| O["Check anki media mount"]
N -->|Files missing| P["Run HyperTTS batch generation"]
### Diagnostic Commands
AnkiConnect Issues
# Test basic connectivity
curl -s http://localhost:8765 -d '{"action":"version","version":6}'
# Test deck access
curl -s http://localhost:8765 -d '{
"action": "findCards",
"version": 6,
"params": {"query": "deck:\"Latvian (ChatGPT)\""}
}' | jq '.result | length'
# Check for cards with audio
curl -s http://localhost:8765 -d '{
"action": "findCards",
"version": 6,
"params": {"query": "deck:\"Latvian (ChatGPT)\" has:audio"}
}' | jq '.result | length'
Audio Files Check
# Check mount point
ls -la /mnt/data/apps/anki-media/ | head -10
# Count audio files
find /mnt/data/apps/anki-media -name "*.mp3" | wc -l
# Check specific card audio
ls -la /mnt/data/apps/anki-media/hypertts_* | head -5
### Decision Tree
graph TD
A["Create Plan clicked"] --> B{Plan creation response}
B -->|No response| C["Check Flask endpoint"]
B -->|Validation error| D["Check input format"]
B -->|Processing error| E["Check GPT extraction"]
B -->|Save error| F["Check file system"]
C --> G{Plan create endpoint reachable}
G -->|No| H["Check Flask container"]
G -->|Timeout| I["Check GPT API performance"]
D --> J{What validation failed}
J -->|Empty text| K["Require minimum content"]
J -->|Invalid format| L["Check GPT prompt parsing"]
E --> M{GPT extraction working}
M -->|No API access| N["Check OpenAI credentials"]
M -->|Poor extraction| O["Review GPT prompt quality"]
M -->|Timeout| P["Increase timeout or retry"]
F --> Q{File system writable}
Q -->|Permission denied| R["Fix lesson plans permissions"]
Q -->|Disk full| S["Clean up old files"]
Q -->|Archive failed| T["Check current plan exists"]
### Diagnostic Commands
Plan Creation Flow
# Test endpoint directly
curl -X POST http://localhost:5002/api/plan/create \
-H "Content-Type: application/json" \
-d '{"text":"Week 1: Basic greetings...","title":"Greetings"}'
# Check file system permissions
ls -la /srv/latvian_learning/lesson_plans/
touch /srv/latvian_learning/lesson_plans/test.yaml
# Check current active plan
ls -la /srv/latvian_learning/lesson_plans/*.yaml | grep -v _progress
GPT Extraction Issues
# Check extraction logs
docker logs latvian-main-site | grep -i "lesson.*extract"
# Test GPT extraction manually
docker exec -it latvian-main-site python3 -c "
from agent.modules.lesson_plan_extractor import LessonPlanExtractor
extractor = LessonPlanExtractor()
result = extractor.extract_plan('Week 1: Basic greetings', 'Greetings')
print(result)
"
### Decision Tree
graph TD
A["Practice done but progress unchanged"] --> B{Which component affected}
B -->|CEFR score| C["Check component file updates"]
B -->|Dashboard display| D["Check API responses"]
B -->|Streak counter| E["Check streak recording"]
C --> F{Relevant JSONL updated}
F -->|No| G["Check practice submission success"]
F -->|Yes no change| H["Check CEFR calculation logic"]
D --> I{API returns current data}
I -->|No| J["Check API endpoint health"]
I -->|Stale data| K["Check cache refresh logic"]
E --> L{Streak file updated}
L -->|No| M["Check daily record call"]
L -->|Wrong date| N["Check system timezone"]
### Diagnostic Commands
CEFR Progress Tracking
# Check practice score files
tail -5 /srv/latvian_learning/logs/memory/writing_scores.jsonl
tail -5 /srv/latvian_learning/logs/memory/test_scores.jsonl
# Check CEFR summary update
stat /srv/latvian_learning/logs/agent/cefr_summary_latest.json
cat /srv/latvian_learning/logs/agent/cefr_summary_latest.json | jq '.components'
# Test CEFR calculation
docker exec latvian-main-site python3 -c "
from agent.modules.cefr_tracker import CEFRTracker
tracker = CEFRTracker()
summary = tracker.get_current_summary()
print(f'Current level: {summary.get(\"level\", \"Unknown\")}')
"
Streak Recording
# Check streak file
cat /srv/latvian_learning/logs/memory/streak.json
# Test streak update
curl -X POST http://localhost:5002/api/daily/record \
-H "Content-Type: application/json" \
-d '{"activity":"practice"}'
# Check system date/timezone
date
timedatectl status
### Immediate Recovery Actions
1. Service Recovery
# Restart core services
docker restart latvian-main-site anki-headless latvian-postgres
# Check service health after restart
sleep 10
curl -s http://localhost:5002/health | jq '.microservices'
2. File System Recovery
# Fix common permission issues
sudo chown -R david:david /srv/latvian_learning/logs/
sudo chmod -R 755 /srv/latvian_learning/lesson_plans/
# Clean up corrupted files (create the quarantine directory first so mv cannot fail on it)
mkdir -p /tmp/backup
mv /srv/latvian_learning/logs/memory/corrupt_file.jsonl /tmp/backup/
touch /srv/latvian_learning/logs/memory/writing_scores.jsonl
3. Data Recovery
# Restore from backups if available
cp /srv/latvian_learning/backups/cefr_summary_latest.json.backup \
/srv/latvian_learning/logs/agent/cefr_summary_latest.json
# Reinitialize empty files with valid defaults
echo '{"current_streak": 0, "last_practice_date": null}' > \
/srv/latvian_learning/logs/memory/streak.json
### Advanced Diagnostics
1. Full System Check
#!/bin/bash
# Complete health check script
echo "=== Container Status ==="
docker ps --format "table {{.Names}}\t{{.Status}}"
echo "=== Service Health ==="
curl -s http://localhost:5002/health | jq '.'
echo "=== Database Connectivity ==="
docker exec latvian-postgres pg_isready -U latvian_user
echo "=== File System Status ==="
df -h /srv/latvian_learning
find /srv/latvian_learning -name "*.jsonl" -exec wc -l {} \;
echo "=== Recent Errors ==="
docker logs latvian-main-site --since="1h" | grep -i error | tail -10
2. User Journey Simulation
#!/bin/bash
# Test critical user paths
# Test dashboard loading
echo "Testing dashboard..."
# -f makes curl exit non-zero on HTTP errors, so a 500 response is reported as a failure
curl -sf http://localhost:5002/api/data >/dev/null && echo "✓ Dashboard API" || echo "✗ Dashboard API"
# Test practice submission
echo "Testing writing practice..."
curl -sf -X POST http://localhost:5002/api/practice/writing/submit \
-H "Content-Type: application/json" \
-d '{"text":"Sveiki","prompt":"greetings"}' >/dev/null && \
echo "✓ Writing Practice" || echo "✗ Writing Practice"
# Test exercise generation
echo "Testing exercise generation..."
curl -sf -X POST http://localhost:5002/api/practice/test/generate \
-H "Content-Type: application/json" \
-d '{"type":"mixed","count":5}' >/dev/null && \
echo "✓ Exercise Generation" || echo "✗ Exercise Generation"
### Escalation Criteria
Immediate Escalation (System Down):
- Multiple container failures
- Database completely inaccessible
- All API endpoints returning 500 errors
- File system corruption
Standard Resolution (Service Degraded):
- Single service failures
- Partial functionality available
- Non-critical features affected
- Performance issues
Monitor and Document:
- Sporadic errors
- Single user reports
- Minor data inconsistencies
- Temporary network issues
---
Document Status: Complete troubleshooting decision trees for user interactions
Last Updated: 2026-05-17
Next Steps: Integrate with monitoring dashboard for automated diagnostics
Infrastructure & Deployment
Container inventory, image sources, GitHub repository map, and restoration procedures
Complete Infrastructure Reference
### Host: latvian-learning (CT 130, 192.168.1.30)
The Latvian Learning host runs the user-facing services (TILTS dashboard, AI services, Anki integration).
### latvian-main-site (TILTS Dashboard)
Purpose: Main Latvian Learning user interface - lesson plans, practice exercises, dashboard widgets
| Field | Value |
|-------|-------|
| Image | latvian-learning-tilts-main:latest |
| Source Repo | github.com/davidgut1982/portainer → stacks/learning-stack/ |
| Dockerfile | ./latvian-learning-tilts-main/Dockerfile (in repo) |
| Build Context | /srv/latvian_learning/ (project root) |
| Port | 5002 (host) → 5000 (container) |
| Network | latvian_network |
| Volumes | /srv/latvian_learning:/srv/latvian_learning, /mnt/data/apps/anki-media:/mnt/data/apps/anki-media |
| Env Vars | POSTGRESHOST=latvian-postgres, POSTGRESDATABASE=tiltstezaurs, ANKIURL=http://anki-headless:8765, OMNIVOICEURL=http://omnivoice-lv:8000 |
| Restart | unless-stopped |
| Healthcheck | curl -f http://localhost:5000/ |
Restoration:
cd /srv/latvian_learning
docker compose build latvian-main-site
docker compose up -d --force-recreate latvian-main-site
### latvian-services-dashboard (AI Services Status)
Purpose: Health overview of all backend AI services (was confusingly named "Educational Stack")
| Field | Value |
|-------|-------|
| Image | latvian-learning-main-site:latest |
| Source Repo | github.com/davidgut1982/portainer → stacks/learning-webapp/ |
| Port | 5003 (host) → 5000 (container) |
| Network | latvian_network |
| Volumes | /srv/latvian_learning/logs:/app/logs |
| Healthcheck | curl -f http://localhost:5000/health |
All AI services use the latvian_network and share volumes from /srv/latvian_learning/.
### morph-analyzer-lv (Morphological Analysis)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/morph-analyzer-morph-analyzer-lv:latest |
| Port | 8091:8001 |
| Source | Educational Stack archive |
### udpipe-lv (POS Tagging)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/udpipe-udpipe-lv:latest |
| Port | 8092:8002 |
| Source | Educational Stack archive |
### sentence-embedder-lv (Semantic Embeddings)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/sentence-embedder-lv:latest (16.3GB) |
| Port | 8093:8003 |
| Memory Limit | 1GB (reduced from over-allocated 3GB) |
### fluency-index-lv (FAISS Index)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fluency-index-fluency-index-lv:latest |
| Port | 8094:8004 |
| Volume | fluencyindexdata:/data |
### fluency-gate-lv (Fluency Validation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fluency-gate-fluency-gate-lv:latest |
| Port | 8095:8005 |
### grammar-gate-lv (DEPRECATED)
Status: Code marked deprecated per Decision #11. Kept for backward compatibility.
### comprehensive-validation-gateway (4E Orchestrator)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/comprehensive-validation-gateway:latest |
| Port | 8097:8007 |
| Function | Routes to fluency-gate (4C) + grammar-correction (2E) in parallel |
### template-extractor-lv (Template Extraction)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/template-extractor-template-extractor-lv:latest |
| Port | 8098:8008 |
### constrained-generator-lv (GPT Generator)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/constrained-generator-constrained-generator-lv:latest |
| Port | 8099:8009 |
| Opt-in via | USECONSTRAINEDGENERATOR=true env flag |
### repair-loop-lv (Content Repair)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/repair-loop-repair-loop-lv:latest |
| Port | 8100:8010 |
### asr-transcription-lv (Whisper Speech-to-Text)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/asr-transcription-lv:latest |
| Port | 8101:8011 |
| GPU | RTX 3060 (device_ids: ['1']) |
| Memory Limit | 4GB |
| Model | Whisper Latvian (CT2 int8) |
### forced-aligner-lv (Audio Alignment)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/forced-aligner-lv:latest |
| Port | 8102:8003 |
| CPU Only | CUDA_VISIBLE_DEVICES="" (prevents VRAM leak) |
### grammar-correction-lv (2E Grammar Validation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/grammar-correction-lv:latest |
| Port | 8103:8007 |
| Function | UDPipe + rule-based + GPT repair |
### back-translator-lv (NLLB Translation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/back-translator-lv:latest (21.1GB) |
| Port | 8104:8008 |
| GPU | Required for performance |
| Memory Limit | 5GB |
| Model | NLLB-200-distilled-1.3B-ct2-int8 (CTranslate2 INT8) |
### morphological-analyzer-lv (Database-backed Morphology)
Status: Configured but rarely called. Code uses GPT-based AIRouter instead.
### vocabulary-database-lv (Vocab Lookup)
Status: Configured but unused in production code.
### vocal-isolator-lv (Demucs Vocal Separation)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/vocal-isolator-lv:latest (13.2GB) |
| Port | 8106:8003 |
| GPU | Tesla P4 (GPU 0) |
| Memory Limit | 4GB |
| Volume | vocalisolatorcache:/cache (~30GB for 200 films) |
| Source | /srv/latvian_learning/tilts-system/docker/vocal-isolator-lv/ |
| Model | Demucs htdemucs (chunked, 5-min segments, 10s overlap) |
| Network | Must be on BOTH latvian_network AND docker_latvian_network |
### fake-subsai-asr-gateway (Bazarr Subtitle Gateway)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5000/fake-subsai-asr-gateway:latest |
| Port | 9001 (host network mode) |
| Source | /srv/latvian_learning/tilts-system/docker/fake-subsai-asr-gateway/ |
| Memory Limit | 3.5GB |
| Function | Orchestrates: vocal-isolator → ASR → forced-aligner → back-translator |
| Network | network_mode: host (not on latvian_network) |
### omnivoice-lv (TTS)
| Field | Value |
|-------|-------|
| Image | 192.168.1.4:5001/omnivoice-lv:latest (19.9GB) |
| Port | 8021 (host) → 8000 (container) |
| GPU | RTX 3060, ~5.8GB VRAM |
| Source | /srv/latvian_learning/tilts-system/docker/omnivoice/ |
| Network | Must be on BOTH latvian_network AND docker_latvian_network |
| Function | XTTS-derived Latvian TTS for Anki cards |
### latvian-postgres (Database)
| Field | Value |
|-------|-------|
| Image | pgvector/pgvector:pg16 |
| Port | 5433 (host) → 5432 (container) |
| Database | latvianlearning, also tiltstezaurs for TILTS |
| User | latvian_user |
| Volume | postgresdata:/var/lib/postgresql/data (1.59GB) |
### Dashboard Data Sources
The dashboard widgets are populated from these endpoints. Understanding which endpoint feeds which widget is critical when data appears inconsistent.
| Widget | Source Endpoint | Notes |
|--------|----------------|-------|
| CEFR Score / Level | /api/cefr | Has fallback logic for broken snapshots (see below) |
| Unique Words (statWords) | /api/cefr.components.vocab.unique_words | Reads from CEFR, not /api/anki directly |
| Total Cards (statCards) | /api/cefr.components.vocab.totalcards | From CEFR snapshot |
| Mature (statMature) | /api/cefr.components.maturity.maturecards | Anki definition |
| New (statNew) | /api/cefr (Definition B - learning interval based) | Differs from /api/anki/cards |
| Reviews Today | /api/anki.reviewstoday | Live AnkiConnect query |
### CEFR Endpoint Stale-Data Behavior
/api/cefr includes self-healing logic added 2026-05-17 (see kb_eb546db3d07f):
# If latest snapshot has vocab=0 (broken run), fall back to last valid history entry
if summary['components']['vocab']['unique_words'] == 0:
    # Search history for last entry with vocab > 0
    for entry in reversed(history):
        if entry['components']['vocab']['unique_words'] > 0:
            summary = entry
            break
Root cause not yet fixed: The CEFR tracker still sometimes writes broken vocab=0 snapshots; the endpoint only compensates. TODO: investigate agent/modules/cefr_tracker.py to prevent writing broken snapshots.
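For testing the fallback outside the Flask route, the same logic can be written as a standalone function. This is a sketch under assumptions: the names `unique_words` and `select_summary` are hypothetical, and missing keys are treated as a zero count, which the original snippet does not specify.

```python
def unique_words(entry):
    """Read components.vocab.unique_words, treating missing keys as 0."""
    return entry.get("components", {}).get("vocab", {}).get("unique_words", 0)


def select_summary(summary, history):
    """Return the latest snapshot, or the last valid history entry if it is broken."""
    if unique_words(summary) > 0:
        return summary
    # Walk history from newest to oldest, looking for a non-broken snapshot.
    for entry in reversed(history):
        if unique_words(entry) > 0:
            return entry
    return summary  # nothing better available; return the broken snapshot as-is


if __name__ == "__main__":
    broken = {"components": {"vocab": {"unique_words": 0}}}
    history = [
        {"components": {"vocab": {"unique_words": 1700}}},
        {"components": {"vocab": {"unique_words": 0}}},
    ]
    print(unique_words(select_summary(broken, history)))
```

Note that when every history entry is also broken, this version returns the broken snapshot unchanged, matching the behaviour of the loop above.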
### Anki Card Classification Discrepancy
New vs Learning split differs between endpoints (cosmetic, totals reconcile):
| Endpoint | New | Learning | Mature | Total |
|----------|-----|----------|--------|-------|
| /api/anki/cards (Definition A - reps=0) | 4,376 | 349 | 94 | 4,819 |
| /api/cefr (Definition B - interval based) | 4,182 | 543 | 94 | 4,819 |
See kb_fa56af09911e for full explanation. Both are valid Anki concepts; pick endpoint based on use case.
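The claim that the totals reconcile can be checked directly from the table. The short sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Card counts copied from the table above.
DEFINITION_A = {"new": 4376, "learning": 349, "mature": 94}   # /api/anki/cards
DEFINITION_B = {"new": 4182, "learning": 543, "mature": 94}   # /api/cefr

total_a = sum(DEFINITION_A.values())
total_b = sum(DEFINITION_B.values())

# Net number of cards that A counts as New but B counts as Learning.
shifted = DEFINITION_A["new"] - DEFINITION_B["new"]

print(total_a, total_b, shifted)
```

Both totals come to 4,819, and the net shift between the New and Learning buckets is 194 cards in each direction, which is why the discrepancy is cosmetic.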
### Recently Added Stub Endpoints
- /api/queue/status - returns {"pending": 0, ...} (stub added 2026-05-17 to silence /tools console errors)
- /api/plan/ - fixed 2026-05-17 (path resolution bug in container)
Container: omnivoice-lv at port 8021 (host) / 8000 (internal)
Service docs: Available at http://192.168.1.30:8021/docs
KB Entry: kba4fb069b755e
### Available Voices (5 total, 4 Latvian-safe)
| Voice ID | Gender | Description | Safe for Latvian? |
|----------|--------|-------------|-------------------|
| voice2conversational | Female | Conversational Latvian (default) | ✅ Yes |
| voice4conversationalclean | Female | Clean Latvian | ✅ Yes |
| voice1anki | Female | Anki vocab style (3s) | ✅ Yes |
| voice5maleclean | Male | Clean Latvian | ✅ Yes |
| ~~voice3male~~ | Male | ENGLISH audio - XTTS leftover | ❌ NO - excluded from form |
### Critical Warning
voice3male was carried over from XTTS migration and contains English reference audio. Using it for Latvian text causes "accent bleed" - English-sounding Latvian. Never map Latvian voices to voice3male.
### Legacy Voice Mappings (Narakeet β OmniVoice)
For backward compatibility, these aliases are mapped:
| Legacy Name | Maps To | Notes |
|-------------|---------|-------|
| inese, betty, female | voice2conversational | Female Latvian ✓ |
| arturs, john, male | voice5maleclean | Male Latvian ✓ (FIXED 2026-05-18 from broken voice3male) |
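The alias table can be read as a lookup with a guard against the English-reference voice. The sketch below is illustrative only: `LEGACY_ALIASES`, `EXCLUDED_VOICES`, and `resolve_voice` are hypothetical names, and the underscore spelling of the voice IDs (`voice2_conversational`, `voice5_male_clean`, `voice3_male`) is an assumption based on the request example for the Quick TTS endpoint.

```python
# Assumed ID spelling; the tables in this section show the IDs without underscores.
LEGACY_ALIASES = {
    "inese": "voice2_conversational",
    "betty": "voice2_conversational",
    "female": "voice2_conversational",
    "arturs": "voice5_male_clean",
    "john": "voice5_male_clean",
    "male": "voice5_male_clean",
}

# English reference audio: must never be used for Latvian text.
EXCLUDED_VOICES = {"voice3_male"}


def resolve_voice(name, default="voice2_conversational"):
    """Map a legacy or current voice name to a Latvian-safe voice ID."""
    key = (name or "").strip().lower()
    if not key:
        return default
    voice = LEGACY_ALIASES.get(key, key)
    if voice in EXCLUDED_VOICES:
        raise ValueError(f"{voice} contains English reference audio")
    return voice
```

Rejecting the excluded voice at resolution time, rather than only hiding it from the form, would also catch API callers that pass the ID directly.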
### Quick TTS Endpoint
POST /api/tts/generate
{
  "text": "Sveiki, kā jums klājas?",
  "voice": "voice2_conversational", # or any of the 4 Latvian voices
  "speed": "normal", # "slow", "normal", "with_pauses"
  "mode": "single" # "single" or "sentences"
}
Response:
{
  "success": true,
  "mode": "single",
  "audio_urls": ["/api/tts/audio/20260518_xxx.wav"]
}
### Adding New Voices
1. Add WAV to /app/speakers/ inside omnivoice-lv container
2. Add transcript to /app/speakers/transcripts.yaml (verified by ASR)
3. Test: curl -X POST http://localhost:8021/tts_to_audio/
4. Add to form template quicktts.html
5. Update kba4fb069b755e
### How TTS Flows Through System
[User] → Quick TTS form (/tts)
→ POST /api/tts/generate (tts_bp.py)
→ resolves voice via XTTS_VOICE_MAP
→ POST http://omnivoice-lv:8000/tts_to_audio/
→ uses /app/speakers/.wav as reference
→ XTTS model on RTX 3060 generates WAV
→ saved to /workspace/media/tts/
→ served at /api/tts/audio/
### Device Ordering (Verified 2026-05-17)
Default: PCI_BUS_ID order (NOT FASTEST_FIRST)
nvidia-smi index | name | PCI bus
0 | Tesla P4 | 04:00.0
1 | NVIDIA RTX 3060 | 08:00.0
Inside containers: cuda:0 = Tesla P4, cuda:1 = RTX 3060
### Per-Service GPU Assignment
| Service | GPU | VRAM Used | Rationale |
|---------|-----|-----------|-----------|
| omnivoice-lv | RTX 3060 | 5,272 MiB | Largest model, needs RTX 3060's 12 GB |
| asr-transcription-lv | RTX 3060 | 1,928 MiB | Whisper benefits from sm86 compute |
| polycr-paddleocr-1 | RTX 3060 | 264 MiB | OCR service, was already there |
| back-translator-lv | Tesla P4 | 1,885 MiB (3.5 GB declared in GPU supervisor) | Rebalanced 2026-05-17 to use P4 |
| vocal-isolator-lv | Tesla P4 | 344 MiB (idle) | Always pinned to P4 (per KB) |
| forced-aligner-lv | CPU only | N/A | Per KB - prevents VRAM leak |
### Total VRAM Usage
Tesla P4: 3,149 MiB / 7,680 MiB (41% - actively used)
RTX 3060: 7,538 MiB / 12,288 MiB (61% - safe headroom)
### How to Pin a Service to a Specific GPU
service-name:
  environment:
    NVIDIA_VISIBLE_DEVICES: "0" # "0"=P4, "1"=RTX 3060
    CUDA_VISIBLE_DEVICES: "0" # Container sees only one GPU, becomes cuda:0
  runtime: nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0'] # Matches NVIDIA_VISIBLE_DEVICES
            capabilities: [compute, utility]
KB Entry: kb_fca78bc57c59 (full details on GPU ordering + rebalance)
### Status: ✅ DEPLOYED 2026-05-17 - Real Anki Working with 5,004 cards / 41 decks
Verification:
curl -s -X POST http://localhost:8765 -H "Content-Type: application/json" -d '{"action":"deckNames","version":6}'
# Returns 41 real decks including "Latvian (ChatGPT)::Vocab & Sentences"
curl -s https://latvian.shifting-ground.link/api/anki
# Returns: {"available":true,"total_cards":4819,"total_notes":1715,...}
### Public URL Routing (THE REAL PICTURE)
[Anki Desktop on user devices]
↓ syncs via
[sync.shifting-ground.link] ← THE REAL PRODUCTION URL ✅ WORKING
↓ pfSense HAProxy routes directly to
[192.168.1.30:27701] anki-sync container
↓ writes to
[/mnt/data/apps/anki-sync/data/david/collection.anki2]
[TILTS Dashboard latvian.shifting-ground.link]
↓ uses internal Docker network
[anki-headless:8765] AnkiConnect API ✅ WORKING
↓ reads from
[Anki collection synced from anki-sync]
### Active URLs Reference
| URL | Status | Purpose | Used By |
|-----|--------|---------|---------|
| sync.shifting-ground.link | ✅ WORKING (PRIMARY) | Anki Desktop sync server | User's Anki Desktop, iPhone, etc. |
| latvian.shifting-ground.link | ✅ Working | Latvian Learning Dashboard | Browser users |
| latvian.shifting-ground.link/api/anki/* | ✅ Working | Real Anki data via internal Docker | Dashboard frontend |
| anki.shifting-ground.link | ⚠️ 503 (cosmetic) | Anki Web UI (KasmVNC) | Not used in normal workflow |
| ankiconnect.shifting-ground.link | ⚠️ 503 (cosmetic) | External AnkiConnect API | Not used in normal workflow |
### Real Sync Activity Proof
The anki-sync container actively serves user david's Anki Desktop 25.09.2:
INFO request{uri="/msync/uploadChanges" ip="192.168.1.1" uid="david" client="25.09.2,3890e12c,linux"}: finished httpstatus=200
INFO request{uri="/msync/mediaSanity" ip="192.168.1.1" uid="david"}: finished httpstatus=200
Files being uploaded include Latvian vocabulary card images: vēlvienreiz.png, Čehija.png, četri.png, ģimene.png, Šveice.png, žurnāls.png, etc.
The source IP 192.168.1.1 (pfSense) confirms the traffic path: External → pfSense HAProxy → anki-sync:27701.
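The same check can be scripted. The sketch below parses log lines shaped like the two quoted above and reports whether a request succeeded and arrived from the pfSense address. The regular expression is modelled only on those two sample lines, so the wider log grammar is an assumption, and the helper names `parse_sync_line` and `came_through_pfsense` are hypothetical.

```python
import re

# Pattern modelled on the two sample lines in this section; the full log
# grammar of the sync server is an assumption.
LOG_RE = re.compile(
    r'request\{uri="(?P<uri>[^"]+)" ip="(?P<ip>[^"]+)" uid="(?P<uid>[^"]+)"'
    r'[^}]*\}: finished httpstatus=(?P<status>\d+)'
)


def parse_sync_line(line):
    """Return uri, ip, uid and integer status for one log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def came_through_pfsense(line, gateway_ip="192.168.1.1"):
    """True when the line is a 200 response whose source IP is the gateway."""
    parsed = parse_sync_line(line)
    return (
        parsed is not None
        and parsed["ip"] == gateway_ip
        and parsed["status"] == 200
    )


sample = (
    'INFO request{uri="/msync/uploadChanges" ip="192.168.1.1" uid="david" '
    'client="25.09.2,3890e12c,linux"}: finished httpstatus=200'
)
print(parse_sync_line(sample))
print(came_through_pfsense(sample))
```

Feeding it the output of `docker logs anki-sync` line by line would give a quick count of successful syncs per user, under the same format assumption.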
### NPMplus Proxy Hosts (for non-critical Anki URLs)
These NPMplus entries exist, but their pfSense backends are unhealthy. This only matters if you need direct browser access to the Anki Web UI; it is not required for normal usage.
| Subdomain | Config File | Upstream | pfSense Status |
|-----------|-------------|----------|----------------|
| anki.shifting-ground.link | /data/nginx/proxyhost/13.conf | 192.168.1.30:3000 | ⚠️ Backend down |
| ankiconnect.shifting-ground.link | /data/nginx/proxyhost/16.conf | 192.168.1.30:8765 | ⚠️ Backend down |
The fact that these URLs return 503 does NOT impact:
- Anki Desktop sync (uses sync.shifting-ground.link instead)
- Dashboard Anki data display (uses internal Docker network)
- Test/quiz generation in TILTS (uses internal AnkiConnect)
These URLs would only be needed for:
- VNC-browsing Anki Desktop GUI through web (rare)
- External tools calling AnkiConnect from outside the LAN (rare)
### Source of Truth: github.com/davidgut1982/portainer/stacks/anki/
Local clones (verified):
- /home/david/portainer/stacks/anki/docker-compose.yml
- /mnt/data/apps/portainer-export/stacks/anki/docker-compose.yml
The Stack Has TWO Services (NOT just one!):
### Service 1: anki-headless (Anki Desktop + AnkiConnect API)
| Field | Value |
|-------|-------|
| Image | registry.shifting-ground.link/anki-headless:latest |
| (Mirror) | 192.168.1.4:5000/anki-headless:latest (same SHA: 3e5bae7d9f74) |
| Image Type | linuxserver/webtop + custom-cont-init.d scripts |
| User | 0:0 (root - required for s6-overlay) |
| Ports | 8765:8765 (AnkiConnect API), 3000:3000 (KasmVNC web UI) |
| Networks | anki_network, frontend_net |
| Restart | unless-stopped |
Environment Variables (REQUIRED):
ANKI_API_KEY=
DISPLAY=:99
ANKICONNECT_BIND_HOST=0.0.0.0 # CRITICAL - default is localhost only!
ANKICONNECT_PORT=8765
PUID=1005
PGID=136
TZ=America/Chicago
Volume Mounts (REQUIRED):
volumes:
- /mnt/data/apps/anki:/config # Main Anki config + addons + media
- /mnt/data/apps/anki/custom-cont-init.d:/custom-cont-init.d # THE SECRET SAUCE - init scripts
- /mnt/data/apps/anki/nginx:/var/lib/nginx # Nginx runtime
- /mnt/data/apps/anki/nginx/logs:/var/log/nginx # Nginx logs
- /mnt/data/apps/anki-media:/config/.local/share/Anki2/User 1/collection.media # Card media
### The Custom Init Scripts (CRITICAL for it to work!)
Location: /mnt/data/apps/anki/custom-cont-init.d/
00-anki-setup.sh - Creates Anki user directory:
#!/command/with-contenv bash
set -e
mkdir -p /config/.local/share/Anki2/"User 1"
chown -R abc:abc /config
05-enable-ankiconnect.sh - Enables AnkiConnect addon programmatically:
#!/command/with-contenv bash
set -e
CFG=/config/.local/share/Anki2/addons21.json
mkdir -p /config/.local/share/Anki2
if [ ! -f "$CFG" ]; then
cat >"$CFG" <
Without these init scripts, the webtop image starts but the AnkiConnect addon is never enabled.
### Service 2: anki-sync (Anki Sync Server) - ✅ DEPLOYED 2026-05-17
| Field | Value |
|-------|-------|
| Image | ghcr.io/luckyturtledev/anki:latest (27MB Rust binary, NOT Python Anki) |
| Container | anki-sync |
| Port | 27701:8080 |
| Networks | latvian_network (in our setup) / frontend_net (per portainer compose) |
| Restart | unless-stopped |
| Env | TZ=America/Chicago, SYNC_USER1=david: |
| Volumes | /mnt/data/apps/anki-sync/{config,data,logs} |
Status: Running on port 27701, listening at 0.0.0.0:8080 inside container.
Verification:
docker logs anki-sync | tail -5
# INFO listening addr=0.0.0.0:8080
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:27701/
# Returns 404 (NORMAL - sync server only has /sync/* endpoints)
KB Entry: kb36b2f93e8edd
This is the Anki Sync Server that:
- Receives sync requests from Anki Desktop (in container OR on personal devices)
- Stores the master collection.anki2 at /mnt/data/apps/anki-sync/data/david/collection.anki2
- That file was modified TODAY because real Anki sync is happening!
- Provides AnkiWeb-protocol-compatible sync API
The Flow:
[Personal device/iPhone Anki] → syncs to → anki-sync:27701 → writes /mnt/data/apps/anki-sync/data/david/
[Container anki-headless] → syncs to → anki-sync:27701 → keeps in sync
[TILTS dashboard] → queries → anki-headless:8765 (AnkiConnect API) → reads real cards
### Deployment
Real docker-compose.yml (from /home/david/portainer/stacks/anki/docker-compose.yml):
version: "3.9"
services:
  anki-headless:
    image: registry.shifting-ground.link/anki-headless:latest
    container_name: anki-headless
    user: "0:0"
    environment:
      - ANKI_API_KEY=${ANKI_API_KEY}
      - DISPLAY=:99
      - ANKICONNECT_BIND_HOST=0.0.0.0
      - ANKICONNECT_PORT=8765
      - PUID=1005
      - PGID=136
    networks:
      - anki_network
      - frontend_net
    ports:
      - "8765:8765"
      - "3000:3000"
    volumes:
      - ${CONFIGURATION_FILES}/anki:/config
      - ${CONFIGURATION_FILES}/anki/custom-cont-init.d:/custom-cont-init.d
      - ${CONFIGURATION_FILES}/anki/nginx:/var/lib/nginx
      - ${CONFIGURATION_FILES}/anki/nginx/logs:/var/log/nginx
      - /mnt/data/apps/anki-media:/config/.local/share/Anki2/User 1/collection.media
    restart: unless-stopped
  anki-sync:
    image: ghcr.io/luckyturtledev/anki:latest
    container_name: anki-sync
    restart: unless-stopped
    ports:
      - "27701:8080"
    volumes:
      - ${CONFIGURATION_FILES}/anki-sync/config:/config
      - ${CONFIGURATION_FILES}/anki-sync/data:/data
      - ${CONFIGURATION_FILES}/anki-sync/logs:/logs
    environment:
      - "TZ=America/Chicago"
      - "SYNC_USER1=david:${ANKI_SYNC_PASSWORD}"
    networks:
      - frontend_net
networks:
  anki_network:
    driver: bridge
  frontend_net:
    driver: bridge
Required .env file:
ANKI_API_KEY=
ANKI_SYNC_PASSWORD=
CONFIGURATION_FILES=/mnt/data/apps
Deploy command:
cd /home/david/portainer/stacks/anki
# Create .env with real credentials first
docker compose up -d
### Verification (REAL Anki, not mock)
# AnkiConnect API responding
curl -s -X POST http://localhost:8765 \
-H "Content-Type: application/json" \
-d '{"action":"version","version":6}'
# Expected: {"result":6,"error":null}
# Get real deck names
curl -s -X POST http://localhost:8765 \
-d '{"action":"deckNames","version":6}'
# Expected: array of real deck names (e.g., "Latvian (ChatGPT)::Vocab & Sentences")
# Anki Sync Server health
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:27701/
# Expected: 404 (normal; the sync server only serves /sync/* endpoints)
# Container status
docker ps --filter "name=anki" --format "{{.Names}}|{{.Status}}"
### Common Issues
| Issue | Cause | Fix |
|-------|-------|-----|
| Port 8765 not listening | Custom init scripts missing | Mount /mnt/data/apps/anki/custom-cont-init.d |
| AnkiConnect binds to localhost only | Missing env var | Set ANKICONNECT_BIND_HOST=0.0.0.0 |
| Permission denied on nginx | Wrong user | Must use user: "0:0" (root) |
| Webtop loops on s6-init | seccomp blocks setgroups | Add security_opt: [seccomp:unconfined] and privileged: true if running in LXC |
| Collection not syncing | Sync server password mismatch | Check SYNC_USER1=david:password matches Anki Desktop sync config |
### Primary Repository: github.com/davidgut1982/portainer
Structure:
portainer/
├── stacks/ # Portainer-deployable stacks
│   ├── anki/ # Anki sync server + media
│   ├── arr-stack/ # Media management
│   ├── learning-stack/ # Main TILTS infrastructure
│   ├── learning-webapp/ # Latvian Flask web app source
│   ├── mcp-dashboard/ # MCP dashboard
│   ├── npmplus-setup/ # Reverse proxy
│   ├── plex-stack/ # Plex media
│   ├── utilities-stack/ # Registry, npm, etc.
│   └── xtts/ # XTTS training infrastructure
├── educational-stack-full.yml # All 16 AI services (582 lines)
├── srv-compose-files/ # Service compose files
└── docker-infrastructure-temp/ # Alternative configs
### Project Source Code
Latvian Learning (this repo): /srv/latvian_learning/
- tilts-system/ - Main TILTS application (Python/Flask)
- agent/dashboard/ - Dashboard blueprints, templates, modules
- docker/ - Per-service Dockerfiles for AI pipeline
- latvian-learning-tilts-main/ - TILTS frontend image source
- latvian-learning-webapp/ - Services dashboard image source
- docs/user-journeys/ - This documentation
- docs/diagrams/ - Architecture diagrams
### latvian_network (primary)
- Subnet: 172.18.0.0/16
- All TILTS services use this
- External=true in compose files
### docker_latvian_network (legacy)
- Subnet: 172.22.0.0/16
- Some services (omnivoice, vocal-isolator) historically on this
- CRITICAL: Services on this MUST also be on latvian_network for dashboard to reach them
### anki_network
- For anki-headless service
- Also requires latvian_network connection
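The rule that every service must also sit on the primary network can be checked from `docker inspect` output. The sketch below reads the `NetworkSettings.Networks` map from that JSON and lists required networks the container is not attached to. The network names are written with underscores here, which is an assumption about their exact spelling, and the sample container data is invented for illustration.

```python
import json


def attached_networks(inspect_output):
    """Return the network names found in `docker inspect <container>` JSON."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return set(containers[0]["NetworkSettings"]["Networks"])


def missing_networks(inspect_output, required=("latvian_network",)):
    """Return the required networks the container is not attached to, sorted."""
    return sorted(set(required) - attached_networks(inspect_output))


# Hypothetical sample: a container attached only to the legacy network.
sample = json.dumps([
    {"NetworkSettings": {"Networks": {
        "docker_latvian_network": {"IPAddress": "172.22.0.5"},
    }}}
])
print(missing_networks(sample))
```

Any name this returns is a candidate for `docker network connect`, as described in the troubleshooting steps.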
### "Service X is down" → Standard Recovery
# 1. Check status
docker ps -a | grep <container>
# 2. Check logs for last 50 lines
docker logs --tail 50 <container>
# 3. Try restart
docker restart <container>
# 4. If restart fails, recreate from compose:
docker compose stop <service>
docker compose rm -f <service>
docker compose up -d <service>
# 5. If image is missing/corrupt, pull from registry
docker pull 192.168.1.4:5000/<image>:latest
### "Can't reach service from another container" → Network Issue
# Check what networks the service is on
docker inspect --format '{{range $name, $net := .NetworkSettings.Networks}}{{$name}}: {{$net.IPAddress}}{{"\n"}}{{end}}' <container>
# Connect to additional network if needed
docker network connect latvian_network <container>
### "Container starts but service unresponsive" → Common Causes
1. Missing volume mount: Check docker inspect
2. Wrong network: Service on wrong subnet (see Networks above)
3. GPU not allocated: Need deploy.resources.reservations.devices in compose
4. CUDA mismatch: Containers built for CUDA 12.1, host runs 13.x (forward compat OK)
5. CPU pinning wrong: Host has sparse cores (0,3,6,9,12,15) not contiguous (0-3)
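The CPU pinning pitfall in item 5 can be caught before deployment. The sketch below expands a cpuset string and reports any core ids that are not in the host's sparse set (0,3,6,9,12,15, taken from the list above). The function names are hypothetical, and the cpuset syntax handled is only the simple comma-and-range form.

```python
# Core ids available on the host, as listed in item 5 above.
HOST_CORES = {0, 3, 6, 9, 12, 15}


def expand_cpuset(spec):
    """Expand a cpuset string such as "0-3,6" into a set of core ids."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return cores


def invalid_cores(spec, available=HOST_CORES):
    """Return the requested core ids that do not exist on the host, sorted."""
    return sorted(expand_cpuset(spec) - set(available))


print(invalid_cores("0-3"))    # contiguous range: cores 1 and 2 do not exist
print(invalid_cores("0,3,6"))  # sparse list that matches the host
```

An empty result means the pinning only names cores the host actually has.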
### Full System Recovery
# 1. Pull all images from registry
docker compose pull
# 2. Recreate all containers
docker compose down
docker compose up -d
# 3. Wait for healthchecks
sleep 60
docker ps --filter health=unhealthy
# 4. Verify dashboards
curl -s -o /dev/null -w "TILTS: %{http_code}\n" http://localhost:5002/
curl -s -o /dev/null -w "Services Dashboard: %{http_code}\n" http://localhost:5003/
Host: docker-registry (CT 104)
List all images:
curl -s http://192.168.1.4:5000/v2/_catalog
Get tags for specific image:
curl -s http://192.168.1.4:5000/v2/<image>/tags/list
Registry build host: Some images are built on 192.168.1.4 (registry host) and pushed locally.
| Issue | Status | Notes |
|-------|--------|-------|
| anki-headless using mock | 🔴 BROKEN | Real image needs investigation - registry image is KasmVNC/GUI based |
| 5 diagrams render tiny | ✅ FIXED | Added explicit width/height to SVGs |
| Shadow dir permission | ✅ FIXED | Created /srv/latvian_learning/workspace/media/shadowing with 777 |
| Writing 500 error | ✅ FIXED | chmod 777 on logs/memory dir |
| Grammar JS broken | ✅ FIXED | Closed unterminated <script> tag |
| Missing /api/anki/* routes | ✅ FIXED | Added 5 new endpoints |
| Test gen slow | ⚠️ WORKAROUND | 21s/5q, 54s/10q - linear scaling |
| GPU CUDA mismatch | ✅ FIXED | Added runtime: nvidia to compose |
Deployed: 2026-05-17 - KB Entry: kb3c2d2a671885
### What's Monitored
| Layer | Tool | Catches |
|-------|------|---------|
| Container health | cadvisor + ContainerDown alert | Crashed/stopped containers |
| System resources | node-exporter | Disk full, RAM, CPU spikes |
| GPU usage | nvidia-gpu-exporter | VRAM exhaustion, idle GPUs |
| Public URLs | blackbox-exporter | 503s, timeouts, SSL issues |
| Application data | latvian-exporter (custom) | Mock servers, stale CEFR, zero cards |
### Key Custom Metrics (latvian-exporter)
Available at http://192.168.1.8:9116/metrics:
- latvian_anki_total_cards - real card count (4819 = healthy, <100 = MOCK)
- latvian_anki_mock_suspected - 1 = ALERT, mock detected
- latvian_cefr_snapshot_valid - 0 = broken CEFR run
- latvian_anki_decks_count - 41 decks confirmed
- + 12 more metrics
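A small consumer of these metrics might look like the sketch below. It parses unlabelled samples from Prometheus text exposition output and applies the thresholds listed above (fewer than 100 cards, or the mock flag set to 1, means the data is suspect). The metric names are spelled with underscores, which is an assumption about their exact form, and the sample text is invented for illustration.

```python
def parse_metrics(text):
    """Parse unlabelled `name value` samples from Prometheus text output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and "{" not in parts[0]:
            try:
                samples[parts[0]] = float(parts[1])
            except ValueError:
                continue
    return samples


def anki_looks_real(samples, min_cards=100):
    """True when the card count is at least min_cards and no mock is flagged."""
    cards = samples.get("latvian_anki_total_cards", 0.0)
    mock = samples.get("latvian_anki_mock_suspected", 0.0)
    return cards >= min_cards and mock == 0.0


# Hypothetical exporter output using the values quoted in this section.
healthy = """
# TYPE latvian_anki_total_cards gauge
latvian_anki_total_cards 4819
latvian_anki_mock_suspected 0
latvian_anki_decks_count 41
"""
print(anki_looks_real(parse_metrics(healthy)))
```

In practice the text would come from the exporter's `/metrics` endpoint rather than a literal string.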
### Alert Rules → Actions
| Alert | Trigger | Auto-Action |
|-------|---------|-------------|
| ContainerDown | Container missing 5+ min | Restart container |
| AnkiConnectNoRealData | Direct probe fails | Restart anki-headless |
| SyncServerDown | sync.shifting-ground.link fails | Restart anki-sync |
| AnkiCardCountSuspiciouslyLow | Cards < 100 | Alert only (mock danger) |
| CEFRStaleSnapshot | vocab=0 but Anki OK | Trigger CEFR recalc |
| AnkiSyncStale | No sync 24h | Restart anki-sync |
| PublicURLDown | Any public URL down | Alert only (pfSense issue) |
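
The table above can be read as a lookup from alert name to action. The sketch below shows one way a webhook receiver could apply it to an Alertmanager payload, which carries an `alerts` list whose entries have a `status` and a `labels.alertname`. The action strings and the `plan_actions` helper are hypothetical; how the remediation agent actually dispatches actions is not described in this document.

```python
# Alert name -> action, transcribed from the table above.
# The action strings themselves are hypothetical labels.
AUTO_ACTIONS = {
    "ContainerDown": "restart:container",
    "AnkiConnectNoRealData": "restart:anki-headless",
    "SyncServerDown": "restart:anki-sync",
    "AnkiCardCountSuspiciouslyLow": "alert_only",
    "CEFRStaleSnapshot": "trigger:cefr-recalc",
    "AnkiSyncStale": "restart:anki-sync",
    "PublicURLDown": "alert_only",
}


def plan_actions(payload):
    """Return (alertname, action) pairs for each firing alert in the payload."""
    plans = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no remediation
        name = alert.get("labels", {}).get("alertname", "")
        plans.append((name, AUTO_ACTIONS.get(name, "alert_only")))
    return plans


# Hypothetical webhook payload with one firing and one resolved alert.
payload = {
    "alerts": [
        {"status": "firing", "labels": {"alertname": "SyncServerDown"}},
        {"status": "resolved", "labels": {"alertname": "ContainerDown"}},
    ]
}
print(plan_actions(payload))
```

Unknown alert names fall back to alert-only, which keeps the sketch from restarting anything it does not recognise.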
### Grafana Dashboards (http://192.168.1.8:3001)
1. Latvian Learning Health - Single-pane status view
2. Anki Sync Activity - Card growth, daily reviews trend
3. GPU Per-Service - Per-process VRAM, P4 vs RTX 3060
### How to Test Monitoring
# Verify all metrics
ssh root@192.168.1.8 'curl -s http://localhost:9116/metrics | grep latvian_'
# Force trigger an alert (test mode)
ssh root@192.168.1.8 'curl -X POST http://localhost:9093/api/v2/alerts -d "[{...}]"'
# View dashboards
open http://192.168.1.8:3001
### 2026-05-18 - Quick TTS / OmniVoice Voice Fix
The user found that the Quick TTS form was still using old Narakeet voice names (Betty/John), which mapped to the broken voice3male voice (English audio). KB: kba4fb069b755e
- Fixed XTTSVOICEMAP: john/arturs/male → voice5maleclean (was voice3male English)
- Updated Quick TTS form to show all 4 Latvian OmniVoice voices with native names
- Added warning about voice3male being English-only
### 2026-05-17 - Audit Blind Spots Found by User
User found 2 bugs the audits missed (upload + TTS). KB: kb4f47cd82433b
- Fixed: ffmpeg missing in Dockerfile (audio upload broken)
- Documented: TTS API actually works, just had scary supervisor warnings
### 2026-05-17 - Monitoring Stack Deployed (5 phases)
Comprehensive Prometheus/Grafana/Alertmanager monitoring with auto-remediation. KB: kb3c2d2a671885
- Blackbox exporter for URL monitoring
- Custom latvian-exporter for app metrics
- 12 alert rules with moderate auto-remediation
- 3 Grafana dashboards
### 2026-05-17 - Comprehensive System Audit
Found 7 issues, fixed 7. KB: kbebe48d69afea
- Fixed: /api/cefr returning stale broken data (showed A1- instead of A2-)
- Fixed: Dashboard statWords "--" placeholder (auto-resolved by CEFR fix)
- Fixed: /api/queue/status 404 (added stub endpoint)
- Fixed: /api/plan/
- Fixed: Test generation no progress UI (added spinner + status message)
- Fixed: /progress/weekly contradictory text (improved error handling)
- Documented: Anki new/learning split discrepancy (cosmetic only)
### 2026-05-17 - GPU Rebalance
back-translator moved to Tesla P4. KB: kbfca78bc57c59
- P4: 4.6% → 41% utilization (now actively used)
- RTX 3060: 84% → 61% utilization (safer headroom)
### 2026-05-17 - Real Anki Restoration
Replaced mock Python server with real Anki + AnkiConnect. KB: kbd80f5d3c855e
- 5,004 real cards across 41 decks
- Used registry image with custom-cont-init.d scripts
### 2026-05-17 - Anki Sync Server Deployed
Added anki-sync container for sync.shifting-ground.link. KB: kb36b2f93e8edd
- ghcr.io/luckyturtledev/anki:latest, port 27701
- Active sync confirmed from user's Anki Desktop 25.09.2
| Issue | Impact | Workaround |
|-------|--------|-----------|
| /diagrams console "Unexpected token '<'" | Cosmetic console error | None needed |
| anki.shifting-ground.link 503 | Direct web UI access blocked | Use sync.shifting-ground.link for sync (works) |
| ankiconnect.shifting-ground.link 503 | External AnkiConnect blocked | Dashboard uses internal Docker network (works) |
| CEFR tracker writes vocab=0 snapshots | Fixed by /api/cefr fallback | Long-term: fix cefr_tracker.py |
| New vs Learning split varies | Cosmetic only - totals match | Use endpoint that matches your need |
Decision tree:
1. Identify which service is broken (from error message or health check)
2. Look up that service in the inventory above
3. Check the source repo and image name
4. Try "Standard Recovery" procedure
5. If still broken, check Known Issues
6. If novel issue, check container logs and document the fix here
7. Add to Audit History above so future agents have context
System Flow Diagrams
Live architecture and data-flow diagrams rendered from docs/diagrams/system-flows.md
1. Overall System Architecture
graph TB
User["User (browser)"] --> nginx["nginx\nlatvian.shifting-ground.link"]
nginx --> main["latvian-main-site\n:5002 (gunicorn)"]
nginx --> healthproxy["health-monitor-proxy/*\n(reverse proxy)"]
healthproxy --> healthmon["system-health-monitor\n:5004"]
main --> postgres[("latvian-postgres\n:5433\ntilts_tezaurs / learning.*")]
main --> tts["omnivoice-lv\n:8000 XTTS"]
main --> asr["asr-transcription-lv\n:8011 Whisper"]
main --> aligner["forced-aligner-lv\n:8102 MMS"]
main --> gateway["comprehensive-validation-gateway\n:8097 / internal :8007"]
main --> bt["back-translator-lv\n:8104 NLLB-200 CT2 INT8"]
main --> repair["repair-loop-lv\n:8100 / :8010"]
main --> openrouter["OpenRouter\nhttps://openrouter.ai/api/v1"]
openrouter --> tutor["TUTOR_MODEL\ndeepseek-chat-v3.1\nor gemini-2.5-flash"]
openrouter --> quiz["QUIZ_MODEL\ngpt-4o-mini"]
openrouter --> repairmodel["REPAIR_MODEL\ngpt-4o-mini"]
gateway --> fluencygate["fluency-gate-lv\n:8095 / :8005"]
gateway --> grammargate["grammar-gate-lv\n:8096 / :8006"]
fluencygate --> fluencyindex["fluency-index-lv\n:8094 / :8004\nFAISS LVTB"]
fluencyindex --> embedder["sentence-embedder-lv\n:8093 / :8003"]
grammargate --> udpipe["udpipe-lv\n:8092 / :8002"]
repair --> repairmodel
repair --> gateway
subgraph Monitoring ["Monitoring Tier (192.168.1.8 bastion)"]
prom["Prometheus\n:9090"]
graf["Grafana\n:3001"]
loki["Loki\n:3100"]
am["Alertmanager\n:9093"]
end
main -->|"/metrics scrape 30s"| prom
prom --> am
prom --> graf
loki --> graf
style postgres fill:#b8d4e8
style openrouter fill:#ffe4b5
style gateway fill:#d4edda
style repair fill:#fff3cd
style Monitoring fill:#f0f4ff
2. AI Tutor Chat Flow (Text + Voice)
flowchart TD
subgraph Input ["Input - choose one"]
TextIn["User types Latvian text\n(Chat mode)"]
VoiceIn["User records audio\n(Voice Mode)"]
end
subgraph VoicePipeline ["Voice Pipeline (voice-input only)"]
ASR["asr-transcription-lv:8011\nWhisper (lv language)"]
Align["forced-aligner-lv:8102\nPer-word {word, start, end, score}"]
ColoredWords["Colored word display\ngreen/yellow/red per score"]
end
subgraph ChatPipeline ["Chat Pipeline (both paths)"]
UserValidate["Validate user text\ncomprehensive-validation-gateway:8097"]
UserBadge["Teal badge on user message\nquality_score + assessment"]
LLM["OpenRouter → TUTOR_MODEL\n(deepseek-chat-v3.1)"]
AIValidate["Validate AI response\ncomprehensive-validation-gateway:8097"]
RepairCheck{"quality_score < 0.70?"}
Repair["repair-loop-lv:8100\ngpt-4o-mini rewrites text"]
ReRepair["Re-validate repaired text"]
BackTransAI["back-translator-lv:8104\nLV → EN translation"]
BackTransUser["back-translator-lv:8104\nLV → EN (user text)"]
TTSOut["omnivoice-lv:8000\nXTTS → audio URL"]
end
subgraph Output ["Output"]
SaveDB["INSERT learning.messages\n(tilts_tezaurs)"]
UI["Browser renders:\n- AI message + green/amber badge\n- English back-translation\n- Audio playback button\n- User teal badge + EN translation"]
end
TextIn --> UserValidate
VoiceIn --> ASR --> Align --> ColoredWords
ColoredWords --> UserValidate
UserValidate --> UserBadge
UserBadge --> LLM
LLM --> AIValidate
AIValidate --> RepairCheck
RepairCheck -- Yes --> Repair --> ReRepair --> BackTransAI
RepairCheck -- No --> BackTransAI
BackTransAI --> BackTransUser
BackTransUser --> TTSOut
TTSOut --> SaveDB --> UI
3. Pronunciation Checker Flow
flowchart TD
subgraph TextSource ["Reference Text - choose source"]
VocabSrc["GET /api/pronunciation/sample?source=vocab\nRandom word from learning.vocabulary\n(example_lv preferred)"]
TranscriptSrc["GET /api/pronunciation/sample?source=transcript\nRandom Latvian sentence from study_sessions\n(heuristic Latvian filter)"]
QueueSrc["GET /api/pronunciation/sample?source=queue\nRandom item from SRS review queue"]
end
subgraph RecordFlow ["Recording & Alignment"]
Browser["Browser MediaRecorder\nWebM/Opus audio"]
Upload["POST /api/pronunciation/align\nmultipart: audio (WebM) + text (str)"]
FFmpeg["ffmpeg\nWebM -> WAV (16kHz mono)"]
Aligner["forced-aligner-lv:8102/align\nMMS forced alignment"]
Words["Per-word result:\n{word, start, end, score 0-1}"]
end
subgraph Display ["Display & Playback"]
ColorCode["JS colors each word span:\ngreen >= 0.75\nyellow >= 0.45\nred < 0.45"]
Playback["Playback buttons:\n- Your recording\n- Hear correct pronunciation (TTS)\n- Click any word -> word TTS"]
TTS["GET /api/pronunciation/tts?text=...\nomnivoice-lv:8000 XTTS"]
end
VocabSrc --> Browser
TranscriptSrc --> Browser
QueueSrc --> Browser
Browser --> Upload --> FFmpeg --> Aligner --> Words --> ColorCode
ColorCode --> Playback
Playback -- "reference audio" --> TTS
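The colour thresholds in the Display step can be written out as a small function. The sketch below mirrors the diagram (green at 0.75 and above, yellow at 0.45 and above, red otherwise) and applies it to per-word results shaped like the aligner output `{word, start, end, score}`. The front end does this in JavaScript; this Python version and the sample scores are illustrative only.

```python
def word_color(score):
    """Map an alignment score in [0, 1] to the display colour from the diagram."""
    if score >= 0.75:
        return "green"
    if score >= 0.45:
        return "yellow"
    return "red"


def color_words(words):
    """Return (word, colour) pairs for a list of per-word aligner results."""
    return [(item["word"], word_color(item["score"])) for item in words]


# Hypothetical aligner output for a three-word utterance.
result = [
    {"word": "četri", "start": 0.00, "end": 0.42, "score": 0.91},
    {"word": "ģimene", "start": 0.45, "end": 1.02, "score": 0.58},
    {"word": "žurnāls", "start": 1.10, "end": 1.80, "score": 0.30},
]
print(color_words(result))
```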
4. Validation Pipeline Flow
flowchart TD
Input["Latvian text\n(user message or AI response)"]
subgraph Gateway ["comprehensive-validation-gateway:8097"]
FluencyCall["fluency-gate-lv:8095\nFAISS LVTB nearest-neighbor\nWeight: 60%"]
GrammarCall["grammar-gate-lv:8096\nUDPipe morphological rules\nWeight: 40%"]
Combine["quality_score =\nfluency x 0.60 + grammar x 0.40"]
end
subgraph FluencyDetail ["Fluency Gate internals"]
FIndex["fluency-index-lv:8094\nFAISS index over LVTB corpus"]
Embedder["sentence-embedder-lv:8093\nparaphrase-multilingual vectors"]
end
subgraph GrammarDetail ["Grammar Gate internals"]
UDPipe["udpipe-lv:8092\nMorphological parsing"]
end
Assess{"quality_score threshold"}
GoodLabel["overall_assessment: good\n(>= 0.85)"]
AcceptLabel["overall_assessment: acceptable\n(>= 0.70)"]
PoorLabel["overall_assessment: poor\n(>= 0.55)"]
InvalidLabel["overall_assessment: invalid\n(< 0.55)"]
RepairCheck{"quality_score < 0.70?"}
Repair["repair-loop-lv:8100\ngpt-4o-mini via OpenRouter\nPOST /repair"]
ReValidate["Re-validate repaired text\n(best-effort - use original if still low)"]
BackTrans["back-translator-lv:8104\nNLLB-200-distilled-1.3B-ct2-int8\nEN←LV (Tesla P4 GPU, CTranslate2 INT8)"]
ENDisplay["English shown to user\nas comprehension aid"]
Save["Save to learning.messages\n(tilts_tezaurs, schema learning)"]
Input --> FluencyCall
Input --> GrammarCall
FluencyCall --> FIndex --> Embedder
GrammarCall --> UDPipe
FluencyCall --> Combine
GrammarCall --> Combine
Combine --> Assess
Assess --> GoodLabel
Assess --> AcceptLabel
Assess --> PoorLabel
Assess --> InvalidLabel
Combine --> RepairCheck
RepairCheck -- Yes --> Repair --> ReValidate --> BackTrans
RepairCheck -- No --> BackTrans
BackTrans --> ENDisplay
BackTrans --> Save
style GoodLabel fill:#90EE90
style AcceptLabel fill:#d4edda
style PoorLabel fill:#fff3cd
style InvalidLabel fill:#f8d7da
style Repair fill:#ffe4b5
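The scoring logic in this diagram reduces to a weighted sum and a set of thresholds. The sketch below uses the weights and cut-offs shown above (fluency 60%, grammar 40%; good at 0.85, acceptable at 0.70, poor at 0.55, invalid below that; repair when the score is under 0.70). The function names are hypothetical and the gateway's real implementation may differ in details such as rounding.

```python
FLUENCY_WEIGHT = 0.60
GRAMMAR_WEIGHT = 0.40
REPAIR_THRESHOLD = 0.70


def quality_score(fluency, grammar):
    """Combine the two gate scores using the weights from the diagram."""
    return FLUENCY_WEIGHT * fluency + GRAMMAR_WEIGHT * grammar


def assess(score):
    """Map a quality score to the overall_assessment label."""
    if score >= 0.85:
        return "good"
    if score >= 0.70:
        return "acceptable"
    if score >= 0.55:
        return "poor"
    return "invalid"


def needs_repair(score):
    """True when the text should be sent to the repair loop."""
    return score < REPAIR_THRESHOLD


score = quality_score(fluency=1.0, grammar=0.5)
print(round(score, 2), assess(score), needs_repair(score))
```

Note that "poor" and "invalid" both fall under the repair threshold, so both labels lead to the repair loop.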
5. Observability Stack
graph TB
tilts["latvian-main-site\n:5002\n/metrics"] -->|scrape 30s| prom["Prometheus\n192.168.1.8:9090"]
cadvisor["cAdvisor\n:8082"] -->|scrape| prom
nodeexp["node-exporter\n:9100"] -->|scrape| prom
latvexp["latvian-exporter\n:9116"] -->|scrape| prom
bb["blackbox-exporter\n:9115"] -->|probe| prom
prom -->|alerts| am["Alertmanager\n:9093"]
am -->|webhook| rem["remediation-agent\n:8888"]
prom -->|datasource| graf["Grafana\n192.168.1.8:3001"]
promtail["promtail\nDocker SD"] -->|logs| loki["Loki\n:3100"]
loki -->|datasource| graf
style prom fill:#f5a623,color:#000
style graf fill:#5c9bd6,color:#fff
style loki fill:#6dc066,color:#fff
style am fill:#e87040,color:#fff
6. Anki Sync Architecture
graph LR
headless["anki-headless\n:8765 AnkiConnect\n(content manager)"]
sync["anki-sync\n:27701\n(sync server)"]
laptop["User Laptop Anki\n(study client)"]
headless -->|"creates cards, adds/removes tags\nalways UPLOADS on conflict\n(aqt/sync.py patched)"| sync
laptop -->|"uploads study progress\n(reviews, intervals, ease)"| sync
sync -->|"downloads content changes"| laptop
batch["AudioBatchWorker\nSQLite: workspace/batch_tracking/media_batches.db\n(NOT PostgreSQL)"]
omni["omnivoice-lv:8000\nXTTS TTS\ncheck model_loaded → HTTP 200 → ready"]
media["/mnt/data/apps/anki/\n.local/share/Anki2/User 1/collection.media/\n{note_id}_lemma.mp3\n{note_id}_example.mp3"]
batch -->|"POST /tts_to_audio/"| omni
omni --> media
headless -->|"enqueues audio jobs\nvia AnkiConnect"| batch
media -->|"syncs with collection"| sync
style headless fill:#d4edda
style sync fill:#b8d4e8
style laptop fill:#fff3cd
style omni fill:#ffe4b5
7. Lesson Synthesis Intelligence
graph TD
transcript["Lesson Transcript"] --> rawvocab["Raw Vocabulary Extract"]
rawvocab --> intervalcheck["Check Anki Intervals\nvia AnkiConnect :8765"]
intervalcheck --> split{"sr_interval?"}
split -->|"> 21d (mastered)"| excluded["EXCLUDED\nAnki long-term schedule"]
split -->|"<= 21d (learning)"| srs["SRS Review Pool"]
srs --> curation["GPT Curation\n8-12 review + 3-5 new"]
newwords["New session words"] --> curation
curation --> plan["Lesson Plan\n15 curated words"]
plan --> tags["current_week tag applied\nFiltered deck rebuilds"]
plan --> anki["Anki: _manage_active_window()\nOld cards suspended"]
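The interval split at the centre of this flow can be sketched as follows. Words whose Anki interval exceeds 21 days are treated as mastered and excluded, while the rest go to the SRS review pool, matching the two branches in the diagram. The function name, the input shape (a word-to-interval mapping in days) and the sample intervals are assumptions for illustration.

```python
MASTERED_AFTER_DAYS = 21  # cut-off taken from the diagram


def split_by_interval(intervals):
    """Split words into (excluded, review_pool) by Anki interval in days."""
    excluded = sorted(
        word for word, days in intervals.items() if days > MASTERED_AFTER_DAYS
    )
    review_pool = sorted(
        word for word, days in intervals.items() if days <= MASTERED_AFTER_DAYS
    )
    return excluded, review_pool


# Hypothetical intervals for three vocabulary items.
intervals = {"četri": 45, "ģimene": 21, "žurnāls": 3}
excluded, review_pool = split_by_interval(intervals)
print("excluded:", excluded)
print("review pool:", review_pool)
```

A word at exactly 21 days stays in the review pool, since the diagram marks only intervals greater than 21 days as mastered.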
Documents
Ai Tutor
Anki Architecture
Curriculum
Documentation Status
Emergency Procedures
Lesson Recording
Master Documentation
Monitoring
Openrouter Models
Pronunciation Checker
Quick Reference
Readme
Validation Pipeline