Autonomous Learning
Pipeline script: ai_autolearn.py
Step order:
1. crawl_web_sources.py - crawls the allowlisted web sources
2. self_learn_from_db.py - collects high-quality pairs from MySQL
3. validate_training_data.py - validates the dataset
4. build_qa_text.py - builds the raw training text
5. prepare_dataset.py - tokenizes the train/val splits
6. train.py - trains the candidate checkpoint
7. ai_autodeploy.py - safely promotes candidate -> prod
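The step order above can be sketched as a sequential runner that stops at the first failing step. This is only an illustration, not the actual contents of ai_autolearn.py: the names run_step and run_pipeline, the injectable runner, and the assumption that each script sits in the current directory and takes no arguments are all hypothetical.

```python
import subprocess
import sys

# Step scripts in the order listed above.
STEPS = [
    "crawl_web_sources.py",
    "self_learn_from_db.py",
    "validate_training_data.py",
    "build_qa_text.py",
    "prepare_dataset.py",
    "train.py",
    "ai_autodeploy.py",
]


def run_step(script):
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(steps=STEPS, runner=run_step):
    """Run steps in order; return (completed_steps, failed_step_or_None)."""
    completed = []
    for script in steps:
        if runner(script) != 0:
            # Assumption: a failing step aborts the run so that later
            # steps never work on incomplete data.
            return completed, script
        completed.append(script)
    return completed, None
```

Passing a fake runner (for example `lambda script: 0`) makes it possible to dry-run the ordering without executing any of the scripts.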
Running
python ai_autolearn.py
Web crawling env
- ENOAI_WEB_SOURCES - start URL allowlist (CSV)
- ENOAI_WEB_MAX_PAGES
- ENOAI_WEB_MAX_CHARS_PER_PAGE
- ENOAI_WEB_TIMEOUT
- ENOAI_WEB_USER_AGENT
Crawler:
- respects robots.txt
- stays on the same host
- saves output to data/raw/web_crawl_knowledge.txt
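The two crawl rules above (respect robots.txt, stay on the same host) can be checked with the Python standard library alone. The sketch below is an assumption about how such a check might look, not the code in crawl_web_sources.py; the helper names same_host and may_fetch and the sample rules are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def same_host(start_url, candidate_url):
    """True when both URLs share the same host (hostname is lowercased)."""
    return urlparse(start_url).hostname == urlparse(candidate_url).hostname


def may_fetch(robots, user_agent, start_url, candidate_url):
    """Apply both crawler rules: same host and robots.txt permission."""
    return same_host(start_url, candidate_url) and robots.can_fetch(
        user_agent, candidate_url
    )


# The rules are parsed from literal lines so the sketch needs no network
# access; a real crawler would call set_url(...) and read() instead.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])
```

In a real run the user agent string would presumably come from ENOAI_WEB_USER_AGENT, and the start URL from one entry of ENOAI_WEB_SOURCES.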
Self-learning env (DB pairs)
- ENOAI_SELF_LEARN_WINDOW_DAYS
- ENOAI_SELF_LEARN_MIN_ROWS
- ENOAI_SELF_LEARN_MAX_PAIRS
- ENOAI_SELF_LEARN_MIN_ASSISTANT_LEN
- ENOAI_SELF_LEARN_MAX_ASSISTANT_LEN
- ENOAI_SELF_LEARN_ALLOWLIST_TERMS
- ENOAI_SELF_LEARN_BLOCKLIST_TERMS
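The variable names suggest a filter on assistant answer length plus term allowlist and blocklist matching. The sketch below shows one plausible reading; the semantics, the default values, the comma-separated term format, and the helper names (env_int, env_terms, keep_pair, select_pairs) are all assumptions and may differ from what self_learn_from_db.py actually does.

```python
import os


def env_int(name, default):
    """Read an integer env variable, falling back to a default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


def env_terms(name):
    """Read a comma-separated env variable as a list of lowercase terms."""
    raw = os.environ.get(name, "")
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def keep_pair(user_text, assistant_text, min_len, max_len, allow, block):
    """Hypothetical filter: length bounds, then blocklist, then allowlist."""
    if not (min_len <= len(assistant_text) <= max_len):
        return False
    text = f"{user_text} {assistant_text}".lower()
    if any(term in text for term in block):
        return False
    # Assumption: an empty allowlist means "allow everything".
    return not allow or any(term in text for term in allow)


def select_pairs(pairs):
    """Filter (user, assistant) pairs and cap the result at MAX_PAIRS."""
    # The numeric defaults below are illustrative, not the project's.
    min_len = env_int("ENOAI_SELF_LEARN_MIN_ASSISTANT_LEN", 20)
    max_len = env_int("ENOAI_SELF_LEARN_MAX_ASSISTANT_LEN", 2000)
    max_pairs = env_int("ENOAI_SELF_LEARN_MAX_PAIRS", 1000)
    allow = env_terms("ENOAI_SELF_LEARN_ALLOWLIST_TERMS")
    block = env_terms("ENOAI_SELF_LEARN_BLOCKLIST_TERMS")
    kept = [
        p for p in pairs
        if keep_pair(p[0], p[1], min_len, max_len, allow, block)
    ]
    return kept[:max_pairs]
```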
Quality weighting
- ENOAI_SELF_LEARN_WEIGHT_EDITED
- ENOAI_SELF_LEARN_WEIGHT_GOOD
- ENOAI_SELF_LEARN_WEIGHT_APPROVED
- ENOAI_SELF_LEARN_WEIGHT_DEFAULT
- ENOAI_SELF_LEARN_MAX_WEIGHT_DUPES
- ENOAI_SELF_LEARN_DECAY_HALFLIFE_DAYS
- ENOAI_SELF_LEARN_DECAY_MIN_FACTOR
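One way these settings could interact: each pair gets a base weight from its feedback label, the weight decays exponentially with age using the configured half-life, the decay is floored at the minimum factor, and the result is turned into a capped number of duplicated copies. This interpretation, the function names, and the example weights are assumptions, not confirmed behavior of the pipeline.

```python
def decay_factor(age_days, halflife_days, min_factor):
    """Exponential decay by half-life, floored at min_factor."""
    if halflife_days <= 0:
        return 1.0  # assumption: a non-positive half-life disables decay
    return max(min_factor, 0.5 ** (age_days / halflife_days))


def duplicate_count(label, age_days, weights, halflife_days, min_factor, max_dupes):
    """Number of copies of a pair to write into the training text."""
    base = weights.get(label, weights["default"])
    scaled = base * decay_factor(age_days, halflife_days, min_factor)
    # Always keep at least one copy; never exceed the cap.
    return max(1, min(max_dupes, round(scaled)))


# Illustrative weights only; the real values come from the
# ENOAI_SELF_LEARN_WEIGHT_* variables.
EXAMPLE_WEIGHTS = {"edited": 4, "good": 3, "approved": 2, "default": 1}
```

With these example weights, a half-life of 30 days and a cap of 3, a fresh "edited" pair would be capped at 3 copies, while the same pair at 30 days old would drop to 2.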
Safe deploy gate
Script: ai_autodeploy.py
Candidate checkpoint:
enoa_gpt/checkpoints/enoagpt_tiny_final.pt
Production checkpoint:
enoa_gpt/checkpoints/enoagpt_tiny_prod.pt
Promotion happens only when:
- mini eval score >= ENOAI_AUTODEPLOY_MIN_SCORE
- score gain >= ENOAI_AUTODEPLOY_MIN_GAIN
- (optional) the DB A/B eval gate passes
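The promotion conditions above amount to a conjunction of checks. The sketch below assumes that "score gain" means the candidate score minus the current production score and that a disabled DB A/B gate is simply skipped; should_promote and its parameters are hypothetical names, not the interface of ai_autodeploy.py.

```python
def should_promote(candidate_score, prod_score, min_score, min_gain,
                   db_gate_passed=None):
    """Return (decision, reasons).

    db_gate_passed=None means the optional DB A/B gate is disabled.
    """
    reasons = []
    if candidate_score < min_score:
        reasons.append("score below minimum")
    # Assumption: gain is measured against the production checkpoint.
    if candidate_score - prod_score < min_gain:
        reasons.append("gain below minimum")
    if db_gate_passed is False:
        reasons.append("DB A/B gate failed")
    return (not reasons), reasons
```

Returning the list of failed conditions alongside the decision makes it easy to record why a candidate was rejected, for instance in a report file.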
DB A/B env:
- ENOAI_AUTODEPLOY_DB_EVAL
- ENOAI_AUTODEPLOY_DB_EVAL_MIN_SAMPLES
- ENOAI_AUTODEPLOY_DB_EVAL_MIN_DELTA
- ENOAI_DB_EVAL_WINDOW_DAYS
- ENOAI_DB_EVAL_MAX_SAMPLES
- ENOAI_DB_EVAL_MIN_ASSISTANT_LEN
Report:
enoa_gpt/checkpoints/autodeploy_last.json