KORP AI snippets — 2026-06-09
Chatbot testing & evaluation toolkit — companions to
ทดสอบ AI Chatbot ก่อนเปิดใช้จริง (Evaluation/QA/UAT) 2026. MIT licensed.
- golden_set_runner.py — run a CSV golden set, score coverage + must-not violations, save regression report
- llm_judge.py — LLM-as-a-judge scorer (0..1) with strict Thai rubric, provider-agnostic
- judge_calibration.py — prove judge agrees with humans (agreement % + bias) before trusting it
- redteam_injection_suite.py — TH+EN prompt-injection / data-leak attack battery (OWASP LLM Top 10)
- load_test.py — concurrency stress harness, p50/p95 latency + response-shrink degradation flag
- gonogo_gate.py — final Go/No-Go gate, CI-friendly (non-zero exit if any gate fails)
← all snippets · blog · korpai.co