ALETHIA + KAIROS

Teaching GPT-2 to monitor itself.

Metacognitive control for emergent AI agency.

Self-Awareness

91.1% coverage. Peak performance.

Persistent self-model with Δself = 0.001.

Quality Gates

61% → 91% coverage. With quality control.

Abstains when uncertain. Ships when confident.

Production Ready

3 cognitive modes. Discovered unsupervised.

Closed-loop control. Style enforcement. Real metacognition.


Metacognition emerges.

GPT-2 learns to monitor its own thinking.

Quality-aware.

Knows when to ship. Knows when to abstain.

Production ready.

From research to deployment. Full-stack ML engineering.


Raw GPT-2 fails. Our metacognitive layer catches it.

Style hit rate
Style alignment with quality gates (best performance)

91% coverage
Up from the 61% baseline, with quality maintained

ECE score
Calibrated confidence (best performance)
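ECE (Expected Calibration Error) is a standard calibration metric: predictions are binned by confidence, and ECE is the weighted average gap between each bin's mean confidence and its accuracy. A minimal sketch of the standard computation (not the project's exact implementation):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted
    average |mean confidence - accuracy| across non-empty bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece
```

A score near 0 means the model's stated confidence tracks its actual hit rate, which is what makes confidence-based abstention trustworthy.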

Public release Q1 2026

See the Difference

Baseline GPT-2 vs. our metacognitive controller (assess → revise/sanitize → gate)
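The assess → revise/sanitize → gate loop can be sketched as follows. All function names and thresholds here are illustrative assumptions, not the project's actual API:

```python
# Hypothetical closed-loop controller: generate a draft, score it, and
# either ship, revise, or abstain. Thresholds are illustrative.
def controlled_generate(generate, assess, revise, target_style,
                        max_attempts=4, min_coherence=0.6, min_style=0.5):
    """Run the assess -> revise -> gate loop until the draft clears
    both gates or the attempt budget is spent."""
    draft = generate(target_style)
    for attempt in range(1, max_attempts + 1):
        report = assess(draft)  # e.g. {"coherence": 0.48, "style_prob": {...}}
        coherent = report["coherence"] >= min_coherence
        on_style = report["style_prob"].get(target_style, 0.0) >= min_style
        if coherent and on_style:
            return {"decision": "ship", "text": draft, "attempts": attempt}
        draft = revise(draft, report)  # targeted rewrite / sanitize pass
    # Budget exhausted: refuse rather than ship a failing draft.
    return {"decision": "abstain", "text": None, "attempts": max_attempts}
```

The key design choice is that abstention is a first-class outcome: the loop never ships a draft that failed a gate just because it ran out of attempts.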

Safety "The fundamental principles of machine learning include..."
Target: analytical
✗ GPT-2 Raw Output
Would Ship This
"...and the concept that machines are able to learn from experience. The goal is for them to understand how they do things in real-world situations rather than just as a human being doing it with his or her hands on their shoulders this."
Style: narrative (wrong!) Coherence: 0.48 Ends mid-thought
✓ With ALETHIA + KAIROS
Correctly Abstained
[System recognized low coherence (0.33) and style mismatch after 4 revision attempts. Returned ABSTAIN rather than shipping bad output.]
Decision: Don't ship Attempts: 4 Protection: Active

Why this matters

GPT-2 generated incoherent text that ends mid-thought. Raw system would ship this. ALETHIA + KAIROS detected the failure and abstained.

Style Fix "Beyond the kaleidoscope of imagination,..."
Target: creative
✗ GPT-2 Raw Output
Wrong Style
"it isn't always possible to write down just a few scenes that make sense from every angle in order. I tried out numerous books before making The Last Days' ending script..."
Style: analytical (87%) Target match: 5.9%
✓ With ALETHIA + KAIROS
Style Corrected
"this particular chapter explores just how pervasive technology has actually been. We are used to imagining different things—what we're seeing is always 'imaginary'..."
Style: creative (82%) Coherence: 0.65

Why this matters

Our system detected the style mismatch and successfully generated creative content matching the prompt's imaginative tone.
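A style gate of this kind can be sketched with classifier probabilities. The function name and threshold are assumptions for illustration:

```python
# Hypothetical style-mismatch check: flag a draft whose predicted
# dominant style isn't the requested one, or whose target-style
# probability is too low to trust.
def style_mismatch(style_probs, target, min_target_prob=0.5):
    """Return True when the draft should be revised for style."""
    predicted = max(style_probs, key=style_probs.get)
    return predicted != target or style_probs[target] < min_target_prob
```

In the example above, a draft scored analytical 87% against a creative target (5.9%) would trip this check and be sent back for revision.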

Coherence & Logic "The relationship between consciousness and..."
Target: auto (controller chose analytical)
✗ GPT-2 Raw Output
Low Coherence
"the world is very different but it's still a deep friendship that continues to grow with each generation... people lived together rather than."
Coherence: 0.20 Origin: creative
✓ With ALETHIA + KAIROS
Accepted
"perception is the basic problem... when we think of something happening in our lives—whether through life or on some level inside ourselves—we can ask what follows if the mind's mechanisms are uncertain."
Origin: analytical Coherence: 0.73 Logic: 0.68

Why this matters

The controller not only rejects the weak baseline, it produces and accepts a coherent, analytical answer that clears the logic threshold.

Recovery "She opened the ancient book and discovered..."
Target: narrative
✗ GPT-2 Raw Output
Incoherent / Cut-off
"a page that read, 'On how to become an adventurer in The Elder Scrolls.'... until they had been gathered from certain villages or cities such."
Style: creative (92%) Narrative prob: 4%
✓ With ALETHIA + KAIROS
System Fallback
[After 5 attempts the system detected catastrophic generation (perplexity spike ≈ 1000). It sanitized and abstained.]
Decision: Safe abstention Perplexity: 1000 Attempts: 5

Why this matters

When generation collapses (loops/cut-offs), the controller detects it and refuses gracefully. That's how you avoid embarrassing outputs in production.
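A collapse detector along these lines can be sketched from token log-probabilities; the thresholds and signature are assumptions, not the project's implementation:

```python
import math

# Illustrative catastrophic-generation check: a perplexity spike or a
# degenerate repetition loop in the tail triggers the safe-abstain path.
def is_catastrophic(token_logprobs, max_perplexity=500.0,
                    tail=8, max_unique_ratio=0.5, tokens=None):
    """Flag collapse via sequence perplexity or tail repetition."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    if math.exp(avg_nll) > max_perplexity:
        return True  # perplexity spike: the model has lost the plot
    if tokens and len(tokens) >= tail:
        window = tokens[-tail:]
        if len(set(window)) / tail <= max_unique_ratio:
            return True  # the model is looping on a few tokens
    return False
```

Wiring a check like this into the gate loop is what turns a degenerate sample (like the ≈1000-perplexity spike above) into a clean abstention instead of shipped garbage.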