The most advanced temporal neural simulation platform ever engineered.
Speed. 12 milliseconds. Per emulation.
125,000× faster than agent-based systems.
Precision. 99.8% R². Neural emulation.
DuckDB-powered data pipeline.
Deployment. 100 simulations per second.
Perfect GPU utilization. Pure parallel execution.
One model. Any scale. Infinite possibility.
Every bit. Every run. Deterministic to the last electron.
Your laptop. Their cloud. Same performance.
85% faster than a blink.*
24 minutes → 12 milliseconds
Engineered at Imperial. Deployed globally.
Faster than GPT-4 inference
On consumer hardware
UK Chief Scientific Officers
Cosmo Santoni — Applied Mathematics & Machine Learning PhD candidate at Imperial College London.
I transform 24-minute computations into 12 ms inference. From pure mathematics to production deployment. Full-stack ML engineering with a single obsession: speed without compromise.
PyTorch. CUDA. DuckDB. React. Whatever it takes to build systems that save lives at scale.
I don't just publish papers. I build production systems that policy-makers & governments use to make critical decisions. From mathematical theory to 125,000× speedups through custom neural architectures. End-to-end: models, APIs, deployment, and yes—even this site.
Real benchmarks. Real hardware. Real breakthrough.
Median wall time
Median wall time for 16 simulations
Best measured configuration
Method. Same scenarios & time horizon for both systems. ABM reports wall time for 8 stochastic replicates per scenario (the full 8-rep bundle). The emulator (LSTM) reports p50 over 9 runs with the fastest dropped; GPU is a single device with 3 warm-ups. CPU-single uses torch.set_num_threads(1); CPU-parallel uses future::multisession.
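A minimal sketch of this timing protocol in Python/PyTorch; the function name, model, and input batch are illustrative placeholders, not the production harness:

```python
import statistics
import time

import torch


def time_emulator(model, batch, n_runs=9, n_warmup=3):
    """Protocol from the method notes: warm-ups, then p50 over
    n_runs timed passes with the fastest run dropped."""
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(n_warmup):              # warm-up passes (not timed)
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()           # flush queued kernels first
        for _ in range(n_runs):
            t0 = time.perf_counter()
            model(batch)
            if batch.is_cuda:
                torch.cuda.synchronize()       # wait for the GPU to finish
            times.append(time.perf_counter() - t0)
    times.remove(min(times))                   # drop the fastest run
    return statistics.median(times)            # p50 wall time in seconds
```

Synchronising before and after each timed pass keeps asynchronously queued CUDA kernels out of the measurement, so the wall time reflects completed work.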
Determinism. torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False. A non-deterministic “turbo” profile exists but is not used for the reported numbers.
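Set in Python as follows; the fixed seed is an added illustration, not part of the quoted configuration:

```python
import torch

torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # no autotuned (variable) kernel choices
torch.manual_seed(0)                        # illustrative fixed seed (assumption)
```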
Environment. Intel Core Ultra 9 185H (16C/22T), 62 GiB LPDDR5-7467, NVIDIA RTX 3500 Ada (12 GiB), Fedora Linux 41, NVIDIA driver 575.64.03. ABM CPU workers capped at ≤4 to avoid thermal throttling; GPU pinned via CUDA_VISIBLE_DEVICES=0.
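A sketch of how this pinning can be enforced from Python, assuming the environment variable is set before CUDA is initialised:

```python
import os

# Must be set before torch initialises CUDA, hence before `import torch`
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

torch.set_num_threads(1)  # CPU-single profile from the method notes
```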
Data pipeline (DuckDB). RDS outputs are flattened into a single simulation_results table. Queries run locally (no external DB), with PRAGMA threads and PRAGMA memory_limit set per machine. Derived targets (prevalence, cases/1000) are computed at load. Empirical timings (full corpus ≈574,095,360 rows; DuckDB v1.1.3-dev165; threads=16; memory_limit='24GB'): p50 ≈ 0.50 s for COUNT(*), 1.46 s for 30-day cases (GROUP BY), and 11.4 s for a 7-step rolling prevalence window.
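The three timed queries could be reproduced against the flattened table along these lines; the database filename and the column names (scenario_id, day, new_cases, prevalence) are assumptions for illustration, not the actual schema:

```python
import duckdb

con = duckdb.connect("simulations.duckdb")   # local file, no external DB
con.execute("PRAGMA threads=16")             # per-machine settings from above
con.execute("PRAGMA memory_limit='24GB'")

# 1. Full-corpus row count (p50 ≈ 0.50 s on the corpus above)
n_rows = con.execute("SELECT COUNT(*) FROM simulation_results").fetchone()[0]

# 2. Cases aggregated over 30-day windows per scenario (GROUP BY, ≈ 1.46 s)
cases = con.execute("""
    SELECT scenario_id, day // 30 AS window_30d, SUM(new_cases) AS cases
    FROM simulation_results
    GROUP BY scenario_id, window_30d
""").df()

# 3. 7-step rolling prevalence window (window function, ≈ 11.4 s)
rolling = con.execute("""
    SELECT scenario_id, day,
           AVG(prevalence) OVER (
               PARTITION BY scenario_id ORDER BY day
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS prevalence_7d
    FROM simulation_results
""").df()
```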