Y Combinator S2026 Applicant

Cross-GPU bit-identical verified

25.5 DAYS to 2.7 MINUTES

13,848x faster. Both EXACT.

VLA is zero-error GPU arithmetic. Drop-in replacement for PyTorch. 82 functions with mathematically exact results.

Same result on any GPU. RTX 4060 = Tesla T4 = H100. Patent-pending.

13,848x faster than CPU

104M elements exact

Zero precision loss

Free pip install

pip install simgen-vla

Scientific Simulation Has Been Broken for 40 Years

Same code + same data = different results on different GPUs. Papers can't be replicated. Audits fail. The Patriot missile bug killed 28 soldiers.

Until now: Fast (GPU, wrong) or Correct (CPU, slow). VLA eliminates this tradeoff.

10240x10240 MATRIX MULTIPLY

25.5 DAYS vs 2.7 MIN

Both EXACT

# Drop-in replacement
from simgen import vla
result = vla.matmul(A, B)
# 104M elements, exact result

Cross-GPU checksum

RTX 4070: 6ece6956f187064f

Tesla T4: 6ece6956f187064f

BIT-IDENTICAL

🔥 EXTREME SCALE RESULTS

At Scales That Matter, VLA Delivers

Test	VLA result	Baseline
100 million operations	Zero accumulation error	FP32: massive drift
64-day orbit simulation	< 1 m orbital drift	FP64: meters of drift
419M-element matmul	Exact result in 22 min	CPU: 204 days
Kahan sum test	10000 (exact answer)	FP32/64/80: 0 (wrong)

Verified on Kaggle Tesla T4. Same results on any NVIDIA GPU.
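
The cancellation pattern behind results like the Kahan sum test can be reproduced on a CPU in a few lines. The sketch below is only an illustration: the input (10,000 repetitions of 1e17, 1, -1e17) is an assumed example chosen so that naive floating point returns 0 while the exact sum is 10000, not necessarily the input used in the benchmark above, and it relies on Python's standard `math.fsum` rather than VLA.

```python
import math

def naive_sum(values):
    """Left-to-right float64 accumulation, rounding after every add."""
    total = 0.0
    for v in values:
        total += v
    return total

# Assumed test input: each triple should contribute exactly 1.
values = [1e17, 1.0, -1e17] * 10_000

naive = naive_sum(values)   # the 1.0 is rounded away each time it meets 1e17
exact = math.fsum(values)   # correctly rounded value of the exact total

print(naive, exact)
```

An explicit loop is used for the naive sum because the built-in `sum` applies compensated summation to floats in recent Python versions.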

Scientific Simulation Has Been Broken for 40 Years

The dirty secret of computational science: same simulation, different GPU, different answer.
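
A common root cause is that floating-point addition is not associative, so the order in which a parallel reduction combines partial sums changes the rounded result. The sketch below imitates two reduction orders on the CPU; the `chunked_sum` helper and the chunk sizes are hypothetical stand-ins for how different GPUs might group a sum, not measurements from real hardware.

```python
def chunked_sum(values, chunk):
    """Sum each chunk left to right, then add the partial sums in order."""
    partials = []
    for start in range(0, len(values), chunk):
        partial = 0.0
        for v in values[start:start + chunk]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [1e17, 1.0, -1e17, 1.0]   # the exact sum is 2

one_block = chunked_sum(values, 4)    # ((1e17 + 1) - 1e17) + 1
two_blocks = chunked_sum(values, 2)   # (1e17 + 1) + (-1e17 + 1)

print(one_block, two_blocks)
```

Both groupings are valid ways to add the same four numbers, yet they disagree with each other and with the exact answer.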

Papers can't be replicated

Reviewers run your code, get different results. Publications rejected. Research wasted. Peer review broken for computational science.

Audits fail

Financial calculations drift between machines. Regulatory fines. Failed compliance. Different totals on different hardware.

Safety-critical failures

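Patriot missile failure (1991): rounding error in the system clock built up to a 0.34 s drift after about 100 hours of uptime, and the battery failed to intercept a Scud that killed 28 soldiers. Arithmetic error in life-critical systems has real consequences.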
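
The 0.34 s figure can be reproduced with exact rational arithmetic. This sketch follows the commonly cited analysis of the incident and rests on assumptions that are not stated above: the clock counted tenths of a second, the constant 0.1 was truncated to 23 fractional binary digits, and the system had been running for about 100 hours.

```python
from fractions import Fraction

FRACTION_BITS = 23    # assumed width of the stored fractional part
UPTIME_HOURS = 100    # assumed uptime at the time of the failure

# 0.1 truncated (not rounded) to FRACTION_BITS binary digits.
scale = 2 ** FRACTION_BITS
stored_tenth = Fraction(scale // 10, scale)

# Error made on every tick of the 0.1 s clock.
error_per_tick = Fraction(1, 10) - stored_tenth

ticks = UPTIME_HOURS * 3600 * 10
drift_seconds = float(error_per_tick * ticks)

print(f"{drift_seconds:.4f} s")
```

The per-tick error is below one ten-millionth of a second; it only becomes dangerous because it is added 3.6 million times.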

Climate models diverge

Same physics, different GPUs = different predictions. Policy decisions based on noise. Drug simulations unreliable.

WHY HASN'T THIS BEEN SOLVED?

Previous Approaches All Failed

CPU arbitrary precision (mpmath, Decimal): 10,000-50,000x slower than GPU, unusable for real work

80-bit extended precision: Only on x86 CPU, not GPU. Still loses precision.

Interval arithmetic: Tracks error but doesn't eliminate it

Double-double libraries: Slow, not GPU-optimized, still approximations

"Just use more precision": FP128 doesn't exist on GPU hardware

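
For reference, the CPU arbitrary-precision route listed among the previous approaches can be sketched with Python's standard `fractions` module: every finite float converts to an exact rational, so a dot product can be computed with no rounding at all. The `exact_dot` and `float_dot` helpers are hypothetical names for illustration; they show the slow baseline, not how VLA works internally.

```python
from fractions import Fraction

def exact_dot(a, b):
    """Dot product of two float sequences using exact rational arithmetic."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = Fraction(0)
    for x, y in zip(a, b):
        total += Fraction(x) * Fraction(y)   # Fraction(float) is exact
    return total

def float_dot(a, b):
    """Same dot product in ordinary float64, rounding at every step."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = [1e17, 1.0, -1e17]
b = [1.0, 1.0, 1.0]

print(float_dot(a, b), exact_dot(a, b))
```

The rational version is correct but allocates a growing big-integer pair per element, which is the kind of cost that makes this approach impractical at scale.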

THE SOLUTION

VLA: Zero-Error GPU Arithmetic

70+ functions with mathematically exact results. Drop-in replacement for PyTorch. Native CUDA kernels at 13+ GFLOPS.

Fast

13,848x faster than CPU Decimal. Native CUDA kernels for 8 architectures.

Correct

Mathematically exact results. Zero accumulation error. Proven.

Reproducible

Bit-identical across ALL GPU architectures. SHA256 checksum API.

# Install
$ pip install simgen-vla

import torch
from simgen import vla

# Zero-error operations
result = vla.sum(tensor)      # Exact sum
result = vla.matmul(A, B)    # Exact matrix multiply
result = vla.dot(a, b)       # Exact dot product

# Cross-GPU reproducibility (KILLER FEATURE)
checksum = vla.checksum(result)  # Same on ANY GPU!
vla.verify(result, expected_checksum)

# Global enable (patches torch ops)
vla.enable()
torch.sum(x)  # Now uses VLA!
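
To make the idea of a bit-level checksum concrete, here is a minimal CPU-only sketch using Python's standard `hashlib` and `struct`. It is not the implementation behind `vla.checksum`; the `bit_checksum` helper, the little-endian float64 encoding, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib
import struct

def bit_checksum(values, digits=16):
    """Hash the exact IEEE 754 float64 bit patterns of a sequence."""
    raw = struct.pack(f"<{len(values)}d", *values)   # little-endian doubles
    return hashlib.sha256(raw).hexdigest()[:digits]

a = [0.1 + (0.2 + 0.3)]   # 0.6
b = [(0.1 + 0.2) + 0.3]   # 0.6000000000000001, one rounding step apart

print(bit_checksum(a), bit_checksum(b))
print(bit_checksum(a) == bit_checksum([0.6]))
```

A difference in a single bit changes the digest, which is why a matching checksum is evidence of bit-identical results rather than merely close ones.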


What You Can Do

Drop-in Replacement

Replace torch.sum, torch.matmul, and 70+ other functions with exact VLA versions. No model rewrites. Works with any PyTorch code.

Cross-GPU Reproducibility

Get identical checksums on RTX 4060, Tesla T4, A100, H100 — any NVIDIA GPU. Finally, reproducible science.

Exact Simulations

Run orbit propagation, molecular dynamics, financial calculations with zero arithmetic drift. Results you can trust.

Verify & Share

Use vla.checksum() to prove your results match. Attach checksums to papers. Auditable, verifiable computation.

Democratizes Exact Simulation

Before VLA: $100K+ hardware or 25-day CPU runs. After VLA: any $300 GPU.

Scenario	Before VLA	With VLA
100 million accumulations	Catastrophic drift	ZERO error
64-day orbit simulation	Meters of drift	< 1 meter drift
Exact 20K×20K matmul	204 DAYS (CPU)	22 MINUTES (GPU)
Hardware required	$40K+ HPC node	$300 RTX 4060
Cross-GPU results	Different every time	Bit-identical always

Who Needs This

Zero-error precision for every domain where accuracy matters.

Finance

Audit-proof calculations. Bit-identical totals across machines. No more regulatory surprises.

Pharma & Biotech

Exact molecular dynamics. Reliable drug simulations. Reproducible research.

Aerospace

Zero-drift orbit propagation. Precise trajectory calculation over millions of timesteps.

Climate Science

Deterministic long-term forecasts. Reproducible climate models across hardware.

Nuclear & Defense

Safety simulations that can be certified. Provably correct calculations.

Academia

Papers that can be replicated. Reproducible computational science. Open research.


Works on Any NVIDIA GPU

Native CUDA kernels for 8 architectures. Same exact results everywhere.

GPU	Architecture
GTX 1060-1080	Pascal
RTX 2060-2080	Turing
RTX 3060-3090	Ampere
RTX 4060-4090	Ada
Tesla T4	Turing
A100	Ampere
H100	Hopper
Cloud GPUs	All

Get Started

Free tier available. Full VLA precision. No credit card required.

Commercial license starts at $49/mo.

$ pip install simgen-vla
