Performance Benchmarks

Memory & energy efficiency at near-parity accuracy

PNN replaces dense weight matrices with a patented sparse connectivity pattern, cutting parameter counts by 1.7–4.5× while holding accuracy within ~0.4 pp of the dense baseline on vision and ~1.1 pp on tabular data. Because the pattern is index-free (connections are computed rather than stored), INT8 PNN models are up to 18× smaller than dense FP32 models, and INT8 quantization costs at most 0.03 pp on the vision tasks.

1.7–4.5× fewer parameters than dense

18× smaller INT8 model (vision, effectively lossless)

~0.4 pp accuracy gap on vision tasks

~12 nJ per inference on the PNN chip (estimated)

Results

Summary — 5 seeds, iso-architecture

Task	Metric	Dense	PNN	PNN INT8	Savings (parameters · INT8 size vs dense FP32)
MNIST	acc % ↑	97.45	97.02	97.00	4.5× fewer · 18× smaller
Fashion	acc % ↑	88.57	88.17	88.14	4.5× fewer · 18× smaller
Tabular*	acc % ↑	88.71	87.65	87.38	2.4× fewer · 9.6× smaller
NLP-GPT	val loss ↓	2.268	2.387	—	1.7× fewer · 6.9× smaller
microGPT	op-bound speed ↑	1.0×	1.59× faster	—	scalar / no-BLAS

↑ higher = better, ↓ lower = better. The NLP metric is validation cross-entropy loss. * Tabular = recommended fc2-only configuration. INT8 is effectively lossless on vision (at most 0.03 pp below FP32 PNN) and costs 0.27 pp on tabular.
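The gaps and size factors quoted on this page can be rechecked from the table with a few lines of arithmetic. The sketch below is only a sanity check. It assumes that model footprint is dominated by weights, with 4 bytes per FP32 weight and 1 byte per INT8 weight, and that nothing else (indices, quantization scales, biases) adds meaningfully to the size. The names `TABLE`, `gap_pp`, and `int8_size_reduction` are illustrative and do not come from the PNN codebase.

```python
# Values copied from the summary table (mean of 5 seeds).
TABLE = {
    "MNIST":   {"dense": 97.45, "pnn": 97.02, "pnn_int8": 97.00, "param_reduction": 4.5},
    "Fashion": {"dense": 88.57, "pnn": 88.17, "pnn_int8": 88.14, "param_reduction": 4.5},
    "Tabular": {"dense": 88.71, "pnn": 87.65, "pnn_int8": 87.38, "param_reduction": 2.4},
}

# Assumption: FP32 weights take 4 bytes, INT8 weights take 1 byte, and
# weights are the only significant contributor to model size.
FP32_BYTES = 4
INT8_BYTES = 1


def gap_pp(reference, value):
    """Accuracy gap in percentage points, rounded to the table's precision."""
    return round(reference - value, 2)


def int8_size_reduction(param_reduction):
    """Size reduction of an INT8 PNN model vs dense FP32, under the assumption above."""
    return param_reduction * FP32_BYTES / INT8_BYTES


def summarize(table):
    """Derive the dense-vs-PNN gap, the INT8 cost, and the size factor per task."""
    rows = {}
    for task, r in table.items():
        rows[task] = {
            "pnn_gap_pp": gap_pp(r["dense"], r["pnn"]),
            "int8_cost_pp": gap_pp(r["pnn"], r["pnn_int8"]),
            "int8_size_reduction": round(int8_size_reduction(r["param_reduction"]), 1),
        }
    return rows


if __name__ == "__main__":
    for task, row in summarize(TABLE).items():
        print(task, row)
```

Under these assumptions the size factor is simply the parameter reduction times 4, which reproduces 18× and 9.6×. For NLP-GPT the same rule gives 1.7 × 4 = 6.8× rather than the listed 6.9×, which would be consistent with 1.7× being a rounded value, though the source does not say so.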

Parameters

Fewer parameters per model

Figure: parameter count per model. Legend: dense baseline, PNN.

Accuracy

Near-parity, shown truthfully

Figure: test accuracy per task. Legend: dense baseline, PNN.

Test accuracy (%), dense vs PNN side by side, mean of 5 seeds. The y-axis starts at 0 so the small gap is not visually exaggerated. NLP-GPT is reported as validation loss (2.268 vs 2.387), a different metric, so it is omitted from this chart.

Memory

Model footprint — dense FP32 vs PNN INT8

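Figure: model footprint per task. Legend: dense baseline, PNN.

Model size in megabytes, log scale. Index-free connectivity means column indices are computed rather than stored, so the footprint consists of weights only.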
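To see why the index-free property matters for footprint, it helps to compare against a sparse format that stores one index per kept weight. The sketch below is illustrative arithmetic only: the 1,000,000-parameter layer and the 2-byte-per-weight index format are hypothetical choices made for the comparison and are not taken from the benchmarks above.

```python
def footprint_bytes(n_dense_params, param_reduction, weight_bytes, index_bytes=0):
    """Bytes needed to store a layer after reducing its parameter count.

    index_bytes is the per-weight cost of storing a connection index;
    0 models an index-free pattern where indices are computed on the fly.
    """
    kept_params = n_dense_params / param_reduction
    return kept_params * (weight_bytes + index_bytes)


# Hypothetical layer size, chosen only to make the ratios easy to read.
N_DENSE = 1_000_000

dense_fp32 = footprint_bytes(N_DENSE, 1.0, weight_bytes=4)
pnn_int8_index_free = footprint_bytes(N_DENSE, 4.5, weight_bytes=1)
# Hypothetical comparison: same sparsity, but a stored 16-bit index per weight.
int8_with_indices = footprint_bytes(N_DENSE, 4.5, weight_bytes=1, index_bytes=2)

if __name__ == "__main__":
    print(f"index-free INT8:  {dense_fp32 / pnn_int8_index_free:.1f}x smaller")
    print(f"indexed INT8:     {dense_fp32 / int8_with_indices:.1f}x smaller")
```

With these assumed numbers the index-free layout keeps the full 18× reduction, while the indexed layout would retain only about 6×, because each 1-byte weight carries 2 bytes of index overhead.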

Speed & energy

Where the speed actually shows up

1.59×

Op-bound speedup

Scalar microGPT (no BLAS): forward pass 1.70×, backward pass 1.35×. This is the edge / microcontroller (MCU) regime.

≈ parity

CPU inference (measured, C implementation)

INT8: dense 22 µs vs PNN 25 µs, so PNN is about 14% slower. Element-level sparsity gives no win over tuned dense GEMM.

~12 nJ

PNN chip (estimated)

~1.3 µs per inference, with an estimated ~10–25× better energy per inference than a matched NVIDIA part on a deployed fixed model.
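The chip estimates can be turned into a few derived figures that are easier to compare with a power budget. The sketch below is back-of-envelope arithmetic on the estimated values only. It assumes that the ~12 nJ covers the whole inference and that the chip is active for the full ~1.3 µs; neither assumption is stated in the source, and the function names are illustrative.

```python
# Estimated figures quoted above; these are engineering estimates, not measurements.
PNN_ENERGY_J = 12e-9       # ~12 nJ per inference
PNN_LATENCY_S = 1.3e-6     # ~1.3 µs per inference
ENERGY_ADVANTAGE = (10, 25)  # claimed ~10-25x better energy than a matched part


def average_power_w(energy_j, latency_s):
    """Average power while computing, assuming the chip is busy for the whole latency."""
    return energy_j / latency_s


def inferences_per_joule(energy_j):
    """How many inferences one joule would pay for at the given energy per inference."""
    return 1.0 / energy_j


def implied_baseline_energy_j(energy_j, advantage):
    """Energy per inference the comparison part would need for the claimed advantage range."""
    low, high = advantage
    return (energy_j * low, energy_j * high)


if __name__ == "__main__":
    print(f"average power: {average_power_w(PNN_ENERGY_J, PNN_LATENCY_S) * 1e3:.2f} mW")
    print(f"inferences per joule: {inferences_per_joule(PNN_ENERGY_J):.3g}")
    lo, hi = implied_baseline_energy_j(PNN_ENERGY_J, ENERGY_ADVANTAGE)
    print(f"implied baseline energy: {lo * 1e9:.0f} to {hi * 1e9:.0f} nJ per inference")
```

Taken at face value, the estimates correspond to roughly 9 mW while computing and imply a comparison part in the range of about 120 to 300 nJ per inference. These are consequences of the quoted estimates, not additional measurements.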

Honest regime map

Where PNN wins — and where it doesn't

Wins

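• Memory: a win on every task, with 1.7–4.5× fewer parameters and 6.9–18× smaller INT8 models than dense FP32.
• Accuracy: near-parity on vision (~0.4 pp), graceful degradation on tabular (~1.1 pp).
• Op-bound speed: 1.59× on scalar / no-BLAS edge targets.
• Custom silicon: ~1.3 µs, ~12 nJ per inference; ~10–25× better energy than a matched NVIDIA part (all estimated).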

Limitations

• No CPU/GPU speed win over tuned dense GEMM: the prime gather is SIMD-hostile.
• INT8 is not universally free: it is effectively lossless on vision, costs 0.27 pp on tabular, and is harmful if input layers are over-sparsified.
• The NLP gain is modest (1.7× fewer parameters, FFN-only); attention and embeddings stay dense.
• Hardware numbers are engineering estimates; measured CPU inference is at parity, not faster.
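The first limitation is easier to see in code. The sketch below uses a made-up strided pattern, `(row + stride * k) % n_in`, purely as a stand-in for an index-free connectivity rule. It is NOT the patented PNN pattern, which this page does not describe, and the function names are hypothetical. What it shows is the general trade-off: no index array is stored, but every multiply needs a scattered read from the input vector instead of a contiguous one.

```python
import math


def computed_columns(row, fan_in, n_in, stride=7):
    """Input positions read by one output row.

    HYPOTHETICAL pattern for illustration only, not the PNN pattern.
    The indices are derived from (row, k), so nothing is stored.
    """
    if fan_in > n_in or math.gcd(stride, n_in) != 1:
        raise ValueError("need fan_in <= n_in and stride coprime with n_in")
    return [(row + stride * k) % n_in for k in range(fan_in)]


def gather_matvec(weights, x, stride=7):
    """Sparse matrix-vector product: each row gathers its inputs from scattered positions."""
    out = []
    for row, w_row in enumerate(weights):
        cols = computed_columns(row, len(w_row), len(x), stride)
        out.append(sum(w * x[c] for w, c in zip(w_row, cols)))
    return out


def to_dense(weights, n_in, stride=7):
    """Expand the compact weights into the equivalent dense matrix (zeros elsewhere)."""
    dense = []
    for row, w_row in enumerate(weights):
        full = [0] * n_in
        for w, c in zip(w_row, computed_columns(row, len(w_row), n_in, stride)):
            full[c] = w
        dense.append(full)
    return dense


def dense_matvec(dense, x):
    """Reference dense product: each row reads x contiguously."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in dense]


if __name__ == "__main__":
    weights = [[1, 2, 3, 4], [5, 6, 7, 8]]   # 2 outputs, 4 kept weights each
    x = list(range(16))                      # 16 inputs
    print(gather_matvec(weights, x))
    print(dense_matvec(to_dense(weights, len(x)), x))
```

Both products agree, but the gather version touches `x` at non-adjacent positions. On CPUs and GPUs that tends to defeat wide vector loads, which is consistent with the measured parity against tuned dense GEMM reported above; a scalar core without BLAS pays no such penalty.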
