PNNPNN

Performance Benchmarks

Memory & energy efficiency at near-parity accuracy

PNN replaces dense weight matrices with a patented connectivity configuration — cutting parameters 1.7–4.5× while holding accuracy within ~0.4 pp on vision and ~1.1 pp on tabular. Because the pattern is index-free, INT8 models are up to 18× smaller than dense FP32 and stay lossless on vision.

1.7–4.5×
fewer parameters than dense
18×
smaller INT8 model (vision, lossless)
≤0.4 pp
accuracy gap on vision tasks
~12 nJ
per inference on the PNN chip (est)

Results

Summary — 5 seeds, iso-architecture

TaskMetricDensePNNPNN int8Savings
MNISTacc % ↑97.4597.0297.004.5× fewer · 18× smaller int8
Fashionacc % ↑88.5788.1788.144.5× · 18×
Tabular*acc % ↑88.7187.6587.382.4× · 9.6×
NLP-GPTval loss ↓2.2682.3871.7× · 6.9×
microGPTop-bound speed1.0×1.59× fasterscalar / no-BLAS

↑ higher = better, ↓ lower = better. NLP metric is validation cross-entropy loss. * Tabular = recommended fc2-only config. INT8 lossless on vision.

Parameters

Fewer parameters per model

050k100k150k200k250k4.5× fewerMNIST4.5× fewerFashion2.4× fewerTabular*1.7× fewerNLP-GPT
Dense baseline PNN

Accuracy

Near-parity, shown truthfully

02040608010097.597.0MNIST88.688.2Fashion88.787.7Tabular*
Dense baseline PNN

Test accuracy %, dense vs PNN side by side, y-axis from 0 so the small gap is shown honestly. Mean of 5 seeds. NLP-GPT uses validation loss (2.268 vs 2.387) — a different metric, omitted here.

Memory

Model footprint — dense FP32 vs PNN INT8

0.020.050.10.5118×MNIST18×Fashion10×Tabular*7×NLP-GPT
Dense baseline PNN

Megabytes, log scale. Index-free connectivity means columns are computed, not stored.

Speed & energy

Where the speed actually shows up

1.59×

Op-bound speedup

Scalar microGPT (no BLAS): fwd 1.70×, bwd 1.35×. The edge / MCU regime.

≈ parity

CPU inference (measured C)

INT8: dense 22 µs vs PNN 25 µs. Element-level sparsity gives no win vs tuned dense GEMM.

~12 nJ

PNN chip (est)

~1.3 µs / inference, ~10–25× better energy than a matched NVIDIA part on a deployed fixed model.

Honest regime map

Where PNN wins — and where it doesn't

Wins

  • Memory: always — 4.5× fewer params, 18× smaller INT8.
  • Accuracy: near-parity on vision, graceful on tabular (≤1.1 pp).
  • Op-bound speed: 1.6× on scalar / no-BLAS edge.
  • Custom silicon: ~1.3 µs, ~12 nJ; ~10–25× better energy than NVIDIA (est).

Limitations

  • No CPU/GPU speed win vs tuned dense GEMM — the prime gather is SIMD-hostile.
  • INT8 not universally free — lossless on vision, harmful if input layers are over-sparsified.
  • NLP gain modest (1.7×, FFN-only); attention/embeddings stay dense.
  • Hardware numbers are engineering estimates; measured CPU reality is parity.