Verlet physics — profile-first benchmark

PinePaper’s bone physics (js/VerletPhysics.js) is the first hot loop being measured before deciding whether any of the Rust/WASM repurpose work is worth doing. Until we have real numbers from real devices, “WASM would be faster” is a hypothesis, not a plan.

This page is the recipe to get those numbers.

Why Verlet first

Across the editor’s compute-heavy paths (IK solvers, ODE integration, FFT, 3D projection, spatial queries), bone physics is the one path that runs every frame, on every rigged character, with no GPU offload available. If 60 fps is the target (a 16.7 ms frame budget) and Verlet already takes < 0.5 ms per step on a mid-tier phone, physics is not the bottleneck and a port has nothing to win. If Verlet takes > 2 ms per step on the same device, the rest of the per-frame budget is already squeezed and Rust starts to look real.

It is also the most isolated hot loop in the codebase — no Paper.js calls, no DOM, just floating-point math over a particle/constraint list. That makes it the closest match to what WASM is actually good at, so a measurement on Verlet generalises better to the other candidates than the other way round.

What we measure

The step() method on VerletPhysics runs once per skeleton per frame. It does:

  1. Update animation targets from the FK/IK solve
  2. Verlet integration with spring motors + gravity + damping per particle
  3. Iterative constraint relaxation (distance + angle constraints), N iterations
  4. Collision against external colliders
  5. Bone resync — write particle positions back to skeleton bones

For a 21-bone humanoid this is ~42 particles, ~30 constraints, 3–4 relaxation iterations per step. The instrumentation captures the wall-clock duration of one full step() call, in milliseconds, into a per-skeleton ring buffer (last 1000 samples, ~16 s at 60 fps).
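
To make the shape of that work concrete, here is a minimal sketch of stages 2 and 3 only (integration plus distance-constraint relaxation). It is a generic position-Verlet loop, not the code in js/VerletPhysics.js: the function name, the particle and constraint fields, and the default parameter values are all assumptions, and spring motors, angle constraints, collisions and bone resync are left out.

```javascript
// Generic position-Verlet sketch. Hypothetical names and defaults throughout;
// this is NOT the implementation in js/VerletPhysics.js.
// particle:   { x, y, px, py, pinned? }  (current + previous position)
// constraint: { a, b, rest }             (particle indices + rest length)
function verletStep(particles, constraints, options = {}) {
  const { gravity = 980, damping = 0.98, iterations = 4, dt = 1 / 60 } = options;

  // Integration: velocity is implied by (current - previous) position.
  for (const p of particles) {
    if (p.pinned) continue;
    const vx = (p.x - p.px) * damping;
    const vy = (p.y - p.py) * damping;
    p.px = p.x;
    p.py = p.y;
    p.x += vx;
    p.y += vy + gravity * dt * dt; // gravity in arbitrary units per second squared
  }

  // Relaxation: nudge each pair back toward its rest length, N times.
  for (let i = 0; i < iterations; i++) {
    for (const c of constraints) {
      const a = particles[c.a];
      const b = particles[c.b];
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const dist = Math.hypot(dx, dy) || 1e-9; // avoid dividing by zero
      const diff = (dist - c.rest) / dist;
      const wa = a.pinned ? 0 : 1;
      const wb = b.pinned ? 0 : 1;
      const w = wa + wb;
      if (w === 0) continue; // both ends pinned, nothing to move
      a.x += dx * diff * (wa / w);
      a.y += dy * diff * (wa / w);
      b.x -= dx * diff * (wb / w);
      b.y -= dy * diff * (wb / w);
    }
  }
}
```

The inner loops are plain floating-point arithmetic over flat lists, which is why this path is a reasonable first candidate for a JS versus WASM comparison.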

With the flag OFF, collection costs one property read per frame: a few nanoseconds at most, far below anything this benchmark can resolve. With it ON, each skeleton pays for two performance.now() calls plus a ring-buffer push per frame. That extra cost shows up in the numbers (~1–5 µs per step on desktop V8), so read the reported times as “step + tiny instrument overhead”, not pure step. The same overhead applies to the JS run and to any future WASM run, so comparisons between them stay fair.
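
The flag-gated timing described above could look roughly like the following. This is a sketch under assumptions, not the actual instrumentation: StepTimer, timedStep and the isEnabled callback are hypothetical names, and the real code may store samples differently.

```javascript
// Hypothetical sketch of flag-gated step timing with a fixed-size ring buffer.
class StepTimer {
  constructor(capacity = 1000) {
    this.samples = new Float64Array(capacity); // durations in ms
    this.count = 0; // number of valid samples, capped at capacity
    this.next = 0;  // index the next sample will be written to
  }

  push(ms) {
    this.samples[this.next] = ms;
    this.next = (this.next + 1) % this.samples.length; // wrap and overwrite oldest
    if (this.count < this.samples.length) this.count++;
  }
}

// Wraps a step function so it is only timed while isEnabled() returns true.
function timedStep(stepFn, timer, isEnabled) {
  return function (...args) {
    if (!isEnabled()) return stepFn.apply(this, args); // flag OFF: one check, no timing
    const t0 = performance.now();
    const result = stepFn.apply(this, args);
    timer.push(performance.now() - t0);
    return result;
  };
}

// Example wiring (the flag name is the one used on this page):
//   const timer = new StepTimer(1000);
//   const step = timedStep(originalStep, timer, () => globalThis.__pinepaper_perf === true);
```

A preallocated Float64Array keeps the ON path free of per-frame allocations, so the instrument itself should not trigger garbage collection during a run.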

How to run

Step 1 — turn the flag on

window.__pinepaper_perf = true;
window.PinePaper.verletPhysics.clearPerfStats();   // start clean

The flag and the clear are idempotent — running them twice is fine.

Step 2 — exercise a real scenario

Don’t synthesise a workload; load a real rig:

  • Open the editor in a fresh tab
  • Rigging Tools → Add Humanoid Skeleton (21 bones)
  • Either pose-cycle it (run a walk cycle: Test Character → Move Left/Right), or apply some animated keyframes to a root bone and press play
  • Let it run ~30 seconds so the 1000-sample ring buffer is full and holds only steady-state samples (the most recent ~16 s at 60 fps)

For comparison runs, also try:

  • 11-bone bird preset
  • 15-bone quadruped preset
  • 29-bone centaur preset (the largest preset; the right stress case)

Step 3 — read the percentiles

const skeletonId = window.PinePaper.riggingSystem.getAllSkeletons()[0].id;   // first skeleton in the scene
window.PinePaper.verletPhysics.getPerfStats(skeletonId);

Returns:

{
  count: 1000,        // samples in the ring buffer (capped at 1000)
  p50:   0.42,        // median step time, ms
  p95:   0.71,        // tail — the one that matters for perceived jank
  p99:   1.12,
  min:   0.31,
  max:   3.04,
  mean:  0.48
}

The number to focus on is p95, not the mean. Jank is felt at the tail.
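
For readers who want to sanity-check the summary, the sketch below shows one common way such percentiles are derived from a buffer of samples. It uses the nearest-rank method, which is an assumption: getPerfStats() may interpolate or round differently, and summarize is a hypothetical name.

```javascript
// Hypothetical helper: nearest-rank percentiles over an array of durations (ms).
function summarize(samplesMs) {
  const n = samplesMs.length;
  if (n === 0) return { count: 0 };

  // Typed-array sort is numeric and ascending by default.
  const sorted = Float64Array.from(samplesMs).sort();

  // Nearest rank: the smallest sample with at least p% of samples at or below it.
  const pct = (p) => sorted[Math.max(0, Math.ceil((p * n) / 100) - 1)];

  let sum = 0;
  for (const v of sorted) sum += v;

  return {
    count: n,
    p50: pct(50),
    p95: pct(95),
    p99: pct(99),
    min: sorted[0],
    max: sorted[n - 1],
    mean: sum / n,
  };
}
```

With a full 1000-sample buffer, p95 is the 950th smallest sample, so roughly 50 slow frames sit above it. A single outlier moves max and nudges the mean but leaves p95 almost unchanged.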

Decision rule

What the result tells us:

| p95 on a mid-tier Android (~ Pixel 6a / Galaxy A53) | What it means |
| --- | --- |
| < 0.5 ms | Verlet isn’t a bottleneck. Don’t port to Rust; the editor’s perf problem (if any) is somewhere else. |
| 0.5 – 2 ms | Modest cost. A Rust port could shave 30–70% off, but other optimisations (Web Worker offload, batching, fewer constraint iterations) are likely faster wins. |
| 2 – 4 ms | Real cost: 12–24% of a 60 fps budget on the physics step alone. Rust port worth scoping; expect 3–5× speedup based on similar JS→WASM ports of vec2 math. |
| > 4 ms | The character is dropping frames on this device. Either Rust port now, or aggressively trim the simulation (fewer iterations / fewer bones with physics enabled / lower-poly rig). |

Always measure on at least two devices: the development machine (desktop V8 is wildly fast — desktop p95 of 0.05 ms doesn’t tell you anything about phones), and at least one real mid-tier Android. Apple devices are usually closer to desktop than to Android in this kind of test, so an iPhone reading is informative but not the worst case.
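
To keep runs from different devices comparable, it can help to capture every skeleton’s stats together with a device label in one go. The sketch below assumes only the getAllSkeletons() and getPerfStats(id) calls already used on this page; formatRun, its output format and the device label are hypothetical.

```javascript
// Hypothetical helper: one text line per skeleton, tagged with a device label.
// pinePaper is expected to be window.PinePaper when run in the editor.
function formatRun(deviceLabel, pinePaper) {
  return pinePaper.riggingSystem
    .getAllSkeletons()
    .map((skeleton) => {
      const s = pinePaper.verletPhysics.getPerfStats(skeleton.id);
      return [
        deviceLabel,
        `skeleton ${skeleton.id}`,
        `n=${s.count}`,
        `p50=${s.p50.toFixed(2)} ms`,
        `p95=${s.p95.toFixed(2)} ms`,
        `p99=${s.p99.toFixed(2)} ms`,
      ].join(' | ');
    })
    .join('\n');
}

// In the browser console, after the scenario has run:
//   console.log(formatRun('Pixel 6a', window.PinePaper));
```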

When you have numbers

Paste them into backlog_runtime_rs_unit_tests.md, or into whichever backlog memo succeeds it for the editor-Rust repurpose decision. The decision flow:

p95 mobile < 0.5 ms  →  STOP. Rust everywhere is premature. Look at GPU paths or
                         entirely different bottlenecks (paint / layout / blend).

p95 mobile 0.5–2 ms  →  CONSIDER easier wins first: move VerletPhysics.step() to
                         a Web Worker, reduce constraint iterations, batch by
                         skeleton. If those don't close the gap, then Rust.

p95 mobile > 2 ms    →  PROFILE the Rust hypothesis. Write a tiny Rust impl of
                         step() exposed via wasm-bindgen, run the SAME scenario
                         in this benchmark, compare p95s. If Rust ≥ 3× faster on
                         mobile, plan the port.

The “write a tiny Rust impl” step is a half-day’s work given the existing runtime-rs/ toolchain — it’s literal arithmetic with no async, no DOM, no IO. Comparing real numbers from real devices is how this stops being conversation and becomes a decision.
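
The decision flow and the final JS versus Rust comparison can be written down as a small check, which keeps the thresholds in one place. This is only a sketch: decide and shouldPlanPort are hypothetical names, and treating exactly 2 ms as part of the 0.5–2 ms band is an assumption about the boundary.

```javascript
// Hypothetical encoding of the decision flow; thresholds are the ones stated on this page.
const FRAME_BUDGET_MS = 1000 / 60; // about 16.67 ms per frame at 60 fps

function decide(p95MobileMs) {
  let action;
  if (p95MobileMs < 0.5) action = 'STOP';            // Rust is premature
  else if (p95MobileMs <= 2) action = 'CONSIDER';    // try Worker / fewer iterations first
  else action = 'PROFILE';                           // build the tiny Rust impl and compare
  return { action, budgetShare: p95MobileMs / FRAME_BUDGET_MS };
}

// True when the WASM run is at least minSpeedup times faster on the same scenario.
function shouldPlanPort(jsP95Ms, wasmP95Ms, minSpeedup = 3) {
  return jsP95Ms / wasmP95Ms >= minSpeedup;
}
```

For example, a p95 of 2 ms is 2 / 16.67, or about 12% of the frame budget, and 4 ms is about 24%, which is where the 12–24% figure in the decision rule comes from.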

Future hot-loop benchmarks

Same pattern for the other candidates listed in the Rust-repurpose conversation:

| Hot loop | Where to instrument | Threshold to consider Rust |
| --- | --- | --- |
| IK solver (FABRIK / two-bone / CCD) | js/RiggingSystem.js per-chain solve | p95 > 1 ms per chain on mid-tier mobile |
| ODE integration (Lorenz, double pendulum) | js/math/ODESolver.js step | p95 > 3 ms per simulation step |
| FFT (spectrum analyzer) | js/math/SignalProcessor.js FFT call | p95 > 5 ms for 2048-bin FFT |
| 3D vertex projection | js/ThreeD.js per-frame project loop | p95 > 4 ms for 1000-vertex mesh |

Each one gets the same __pinepaper_perf-flag pattern and the same decision rule applied. Don’t port any of them speculatively.
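
One way to keep those thresholds from drifting is to hold them in a single lookup that every benchmark consults. The sketch below copies the numbers from the table above; the keys and the function name are hypothetical.

```javascript
// Hypothetical lookup of the per-loop p95 thresholds (ms, mid-tier mobile).
const RUST_THRESHOLDS_MS = new Map([
  ['ikSolvePerChain', 1],      // IK solver, per chain
  ['odeStep', 3],              // ODE integration, per simulation step
  ['fft2048', 5],              // 2048-bin FFT
  ['project1000Vertices', 4],  // 3D projection of a 1000-vertex mesh
]);

// True when the measured p95 exceeds the recorded threshold for that loop.
function worthConsideringRust(loopName, p95Ms) {
  if (!RUST_THRESHOLDS_MS.has(loopName)) {
    throw new Error(`No threshold recorded for "${loopName}"`);
  }
  return p95Ms > RUST_THRESHOLDS_MS.get(loopName);
}
```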