GPU-Accelerated 3D Projection


The 3D library includes optional GPU acceleration for vertex projection, with automatic fallback to CPU when GPU is unavailable or the scene is too simple.

Three-Tier Fallback

Tier	Technology	When Used
1	WebGPU compute shaders	Chrome 113+, Edge 113+
2	WebGL2 transform feedback	All modern browsers
3	CPU	Always available (default)

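The system tries each tier in order during initialization and uses the first one that initializes successfully. The WebGPU tier is only attempted when a WebGPU device is supplied (see the note on options.gpuDevice below).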
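A minimal sketch of that selection loop, assuming each tier exposes an async check that reports whether it is usable. `selectProjectionTier`, `makeTiers`, and the stand-in checks are hypothetical, not the library's API. The rule that WebGPU needs a supplied device follows the options.gpuDevice note below.

```javascript
// Hypothetical sketch of in-order tier selection; not the library's code.
// Each tier supplies an async `init` that resolves true when it is usable.
async function selectProjectionTier(tiers) {
  for (const { mode, init } of tiers) {
    try {
      if (await init()) return mode; // first tier that initializes wins
    } catch {
      // A failing tier is not fatal: fall through to the next one.
    }
  }
  return 'none'; // CPU projection is always available
}

// Stand-in checks for illustration only (assumptions, not real detection).
function makeTiers({ gpuDevice, hasWebGL2 }) {
  return [
    // WebGPU is only attempted when a pre-acquired device is supplied.
    { mode: 'webgpu', init: async () => gpuDevice != null },
    { mode: 'webgl2', init: async () => hasWebGL2 },
  ];
}
```

For example, `selectProjectionTier(makeTiers({ gpuDevice: undefined, hasWebGL2: true }))` resolves to `'webgl2'`. Treating a thrown error like an unavailable tier keeps a driver failure in one tier from blocking the others.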

Face Threshold

GPU_FACE_THRESHOLD = 150

Scenes with fewer than 150 faces always use CPU projection, because for small scenes the overhead of transferring buffers to and from the GPU exceeds the computation saved. At 150 faces or more, the GPU path is used when a GPU tier is available and can provide significant speedups.
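The threshold rule reduces to a small decision function. This is a sketch: `chooseProjectionPath` is a hypothetical name, and the strict `<` comparison is inferred from "fewer than 150 faces".

```javascript
// Hypothetical helper mirroring the rule above: fewer than
// GPU_FACE_THRESHOLD faces -> CPU; otherwise the active GPU tier, if any.
const GPU_FACE_THRESHOLD = 150;

/**
 * @param {number} faceCount number of faces in the scene
 * @param {'webgpu'|'webgl2'|'none'} gpuMode active GPU tier ('none' = CPU only)
 * @returns {'cpu'|'webgpu'|'webgl2'} the projection path to use
 */
function chooseProjectionPath(faceCount, gpuMode) {
  if (gpuMode === 'none' || faceCount < GPU_FACE_THRESHOLD) return 'cpu';
  return gpuMode;
}
```

A 149-face scene stays on the CPU even when WebGPU is active, while a 150-face scene uses the active GPU tier.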

Usage

GPU acceleration is automatic when available. The PinePaper bridge initializes it during setup, and you can check which tier is active:

```javascript
// Check GPU mode
const stats = app.threeD.getStats();
console.log(stats.gpuMode); // 'webgpu', 'webgl2', or 'none' (CPU projection)
```

Standalone Usage

```javascript
// Runs as an ES module (uses top-level await).
import * as ProjectionGPU from './js/3d/gpu/ProjectionGPU.js';
import * as Scene3D from './js/3d/pipeline/Scene3D.js';

// Create GPU projector
const projector = ProjectionGPU.create();

// Initialize (detects best available tier).
// capabilityDetector: the app's existing capability detector (not defined here).
// existingDevice: an already-acquired GPUDevice, if you have one.
await ProjectionGPU.initialize(projector, capabilityDetector, {
  gpuDevice: existingDevice  // Optional; without it the WebGPU tier is skipped
});

console.log(projector.mode); // 'webgpu', 'webgl2', or 'none'

// Use with Scene3D
const scene = Scene3D.createScene({ width: 800, height: 600 });
scene.gpuProjector = projector;

// Async render with GPU acceleration. Returns the previous frame's
// polygons while the new projection runs (see Double-Buffering).
const polygons = await Scene3D.renderGPU(scene);

// Clean up
ProjectionGPU.dispose(projector);
```

WebGPU Compute Shader

When WebGPU is available, vertex projection runs as a compute shader:

Workgroup size: 64 threads
Dispatch: ceil(vertexCount / 64) workgroups
Bindings:
- Binding 0: Uniform buffer (4x4 viewProj matrix + halfSize + vertex count)
- Binding 1: Input vertices (read-only storage)
- Binding 2: Output projected vertices (storage)

Each thread transforms one vertex: multiply by view-projection matrix, apply perspective division, and convert to screen coordinates.
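To make the per-thread work concrete, the following plain-JavaScript reference computes what one invocation would produce and loops over every thread of every workgroup the way a dispatch does. It is not the library's WGSL shader. It assumes a column-major matrix (the WebGPU convention), `halfSize = [width / 2, height / 2]`, a screen Y axis that grows downward, and an output of (screen x, screen y, NDC depth) per vertex; all function names are hypothetical.

```javascript
// Hypothetical CPU reference for the work one compute thread does.
const WORKGROUP_SIZE = 64;

function workgroupCount(vertexCount) {
  // ceil(vertexCount / 64); the last group may contain idle threads.
  return Math.ceil(vertexCount / WORKGROUP_SIZE);
}

// Project vertex `i` of `input` (xyz triples) into `output` (x, y, depth).
function projectVertex(i, viewProj, halfSize, vertexCount, input, output) {
  if (i >= vertexCount) return; // padding thread in the last workgroup
  const x = input[3 * i], y = input[3 * i + 1], z = input[3 * i + 2];
  const m = viewProj;
  // clip = viewProj * vec4(x, y, z, 1), column-major: m[col * 4 + row]
  const cx = m[0] * x + m[4] * y + m[8] * z + m[12];
  const cy = m[1] * x + m[5] * y + m[9] * z + m[13];
  const cz = m[2] * x + m[6] * y + m[10] * z + m[14];
  const cw = m[3] * x + m[7] * y + m[11] * z + m[15];
  // Perspective division to normalized device coordinates.
  const nx = cx / cw, ny = cy / cw, nz = cz / cw;
  // NDC [-1, 1] -> screen pixels; Y is flipped so +Y points down.
  output[3 * i] = nx * halfSize[0] + halfSize[0];
  output[3 * i + 1] = -ny * halfSize[1] + halfSize[1];
  output[3 * i + 2] = nz;
}

// Run every thread of every workgroup, as a GPU dispatch would.
function dispatch(viewProj, halfSize, input) {
  const vertexCount = input.length / 3;
  const output = new Float32Array(vertexCount * 3);
  const groups = workgroupCount(vertexCount);
  for (let g = 0; g < groups; g++) {
    for (let t = 0; t < WORKGROUP_SIZE; t++) {
      projectVertex(g * WORKGROUP_SIZE + t, viewProj, halfSize, vertexCount, input, output);
    }
  }
  return output;
}
```

The early return for `i >= vertexCount` is the usual reason a compute shader receives the vertex count: the last workgroup is padded to 64 threads, and the extra threads must not write past the end of the output buffer.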

WebGL2 Transform Feedback

When WebGPU is unavailable, WebGL2 provides GPU acceleration through the vertex shader stage:

- Uses RASTERIZER_DISCARD to skip rasterization
- Captures transformed vertices via transform feedback
- Outputs a single v_projected varying per vertex
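Those three points map onto standard WebGL2 calls roughly as follows. This is a sketch under assumptions: only the `v_projected` name comes from this page, while the shader source, the uniform and attribute names, the vec3 output layout, and the per-call buffer creation are illustrative.

```javascript
// Hypothetical WebGL2 transform-feedback projection (illustrative only).
const VERTEX_SRC = `#version 300 es
uniform mat4 u_viewProj;
uniform vec2 u_halfSize;
in vec3 a_position;
out vec3 v_projected; // captured per vertex: screen x, screen y, NDC depth
void main() {
  vec4 clip = u_viewProj * vec4(a_position, 1.0);
  vec3 ndc = clip.xyz / clip.w;
  v_projected = vec3(ndc.x * u_halfSize.x + u_halfSize.x,
                     -ndc.y * u_halfSize.y + u_halfSize.y,
                     ndc.z);
  gl_Position = clip; // never rasterized (RASTERIZER_DISCARD)
}`;

// WebGL2 still requires a fragment shader for the program to link.
const FRAGMENT_SRC = `#version 300 es
precision mediump float;
out vec4 outColor;
void main() { outColor = vec4(0.0); }`;

function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}

function createProjectionProgram(gl) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, VERTEX_SRC));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, FRAGMENT_SRC));
  // Captured varyings must be declared before linking.
  gl.transformFeedbackVaryings(program, ['v_projected'], gl.SEPARATE_ATTRIBS);
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(gl.getProgramInfoLog(program));
  }
  return program;
}

// Projects `positions` (xyz triples) and returns [x, y, depth] per vertex.
function projectWithTransformFeedback(gl, program, positions, viewProj, halfW, halfH) {
  const count = positions.length / 3;
  gl.useProgram(program);
  gl.uniformMatrix4fv(gl.getUniformLocation(program, 'u_viewProj'), false, viewProj);
  gl.uniform2f(gl.getUniformLocation(program, 'u_halfSize'), halfW, halfH);

  const vao = gl.createVertexArray();          // input vertices
  gl.bindVertexArray(vao);
  const inBuf = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, inBuf);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
  const loc = gl.getAttribLocation(program, 'a_position');
  gl.enableVertexAttribArray(loc);
  gl.vertexAttribPointer(loc, 3, gl.FLOAT, false, 0, 0);

  const outBuf = gl.createBuffer();            // 3 floats (12 bytes) per vertex
  gl.bindBuffer(gl.ARRAY_BUFFER, outBuf);
  gl.bufferData(gl.ARRAY_BUFFER, count * 12, gl.STREAM_READ);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);        // not bound elsewhere during capture

  const tf = gl.createTransformFeedback();
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, outBuf);
  gl.enable(gl.RASTERIZER_DISCARD);            // run the vertex stage only
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, count);
  gl.endTransformFeedback();
  gl.disable(gl.RASTERIZER_DISCARD);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);

  const result = new Float32Array(count * 3);  // synchronous readback (stalls)
  gl.bindBuffer(gl.COPY_READ_BUFFER, outBuf);
  gl.getBufferSubData(gl.COPY_READ_BUFFER, 0, result);
  gl.bindBuffer(gl.COPY_READ_BUFFER, null);

  gl.bindVertexArray(null);                    // per-call objects, for brevity
  gl.deleteTransformFeedback(tf);
  gl.deleteBuffer(inBuf);
  gl.deleteBuffer(outBuf);
  gl.deleteVertexArray(vao);
  return result;
}
```

`getBufferSubData` blocks until the GPU finishes. A non-blocking variant could insert `gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0)` after the draw and poll `gl.clientWaitSync` before reading back, which suits asynchronous rendering. A real implementation would also keep its buffers across frames instead of recreating them per call.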

Double-Buffering

During GPU rendering, the system displays the previous frame's polygons while the new GPU computation runs asynchronously. This prevents frame stutters when the GPU pipeline has latency, at the cost of showing geometry that is one frame behind.

```javascript
// Scene3D.renderGPU() implementation:
// 1. Start async GPU projection
// 2. Return last frame's polygons immediately
// 3. Update polygons when GPU completes
```
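The three commented steps can be sketched as a small wrapper around any promise-returning projection. `DoubleBufferedProjector` and its `project` callback are hypothetical, and unlike `Scene3D.renderGPU()` this sketch's `render` returns synchronously to keep the swap logic visible.

```javascript
// Hypothetical double-buffered renderer; `project` stands in for any async
// GPU projection that resolves to an array of polygons.
class DoubleBufferedProjector {
  constructor(project) {
    this.project = project;
    this.lastPolygons = []; // shown until the first GPU result arrives
    this.pending = null;    // in-flight projection, if any
  }

  render(scene) {
    if (!this.pending) {
      // 1. Start async GPU projection (at most one in flight).
      this.pending = this.project(scene)
        .then((polygons) => {
          // 3. Swap in the new polygons when the GPU completes.
          this.lastPolygons = polygons;
        })
        .catch(() => {
          // On failure, keep showing the previous frame.
        })
        .finally(() => {
          this.pending = null;
        });
    }
    // 2. Return last frame's polygons without waiting.
    return this.lastPolygons;
  }
}
```

Allowing only one projection in flight keeps slow GPU work from queuing up, and a failed projection leaves the previous frame on screen.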

Resource Management

GPU resources must be cleaned up to prevent memory leaks:

```javascript
// Dispose GPU resources
ProjectionGPU.dispose(projector);
```

The PinePaper bridge handles cleanup automatically in ThreeD.dispose().
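What dispose has to release depends on which tier was active. A minimal sketch, assuming a hypothetical projector object that records the WebGPU buffers and WebGL2 objects it created: `GPUBuffer.destroy()` and the WebGL2 `delete*` methods are standard browser APIs, but `disposeProjector` and every field name are invented for illustration.

```javascript
// Hypothetical projector fields; not the library's internal layout.
function disposeProjector(projector) {
  if (projector.disposed) return; // idempotent: later calls do nothing

  // WebGPU tier: free buffer memory now instead of waiting for GC.
  // The GPUDevice itself is left alone (it may be shared).
  for (const buffer of projector.gpuBuffers ?? []) buffer.destroy();

  // WebGL2 tier: delete every object this projector created.
  const gl = projector.gl;
  if (gl) {
    for (const buffer of projector.glBuffers ?? []) gl.deleteBuffer(buffer);
    if (projector.transformFeedback) gl.deleteTransformFeedback(projector.transformFeedback);
    if (projector.program) gl.deleteProgram(projector.program);
  }

  projector.gpuBuffers = [];
  projector.glBuffers = [];
  projector.transformFeedback = null;
  projector.program = null;
  projector.gl = null;
  projector.mode = 'none'; // in this sketch, a disposed projector reports CPU mode
  projector.disposed = true;
}
```

Not destroying the device matters when it was passed in through options.gpuDevice: other subsystems such as the filter pipeline may still be using it. The idempotency guard makes a second call during teardown harmless.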

Floating-Point Precision Across Devices

3D projection involves matrix multiplication, trigonometric functions, and perspective division — all floating-point operations. Results can vary subtly between devices.

What is deterministic

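JavaScript arithmetic (+, -, *, /) on 64-bit IEEE 754 doubles produces identical results on every device, browser, and OS, because ECMAScript requires each of these operations to be correctly rounded (round-to-nearest, ties-to-even). Basic vertex positions, matrix multiplies on the CPU path, and color calculations are therefore bit-exact everywhere.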
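A quick check of what "bit-exact" means here: the rounding error is part of the result, and every conforming engine reproduces it. Bit-exactness also assumes the same operations run in the same order, because floating-point addition is not associative.

```javascript
// Each double operation is rounded the same way by every conforming engine,
// so even the rounding error is identical on all devices.
const sum = 0.1 + 0.2;
console.log(sum); // 0.30000000000000004

// Floating-point addition is not associative: the same values summed in a
// different order can give a different, but equally reproducible, result.
const leftFirst = (0.1 + 0.2) + 0.3;
const rightFirst = 0.1 + (0.2 + 0.3);
console.log(leftFirst, rightFirst); // 0.6000000000000001 0.6
```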

What can vary

Source of Variance	Precision	Cause	Visual Impact
Transcendental functions (`Math.sin`, `Math.cos`, `Math.atan2`)	Last 1-2 bits of float64	IEEE 754 does not require correctly rounded results for these, and ECMAScript leaves them implementation-approximated; V8, SpiderMonkey, and JavaScriptCore may differ	Sub-pixel (imperceptible)
GPU float32 arithmetic	~7 decimal digits vs CPU’s ~15	WebGPU/WebGL2 shaders use 32-bit floats; reduced precision vs CPU doubles	Sub-pixel (< 0.01px)
GPU vendor differences	Last bits of float32	NVIDIA, AMD, Apple Silicon, and Intel GPUs may round differently for the same shader operations	Sub-pixel (imperceptible)
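FMA (Fused Multiply-Add)	Last bits of float32	GPU shader compilers may fuse `a * b + c` into one operation with a single rounding, while others round twice; this varies by vendor and driver. JavaScript on the CPU never fuses, because ECMAScript rounds each operation separately	Sub-pixel (imperceptible)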
Depth sort edge cases	Binary (face order)	Two co-planar faces with nearly identical depth values may sort differently across devices	Possible z-fighting flicker on co-planar faces
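The float32 and FMA rows can be reproduced on the CPU. `Math.fround` rounds a double to the nearest float32, so applying it after each operation approximates a float32 shader; real GPU output still depends on hardware and driver. The simplified projection `x / z * halfWidth + halfWidth` and all function names are assumptions made for illustration.

```javascript
// Math.fround rounds a double to the nearest float32.
const f32 = Math.fround;

// x-coordinate projection on the float64 CPU path (simplified formula).
function projectX64(x, z, halfWidth) {
  return (x / z) * halfWidth + halfWidth;
}

// The same formula with every input and intermediate rounded to float32.
function projectX32(x, z, halfWidth) {
  const ndc = f32(f32(x) / f32(z));
  return f32(f32(ndc * f32(halfWidth)) + f32(halfWidth));
}

// a * b + c with two roundings (separate multiply, then add).
function mulAddSeparate32(a, b, c) {
  return f32(f32(a * b) + c);
}

// a * b + c with one rounding (fused). The float64 product of two float32
// values is exact, so this matches a float32 FMA for the inputs below;
// it is not a general-purpose FMA emulation.
function mulAddFused32(a, b, c) {
  return f32(a * b + c);
}
```

With `a = b = 1 + 2 ** -12` and `c = -1`, the separate version returns `2 ** -11` and the fused version returns `2 ** -11 + 2 ** -24`: a one-bit difference of the kind the FMA row describes. For typical screen coordinates, `projectX32` and `projectX64` stay well under 0.01px apart.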

Practical implications

Simple scenes (< 150 faces): always use the CPU path, which is deterministic across devices for basic arithmetic; the only sub-pixel variance comes from trig functions in camera and lighting calculations.
Complex scenes on GPU: projection results differ by less than 0.01px between GPU vendors, which is visually identical.
Depth sorting: The only potentially visible cross-device difference is z-fighting on co-planar faces, where two faces at nearly the same depth may render in different order on different devices. Avoid placing faces at identical Z positions to prevent this.
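Why co-planar faces are the exception can be seen with a painter's-algorithm sort (farthest face drawn first, assuming larger depth means farther). This comparator is an illustration, not necessarily the library's: `Array.prototype.sort` is stable, so exactly equal depths keep their input order, but a one-bit difference in depth decides the order outright.

```javascript
// Painter's-algorithm order: farthest first, assuming larger depth = farther.
// Array.prototype.sort is stable (ES2019+), so exact ties keep input order.
function sortFacesBackToFront(faces) {
  return [...faces].sort((a, b) => b.depth - a.depth);
}

// Draw order for two co-planar faces A and B, as a string like 'AB'.
function orderOnDevice(depthA, depthB) {
  return sortFacesBackToFront([
    { id: 'A', depth: depthA },
    { id: 'B', depth: depthB },
  ]).map((face) => face.id).join('');
}

const depth = 5;
const depthNextUp = 5 + 2 ** -50; // the next representable double above 5

// Device 1 rounds B up by one bit; device 2 rounds A up instead.
console.log(orderOnDevice(depth, depthNextUp)); // prints BA
console.log(orderOnDevice(depthNextUp, depth)); // prints AB
```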

What we do NOT guarantee

Pixel-perfect cross-device reproducibility for GPU-rendered scenes. The CPU path is near-deterministic (barring trig edge cases), but the GPU path trades strict reproducibility for performance. For reproducible output (e.g., regression testing), force CPU rendering by not initializing the GPU projector.

Sharing a WebGPU Device

To avoid each GPU subsystem (filters, 3D projector, future renderers) requesting its own adapter, ThreeD.initGPU accepts a pre-acquired device via options.gpuDevice. If no device is passed, the WebGPU code path is skipped and the projector falls back to WebGL2 or CPU; by design, no separate adapter is requested here.

```javascript
// Example: share the filter pipeline's device with the 3D projector
const device = app._gpuFilterPipeline?.device; // undefined unless WebGPU is active
await app.threeD.initGPU(capabilityDetector, { gpuDevice: device });
```