GPU-Accelerated 3D Projection

The 3D library includes optional GPU acceleration for vertex projection, with automatic fallback to CPU when GPU is unavailable or the scene is too simple.

Three-Tier Fallback

| Tier | Technology | When Used |
|------|------------|-----------|
| 1 | WebGPU compute shaders | Chrome 113+, Edge 113+ |
| 2 | WebGL2 transform feedback | All modern browsers |
| 3 | CPU | Always available (default) |

The system attempts each tier in order during initialization and uses the best available.
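As a rough illustration of this fallback order, the sketch below probes for WebGPU and WebGL2 using standard browser APIs and picks the first tier that works. The helper names probeCapabilities and pickTier are hypothetical and not part of the library; the library's own detection may differ in detail.

```javascript
// Hypothetical helper: probe the browser for each GPU tier.
// Uses only standard APIs: navigator.gpu.requestAdapter() and
// canvas.getContext('webgl2').
async function probeCapabilities(nav = globalThis.navigator, doc = globalThis.document) {
  let hasWebGPU = false;
  if (nav && nav.gpu) {
    try {
      // requestAdapter() resolves to null when no suitable adapter exists.
      const adapter = await nav.gpu.requestAdapter();
      hasWebGPU = Boolean(adapter);
    } catch {
      hasWebGPU = false;
    }
  }

  let hasWebGL2 = false;
  if (doc) {
    // getContext() returns null when WebGL2 is not supported.
    hasWebGL2 = Boolean(doc.createElement('canvas').getContext('webgl2'));
  }
  return { hasWebGPU, hasWebGL2 };
}

// Pick the best tier in the documented order: WebGPU, then WebGL2, then CPU.
// The return values match the mode strings reported by projector.mode.
function pickTier({ hasWebGPU, hasWebGL2 }) {
  if (hasWebGPU) return 'webgpu';
  if (hasWebGL2) return 'webgl2';
  return 'none';
}
```

In a browser, `pickTier(await probeCapabilities())` would yield one of the three mode strings. Keeping the decision in a pure function makes the ordering easy to unit test without a GPU.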

Face Threshold

GPU_FACE_THRESHOLD = 150

Scenes with fewer than 150 faces always use CPU projection, because the cost of transferring buffers to and from the GPU exceeds the computation it saves on small scenes. At 150 faces and above, GPU projection is used when a GPU tier is available and can provide significant speedups.
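The threshold check can be pictured as a small decision function. This is a minimal sketch, not the library's code: chooseProjectionPath is a hypothetical name, and treating a scene with exactly 150 faces as GPU-eligible is an assumption inferred from the "fewer than 150" wording.

```javascript
// Value taken from the documentation above.
const GPU_FACE_THRESHOLD = 150;

// Hypothetical helper: decide which projection path a frame should take.
// mode is one of 'webgpu', 'webgl2', or 'none'.
// Assumption: a scene with exactly GPU_FACE_THRESHOLD faces takes the GPU path.
function chooseProjectionPath(faceCount, mode) {
  if (mode === 'none') return 'cpu';
  return faceCount >= GPU_FACE_THRESHOLD ? 'gpu' : 'cpu';
}
```

For example, `chooseProjectionPath(149, 'webgpu')` selects the CPU path even though a GPU tier is available.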

Usage

GPU acceleration is automatic when available. The PinePaper bridge initializes it during setup:

// Check GPU mode
const stats = app.threeD.getStats();
console.log(stats.gpuMode); // 'webgpu', 'webgl2', or 'none'

Standalone Usage

import * as ProjectionGPU from './js/3d/gpu/ProjectionGPU.js';
import * as Scene3D from './js/3d/pipeline/Scene3D.js';

// Create GPU projector
const projector = ProjectionGPU.create();

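// Initialize (detects best available tier).
// capabilityDetector and existingDevice are not defined in this snippet;
// they are supplied by the host application.
await ProjectionGPU.initialize(projector, capabilityDetector, {
  gpuDevice: existingDevice  // Optional: reuse existing WebGPU device
});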

console.log(projector.mode); // 'webgpu', 'webgl2', or 'none'

// Use with Scene3D
const scene = Scene3D.createScene({ width: 800, height: 600 });
scene.gpuProjector = projector;

// Async render with GPU acceleration
const polygons = await Scene3D.renderGPU(scene);

// Clean up
ProjectionGPU.dispose(projector);

WebGPU Compute Shader

When WebGPU is available, vertex projection runs as a compute shader:

  • Workgroup size: 64 threads
  • Dispatch: ceil(vertexCount / 64) workgroups
  • Bindings:
    • Binding 0: Uniform buffer (4x4 viewProj matrix + halfSize + vertex count)
    • Binding 1: Input vertices (read-only storage)
    • Binding 2: Output projected vertices (storage)

Each thread transforms one vertex: multiply by view-projection matrix, apply perspective division, and convert to screen coordinates.
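To make the per-thread work concrete, here is a CPU reference in JavaScript for the dispatch count and the three steps each thread performs. It is a sketch under stated assumptions rather than the shader itself: the names workgroupCount and projectVertex are hypothetical, the matrix is assumed to be column-major, halfSize is assumed to be [halfWidth, halfHeight], and screen y is assumed to grow downward.

```javascript
const WORKGROUP_SIZE = 64;

// Number of workgroups to dispatch so every vertex gets a thread.
function workgroupCount(vertexCount) {
  return Math.ceil(vertexCount / WORKGROUP_SIZE);
}

// CPU reference for what one shader thread does to one vertex.
// Assumptions (not confirmed by the source): viewProj is a column-major
// 4x4 matrix in a 16-element array, halfSize is [halfWidth, halfHeight],
// and screen y grows downward.
function projectVertex(viewProj, halfSize, [x, y, z]) {
  const m = viewProj;
  // 1. Multiply by the view-projection matrix (w of the input is 1).
  const cx = m[0] * x + m[4] * y + m[8] * z + m[12];
  const cy = m[1] * x + m[5] * y + m[9] * z + m[13];
  const cz = m[2] * x + m[6] * y + m[10] * z + m[14];
  const cw = m[3] * x + m[7] * y + m[11] * z + m[15];
  // 2. Perspective division to normalized device coordinates.
  const ndcX = cx / cw;
  const ndcY = cy / cw;
  const ndcZ = cz / cw;
  // 3. Convert to screen coordinates centered on the viewport.
  return [
    halfSize[0] + ndcX * halfSize[0],
    halfSize[1] - ndcY * halfSize[1],
    ndcZ,
  ];
}
```

With 65 vertices, `workgroupCount(65)` is 2, so the second workgroup runs 63 threads past the end of the data. That is presumably why the vertex count is included in the uniform buffer: it lets each thread bounds-check its index.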

WebGL2 Transform Feedback

When WebGPU is unavailable, WebGL2 provides GPU acceleration through the vertex shader stage:

  • Uses RASTERIZER_DISCARD to skip rasterization
  • Captures transformed vertices via transform feedback
  • Single v_projected varying output per vertex
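The sequence of WebGL2 calls behind such a pass can be sketched as follows. Every gl call here is a standard WebGL2 method, but the function name runProjectionPass and its parameter object are hypothetical. The program is assumed to have been linked after calling gl.transformFeedbackVaryings(program, ['v_projected'], gl.SEPARATE_ATTRIBS), and reading the results back from the output buffer is left out.

```javascript
// Hypothetical helper: run one transform feedback pass over vertexCount
// vertices. Assumes program, vao, transformFeedback, and outputBuffer
// were created and configured beforehand.
function runProjectionPass(gl, { program, vao, transformFeedback, outputBuffer, vertexCount }) {
  gl.useProgram(program);
  gl.bindVertexArray(vao);

  // Skip rasterization: only the vertex shader output is needed.
  gl.enable(gl.RASTERIZER_DISCARD);

  // Route the v_projected varying into outputBuffer.
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, transformFeedback);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, outputBuffer);

  // Drawing as points runs the vertex shader once per vertex.
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, vertexCount);
  gl.endTransformFeedback();

  // Restore state so later draws rasterize normally.
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
  gl.disable(gl.RASTERIZER_DISCARD);
}
```

Passing gl in as a parameter keeps the function free of global state, so the call order can be checked against a mock context.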

Double-Buffering

During GPU rendering, the system shows the previous frame’s polygons while the new GPU computation runs asynchronously. This prevents frame stutters when the GPU pipeline has latency.

// Scene3D.renderGPU() implementation:
// 1. Start async GPU projection
// 2. Return last frame's polygons immediately
// 3. Update polygons when GPU completes
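The three steps above can be illustrated with a small wrapper. This is only a sketch of the pattern and not the Scene3D.renderGPU() implementation: createDoubleBufferedRenderer is a hypothetical name, and projectAsync stands in for whatever function performs the asynchronous GPU projection.

```javascript
// Hypothetical wrapper illustrating the double-buffering idea.
// projectAsync(scene) is any function returning a Promise of polygons.
function createDoubleBufferedRenderer(projectAsync) {
  let front = [];      // polygons from the last completed projection
  let pending = null;  // in-flight projection, if any

  return {
    render(scene) {
      if (pending === null) {
        // 1. Start async GPU projection (at most one in flight).
        pending = projectAsync(scene)
          // 3. Swap in the new polygons when the GPU completes.
          .then((polygons) => { front = polygons; })
          // On failure, keep showing the previous frame.
          .catch(() => {})
          .finally(() => { pending = null; });
      }
      // 2. Return the last frame's polygons immediately.
      return front;
    },
  };
}
```

In this sketch the very first call returns an empty array, because no projection has completed yet. Calls made while a projection is still in flight reuse the previous result instead of queuing more GPU work.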

Resource Management

GPU resources must be cleaned up to prevent memory leaks:

// Dispose GPU resources
ProjectionGPU.dispose(projector);

The PinePaper bridge handles cleanup automatically in ThreeD.dispose().
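For standalone use, one way to make sure disposal always happens is to wrap the projector's lifetime in try/finally. The helper below is a hypothetical sketch: withProjector is not part of the library, the ProjectionGPU module is passed in as a parameter so that only the documented create, initialize, and dispose calls are used, and it assumes dispose tolerates a projector whose initialization failed.

```javascript
// Hypothetical helper: run a callback with an initialized projector and
// always dispose it afterwards.
// Assumption: dispose() is safe to call even if initialize() threw.
async function withProjector(ProjectionGPU, capabilityDetector, callback, options = {}) {
  const projector = ProjectionGPU.create();
  try {
    await ProjectionGPU.initialize(projector, capabilityDetector, options);
    return await callback(projector);
  } finally {
    // Runs on success and on error, so GPU resources are not leaked.
    ProjectionGPU.dispose(projector);
  }
}
```

A caller would pass a callback that sets scene.gpuProjector and renders. Any exception thrown inside the callback still reaches the caller, but only after the projector has been disposed.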

Floating-Point Precision Across Devices

3D projection involves matrix multiplication, trigonometric functions, and perspective division — all floating-point operations. Results can vary subtly between devices.

What is deterministic

JavaScript arithmetic (+, -, *, /) on 64-bit IEEE 754 doubles produces identical results on every device, browser, and OS. Basic vertex positions, matrix multiplies on the CPU path, and color calculations are bit-exact everywhere.

What can vary

| Source of Variance | Precision | Cause | Visual Impact |
|--------------------|-----------|-------|---------------|
| Transcendental functions (Math.sin, Math.cos, Math.atan2) | Last 1-2 bits of float64 | Neither IEEE 754 nor ECMAScript requires correctly rounded results for these; V8, SpiderMonkey, and JavaScriptCore may differ | Sub-pixel (imperceptible) |
| GPU float32 arithmetic | ~7 decimal digits vs CPU's ~15 | WebGPU/WebGL2 shaders use 32-bit floats, which carry less precision than CPU doubles | Sub-pixel (< 0.01px) |
| GPU vendor differences | Last bits of float32 | NVIDIA, AMD, Apple Silicon, and Intel GPUs may round differently for the same shader operations | Sub-pixel (imperceptible) |
| FMA (Fused Multiply-Add) | Last bits | Shader compilers may fuse a * b + c into one operation with a single rounding on some hardware and use two separate roundings on others; JavaScript engines do not fuse, so the CPU path is unaffected | Sub-pixel (imperceptible) |
| Depth sort edge cases | Binary (face order) | Two co-planar faces with nearly identical depth values may sort differently across devices | Possible z-fighting flicker on co-planar faces |

Practical implications

  • Simple scenes (< 150 faces): Always use the CPU path — deterministic across devices for basic arithmetic, sub-pixel variance only from trig functions in camera/lighting
  • Complex scenes on GPU: Sub-pixel projection differences (< 0.01px) between GPU vendors — visually identical
  • Depth sorting: The only potentially visible cross-device difference is z-fighting on co-planar faces, where two faces at nearly the same depth may render in different order on different devices. Avoid placing faces at identical Z positions to prevent this.
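The size of the float32 effect can be estimated on the CPU by emulating 32-bit arithmetic with Math.fround. The sketch below compares one screen coordinate computed in float64 and in emulated float32. The function names and the sample values are illustrative assumptions, and real GPUs may round slightly differently from this emulation.

```javascript
// Screen x in full double precision, as the CPU path would compute it.
function toScreenX64(ndcX, halfWidth) {
  return halfWidth + ndcX * halfWidth;
}

// Emulate float32 shader arithmetic with Math.fround, which rounds a
// double to the nearest 32-bit float.
function toScreenX32(ndcX, halfWidth) {
  const f = Math.fround;
  // Round after every operation, as a float32 ALU would.
  return f(f(halfWidth) + f(f(ndcX) * f(halfWidth)));
}

const ndcX = 1 / 3;      // not exactly representable in binary
const halfWidth = 400;   // half of an 800px-wide viewport
const difference = Math.abs(toScreenX64(ndcX, halfWidth) - toScreenX32(ndcX, halfWidth));
console.log(difference); // small but nonzero, well below 0.01px
```

The two results differ, but by far less than a hundredth of a pixel. That is why the variance in the projected coordinates is not visible and only ordering decisions such as depth sorting can expose it.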

What we do NOT guarantee

Pixel-perfect cross-device reproducibility for GPU-rendered scenes. The CPU path is near-deterministic (barring trig edge cases), but the GPU path trades strict reproducibility for performance. For reproducible output (e.g., regression testing), force CPU rendering by not initializing the GPU projector.

Integration with RenderLayerManager

When used within PinePaper, the GPU projector reuses the existing WebGPU device from RenderLayerManager rather than requesting a separate GPU adapter, reducing initialization overhead.

// PinePaper bridge initialization
await threeD.initGPU(capabilityDetector, {
  gpuDevice: renderLayerManager.gpuDevice
});