# GPU-Accelerated 3D Projection
The 3D library includes optional GPU acceleration for vertex projection, with automatic fallback to CPU when GPU is unavailable or the scene is too simple.
## Three-Tier Fallback
| Tier | Technology | When Used |
|---|---|---|
| 1 | WebGPU compute shaders | Chrome 113+, Edge 113+ |
| 2 | WebGL2 transform feedback | All modern browsers |
| 3 | CPU | Always available (default) |
The system attempts each tier in order during initialization and uses the best available.
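The ordered check can be sketched as follows (a sketch with hypothetical helper and flag names; the real detection lives inside `ProjectionGPU.initialize()`):

```js
// Pick the best available tier in priority order.
// `caps` is assumed to expose boolean feature flags.
function selectTier(caps) {
  if (caps.webgpu) return 'webgpu'; // Tier 1: compute shaders
  if (caps.webgl2) return 'webgl2'; // Tier 2: transform feedback
  return 'none';                    // Tier 3: CPU fallback
}
```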
## Face Threshold

```js
GPU_FACE_THRESHOLD = 150
```

Scenes with fewer than 150 faces always use CPU projection, because GPU buffer-transfer overhead exceeds the computation savings for small scenes. At 150 faces and above, GPU projection provides significant speedups.
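The gate might look like this (a hypothetical helper for illustration; the actual check lives inside the render path):

```js
const GPU_FACE_THRESHOLD = 150;

// Decide whether a scene is worth sending to the GPU.
function shouldUseGPU(faceCount, gpuMode) {
  // Small scenes stay on the CPU: buffer upload/readback costs
  // more than the projection math saves.
  return gpuMode !== 'none' && faceCount >= GPU_FACE_THRESHOLD;
}
```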
## Usage

GPU acceleration is automatic when available. The PinePaper bridge initializes it during setup:

```js
// Check GPU mode
const stats = app.threeD.getStats();
console.log(stats.gpuMode); // 'webgpu', 'webgl2', or 'none'
```
## Standalone Usage

```js
import * as ProjectionGPU from './js/3d/gpu/ProjectionGPU.js';
import * as Scene3D from './js/3d/pipeline/Scene3D.js';

// Create GPU projector
const projector = ProjectionGPU.create();

// Initialize (detects best available tier)
await ProjectionGPU.initialize(projector, capabilityDetector, {
  gpuDevice: existingDevice // Optional: reuse existing WebGPU device
});

console.log(projector.mode); // 'webgpu', 'webgl2', or 'none'

// Use with Scene3D
const scene = Scene3D.createScene({ width: 800, height: 600 });
scene.gpuProjector = projector;

// Async render with GPU acceleration
const polygons = await Scene3D.renderGPU(scene);

// Clean up
ProjectionGPU.dispose(projector);
```
## WebGPU Compute Shader

When WebGPU is available, vertex projection runs as a compute shader:

- Workgroup size: 64 threads
- Dispatch: `ceil(vertexCount / 64)` workgroups
- Bindings:
  - Binding 0: Uniform buffer (4x4 viewProj matrix + halfSize + vertex count)
  - Binding 1: Input vertices (read-only storage)
  - Binding 2: Output projected vertices (storage)
Each thread transforms one vertex: multiply by view-projection matrix, apply perspective division, and convert to screen coordinates.
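The per-thread math can be mirrored on the CPU as a reference. This is a sketch: the matrix is assumed to be a flat column-major `Float32Array(16)`, and the half-extent parameters are assumptions standing in for the shader's `halfSize` uniform:

```js
// CPU reference for what each compute-shader thread does to one vertex.
function projectVertex(viewProj, x, y, z, halfW, halfH) {
  const m = viewProj;
  // Multiply [x, y, z, 1] by the view-projection matrix (column-major:
  // row r of column c lives at m[c * 4 + r]).
  const cx = m[0] * x + m[4] * y + m[8]  * z + m[12];
  const cy = m[1] * x + m[5] * y + m[9]  * z + m[13];
  const cw = m[3] * x + m[7] * y + m[11] * z + m[15];
  // Perspective division to NDC, then map NDC to screen coordinates
  // (screen Y grows downward, so NDC Y is flipped).
  const ndcX = cx / cw;
  const ndcY = cy / cw;
  return { x: halfW + ndcX * halfW, y: halfH - ndcY * halfH };
}
```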
## WebGL2 Transform Feedback

When WebGPU is unavailable, WebGL2 provides GPU acceleration through the vertex shader stage:

- Uses `RASTERIZER_DISCARD` to skip rasterization
- Captures transformed vertices via transform feedback
- Single `v_projected` varying output per vertex
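The capture path can be sketched with standard WebGL2 calls. This is a sketch under assumptions: `gl` is a WebGL2 context, and `program` was linked after `gl.transformFeedbackVaryings(program, ['v_projected'], gl.SEPARATE_ATTRIBS)`; the function name and parameters are hypothetical:

```js
// Run the vertex shader over every vertex with rasterization disabled,
// capturing the transformed output into `outBuffer`, then read it back.
function captureProjected(gl, program, tf, outBuffer, vertexCount, out) {
  gl.useProgram(program);
  gl.enable(gl.RASTERIZER_DISCARD);         // skip rasterization entirely
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, outBuffer);
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, vertexCount); // one shader run per vertex
  gl.endTransformFeedback();
  gl.disable(gl.RASTERIZER_DISCARD);
  // Read the captured varyings back to the CPU.
  gl.getBufferSubData(gl.TRANSFORM_FEEDBACK_BUFFER, 0, out);
  return out;
}
```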
## Double-Buffering

During GPU rendering, the system shows the previous frame's polygons while the new GPU computation runs asynchronously. This prevents frame stutters when the GPU pipeline has latency.

```js
// Scene3D.renderGPU() implementation:
// 1. Start async GPU projection
// 2. Return last frame's polygons immediately
// 3. Update polygons when GPU completes
```
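Those three steps amount to the following pattern, shown here as a minimal sketch with hypothetical names rather than the actual `Scene3D` internals:

```js
// Hand back the last completed frame immediately; swap in the
// new result whenever the async GPU projection resolves.
function createDoubleBufferedRender(projectAsync) {
  let lastPolygons = [];
  return function render(scene) {
    projectAsync(scene).then((polygons) => {
      lastPolygons = polygons; // step 3: swap once the GPU finishes
    });
    return lastPolygons;       // step 2: never block the current frame
  };
}
```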
## Resource Management

GPU resources must be cleaned up to prevent memory leaks:

```js
// Dispose GPU resources
ProjectionGPU.dispose(projector);
```

The PinePaper bridge handles cleanup automatically in `ThreeD.dispose()`.
## Floating-Point Precision Across Devices
3D projection involves matrix multiplication, trigonometric functions, and perspective division — all floating-point operations. Results can vary subtly between devices.
### What is deterministic

JavaScript arithmetic (`+`, `-`, `*`, `/`) on 64-bit IEEE 754 doubles produces identical results on every device, browser, and OS. Basic vertex positions, matrix multiplies on the CPU path, and color calculations are bit-exact everywhere.
### What can vary

| Source of Variance | Precision | Cause | Visual Impact |
|---|---|---|---|
| Transcendental functions (`Math.sin`, `Math.cos`, `Math.atan2`) | Last 1-2 bits of float64 | IEEE 754 does not mandate implementations for these; V8, SpiderMonkey, and JavaScriptCore may differ | Sub-pixel (imperceptible) |
| GPU float32 arithmetic | ~7 decimal digits vs CPU's ~15 | WebGPU/WebGL2 shaders use 32-bit floats; reduced precision vs CPU doubles | Sub-pixel (< 0.01px) |
| GPU vendor differences | Last bits of float32 | NVIDIA, AMD, Apple Silicon, and Intel GPUs may round differently for the same shader operations | Sub-pixel (imperceptible) |
| FMA (Fused Multiply-Add) | Last bits | Some CPUs (ARM/Apple Silicon) fuse `a * b + c` into one operation with single rounding; x86 may use two separate roundings | Sub-pixel (imperceptible) |
| Depth sort edge cases | Binary (face order) | Two co-planar faces with nearly identical depth values may sort differently across devices | Possible z-fighting flicker on co-planar faces |
### Practical implications
- Simple scenes (< 150 faces): Always use the CPU path — deterministic across devices for basic arithmetic, sub-pixel variance only from trig functions in camera/lighting
- Complex scenes on GPU: Sub-pixel projection differences (< 0.01px) between GPU vendors — visually identical
- Depth sorting: The only potentially visible cross-device difference is z-fighting on co-planar faces, where two faces at nearly the same depth may render in different order on different devices. Avoid placing faces at identical Z positions to prevent this.
### What we do NOT guarantee
Pixel-perfect cross-device reproducibility for GPU-rendered scenes. The CPU path is near-deterministic (barring trig edge cases), but the GPU path trades strict reproducibility for performance. For reproducible output (e.g., regression testing), force CPU rendering by not initializing the GPU projector.
## Integration with RenderLayerManager
When used within PinePaper, the GPU projector reuses the existing WebGPU device from RenderLayerManager rather than requesting a separate GPU adapter, reducing initialization overhead.
```js
// PinePaper bridge initialization
await threeD.initGPU(capabilityDetector, {
  gpuDevice: renderLayerManager.gpuDevice
});
```