dreamstack/engine/ds-stream
enzotar 3c14beea50 feat(engine): v0.14-v0.16 releases
ds-physics 0.16.0 (81 tests):
- v0.14: proximity queries, physics regions
- v0.15: event hooks, transform hierarchy, timeline
- v0.16: deterministic seed/checksum, collision manifolds, distance+hinge constraints

ds-stream 0.16.0 (143 tests):
- v0.14: FrameCompressor (RLE), MultiClientSync, BandwidthThrottle, FrameType::Compressed
- v0.15: AdaptiveBitrate, MetricsSnapshot, FramePipeline
- v0.16: FrameEncryptor (XOR), StreamMigration, FrameDedup, PriorityQueue

ds-stream-wasm 0.16.0 (54 tests):
- v0.14: RLE compress/decompress, sync drift, bandwidth limiting
- v0.15: adaptive quality, metrics snapshot, frame transforms
- v0.16: encrypt/decrypt frames, migration handoff, frame dedup

ds-screencast 0.16.0:
- v0.14: --roi, /clients, --compress, --migrate-on-crash
- v0.15: --adaptive-bitrate, /metrics, --viewport-transform, --cdn-push
- v0.16: /tabs, --encrypt-key, --watermark, --graceful-shutdown
2026-03-10 22:47:44 -07:00

ds-stream — Universal Bitstream Streaming

Any input bitstream → any output bitstream. Neural nets will generate the pixels. The receiver just renders bytes.

What It Does

┌───────────┐    WebSocket    ┌─────────┐    WebSocket    ┌───────────┐
│  Source   │  ───frames────► │  Relay  │  ───frames────► │ Receiver  │
│ (renders) │  ◄───inputs──── │ (:9100) │  ◄───inputs──── │ (~300 LOC)│
└───────────┘                 └─────────┘                 └───────────┘

The source runs a DreamStack app (signal graph + springs + renderer), captures output as bytes, and streams it. The receiver is a thin client that renders whatever bytes arrive — no framework, no runtime.

Binary Protocol

Every message = 16-byte header + payload.

┌──────┬───────┬──────┬───────────┬───────┬────────┬────────────┐
│ type │ flags │ seq  │ timestamp │ width │ height │ payload_len│
│  u8  │  u8   │ u16  │    u32    │  u16  │  u16   │    u32     │
└──────┴───────┴──────┴───────────┴───────┴────────┴────────────┘
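
A hedged sketch of this header as a plain Rust struct (field names follow the diagram; big-endian byte order is an assumption, and the real protocol.rs layout may differ):

```rust
// Sketch of the 16-byte message header from the diagram above.
// Big-endian field encoding is an assumption, not confirmed by the docs.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub frame_type: u8,
    pub flags: u8,
    pub seq: u16,
    pub timestamp: u32,
    pub width: u16,
    pub height: u16,
    pub payload_len: u32,
}

impl Header {
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut b = [0u8; 16];
        b[0] = self.frame_type;
        b[1] = self.flags;
        b[2..4].copy_from_slice(&self.seq.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp.to_be_bytes());
        b[8..10].copy_from_slice(&self.width.to_be_bytes());
        b[10..12].copy_from_slice(&self.height.to_be_bytes());
        b[12..16].copy_from_slice(&self.payload_len.to_be_bytes());
        b
    }

    pub fn from_bytes(b: &[u8; 16]) -> Self {
        Header {
            frame_type: b[0],
            flags: b[1],
            seq: u16::from_be_bytes([b[2], b[3]]),
            timestamp: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
            width: u16::from_be_bytes([b[8], b[9]]),
            height: u16::from_be_bytes([b[10], b[11]]),
            payload_len: u32::from_be_bytes([b[12], b[13], b[14], b[15]]),
        }
    }
}

fn main() {
    // A raw-pixels frame header for a 640x480 RGBA framebuffer.
    let h = Header {
        frame_type: 0x01,
        flags: 0,
        seq: 7,
        timestamp: 1000,
        width: 640,
        height: 480,
        payload_len: 640 * 480 * 4,
    };
    assert_eq!(Header::from_bytes(&h.to_bytes()), h);
    println!("header roundtrip ok: {:?}", h);
}
```

This is the shape the crate's own "header roundtrip" test presumably exercises.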

Frame Types (source → receiver)

Code  Type              Description
0x01  Pixels            Raw RGBA framebuffer
0x02  CompressedPixels  PNG/WebP (future)
0x03  DeltaPixels       XOR delta + RLE
0x10  AudioPcm          Float32 PCM samples
0x11  AudioCompressed   Opus (future)
0x20  Haptic            Vibration command
0x30  SignalSync        Full signal state JSON
0x31  SignalDiff        Changed signals only
0x40  NeuralFrame       Neural-generated pixels
0x41  NeuralAudio       Neural speech/music
0x42  NeuralActuator    Learned motor control
0x43  NeuralLatent      Latent space scene
0xFE  Ping              Keep-alive
0xFF  End               Stream termination
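
The DeltaPixels (0x03) idea can be sketched in a few lines: XOR the new frame against the previous one, then run-length encode the mostly-zero result. This is a hypothetical illustration with a simple (count, value) pair encoding; the real codec.rs may pack runs differently:

```rust
// Hypothetical sketch of the DeltaPixels idea: XOR against the previous
// frame, then RLE the delta. Unchanged regions XOR to zero and compress well.
fn xor_delta(prev: &[u8], next: &[u8]) -> Vec<u8> {
    prev.iter().zip(next).map(|(a, b)| a ^ b).collect()
}

// RLE as (count, value) byte pairs; runs are capped at 255.
fn rle_encode(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let value = data[i];
        let mut run = 1u8;
        while i + (run as usize) < data.len()
            && data[i + run as usize] == value
            && run < u8::MAX
        {
            run += 1;
        }
        out.push(run);
        out.push(value);
        i += run as usize;
    }
    out
}

fn rle_decode(data: &[u8]) -> Vec<u8> {
    data.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

fn main() {
    let prev = vec![10u8; 64];
    let mut next = prev.clone();
    next[5] = 99; // one changed byte in an otherwise static frame
    let delta = xor_delta(&prev, &next);
    let packed = rle_encode(&delta);
    assert!(packed.len() < delta.len());
    // Receiver side: decode the RLE, then XOR back onto the previous frame.
    let restored: Vec<u8> = rle_decode(&packed)
        .iter()
        .zip(&prev)
        .map(|(d, p)| d ^ p)
        .collect();
    assert_eq!(restored, next);
    println!("delta+RLE roundtrip ok ({} -> {} bytes)", delta.len(), packed.len());
}
```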

Input Types (receiver → source)

Code  Type          Payload
0x01  Pointer       x(u16) y(u16) buttons(u8)
0x02  PointerDown   same
0x03  PointerUp     same
0x10  KeyDown       keycode(u16) modifiers(u8)
0x11  KeyUp         same
0x20  Touch         id(u8) x(u16) y(u16)
0x30  GamepadAxis   axis(u8) value(f32)
0x40  Midi          status(u8) d1(u8) d2(u8)
0x50  Scroll        dx(i16) dy(i16)
0x60  Resize        width(u16) height(u16)
0x90  BciInput      (future)
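
A hedged sketch of packing one of these payloads, using Pointer (0x01) as the example: x(u16), y(u16), buttons(u8), five bytes total. Byte order is an assumption here:

```rust
// Hypothetical encoder/decoder for a Pointer input payload:
// x(u16) y(u16) buttons(u8) = 5 bytes. Big-endian is an assumption.
fn encode_pointer(x: u16, y: u16, buttons: u8) -> [u8; 5] {
    let mut p = [0u8; 5];
    p[0..2].copy_from_slice(&x.to_be_bytes());
    p[2..4].copy_from_slice(&y.to_be_bytes());
    p[4] = buttons;
    p
}

fn decode_pointer(p: &[u8; 5]) -> (u16, u16, u8) {
    (
        u16::from_be_bytes([p[0], p[1]]),
        u16::from_be_bytes([p[2], p[3]]),
        p[4],
    )
}

fn main() {
    let payload = encode_pointer(320, 240, 0b0000_0001); // left button held
    assert_eq!(decode_pointer(&payload), (320, 240, 1));
    println!("pointer payload: {:?}", payload);
}
```

On the wire this payload would sit behind a 16-byte header with FLAG_INPUT set, since inputs travel receiver→source.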

Flags

Bit  Meaning
0    FLAG_INPUT — message is input (receiver→source)
1    FLAG_KEYFRAME — full state, no delta
2    FLAG_COMPRESSED — payload is compressed
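
These are ordinary bit masks; checking them is a one-liner (constant names match the table, the helper is illustrative):

```rust
// The three flag bits from the table above, as plain bit masks.
const FLAG_INPUT: u8 = 1 << 0;      // bit 0: receiver -> source
const FLAG_KEYFRAME: u8 = 1 << 1;   // bit 1: full state, no delta
const FLAG_COMPRESSED: u8 = 1 << 2; // bit 2: payload is compressed

fn is_keyframe(flags: u8) -> bool {
    flags & FLAG_KEYFRAME != 0
}

fn main() {
    let flags = FLAG_KEYFRAME | FLAG_COMPRESSED;
    assert!(is_keyframe(flags));
    assert!(flags & FLAG_INPUT == 0); // a source -> receiver message
    println!("flags = {:#05b}", flags);
}
```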

Streaming Modes

Mode    Frame Size   Bandwidth @30fps   Use Case
Pixel   938 KB       ~28 MB/s           Full fidelity, any renderer
Delta   50-300 KB    ~1-9 MB/s          Low-motion scenes
Signal  ~80 B        ~2 KB/s            DreamStack-native, receiver renders
Neural  938 KB       ~28 MB/s           Model-generated output
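
The bandwidth column is just frame size times frame rate; a quick back-of-envelope check (the helper is illustrative, not part of the crate):

```rust
// Sanity-check the bandwidth column: bytes per frame * frames per second.
fn bandwidth_mb_per_s(frame_bytes: f64, fps: f64) -> f64 {
    frame_bytes * fps / (1024.0 * 1024.0)
}

fn main() {
    // Pixel mode: 938 KB/frame at 30 fps.
    let pixel = bandwidth_mb_per_s(938.0 * 1024.0, 30.0);
    assert!((pixel - 27.5).abs() < 1.0); // matches the ~28 MB/s in the table
    // Signal mode: ~80 B diffs at 30 fps.
    let signal_kb = 80.0 * 30.0 / 1024.0; // ~2.3 KB/s, matches ~2 KB/s
    println!("Pixel: {:.1} MB/s, Signal: {:.1} KB/s", pixel, signal_kb);
}
```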

Quick Start

# Start the relay server
cargo run -p ds-stream

# Serve the examples
python3 -m http.server 8080 --directory examples

# Open in browser
# Tab 1: http://localhost:8080/stream-source.html
# Tab 2: http://localhost:8080/stream-receiver.html

Click mode buttons on the source to switch between Pixel / Delta / Signal / Neural. Toggle audio. Open multiple receiver tabs.

Crate Structure

engine/ds-stream/
├── Cargo.toml
└── src/
    ├── lib.rs          # crate root
    ├── protocol.rs     # types, header, events (412 lines)
    ├── codec.rs        # encode/decode, delta, builders (247 lines)
    ├── relay.rs        # WebSocket relay server (255 lines)
    └── main.rs         # CLI entry point

Tests

cargo test -p ds-stream    # 17 tests

Covers: header roundtrip, event roundtrip, delta compression, frame type enums, flag checks, partial buffer handling, message size calculation.


Next Steps

Near-term

  1. WebRTC transport — Replace WebSocket with WebRTC DataChannel for sub-10ms latency and NAT traversal. The protocol is transport-agnostic; only the relay changes.

  2. Opus audio compression — Replace raw PCM (AudioPcm) with Opus encoding (AudioCompressed). 28 KB/s → ~6 KB/s for voice, near-transparent quality.

  3. Adaptive quality — Source monitors receiver lag (via ACK frames) and auto-downgrades: full pixels → delta → signal diff. Graceful degradation on slow networks.

  4. WASM codec — Compile protocol.rs + codec.rs to WASM so source and receiver share the exact same binary codec. No JS reimplementation drift.
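
The adaptive-quality idea in step 3 amounts to a downgrade ladder driven by measured receiver lag. A hypothetical sketch (the type, function, and thresholds are all invented for illustration):

```rust
// Hypothetical downgrade ladder for adaptive quality: as receiver lag
// (reported via ACK frames) grows, step down from pixels to deltas to
// signal diffs. Thresholds here are made up for illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Quality {
    Pixels,     // full RGBA frames, ~28 MB/s
    Delta,      // XOR delta + RLE, ~1-9 MB/s
    SignalDiff, // changed signals only, ~2 KB/s
}

fn choose_quality(lag_ms: u32) -> Quality {
    match lag_ms {
        0..=50 => Quality::Pixels,   // receiver keeping up
        51..=200 => Quality::Delta,  // mild lag
        _ => Quality::SignalDiff,    // heavy lag: graceful degradation
    }
}

fn main() {
    assert_eq!(choose_quality(10), Quality::Pixels);
    assert_eq!(choose_quality(120), Quality::Delta);
    assert_eq!(choose_quality(900), Quality::SignalDiff);
    println!("downgrade ladder ok");
}
```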

Medium-term

  1. Persistent signal state — Source sends SignalSync keyframes periodically. New receivers joining mid-stream get full state immediately, then switch to diffs.

  2. Touch/gamepad input — Wire up Touch, GamepadAxis, GamepadButton input types on receiver. Enable mobile and controller interaction.

  3. Frame compression — PNG/WebP encoding for pixel frames (CompressedPixels). Canvas toBlob('image/webp', 0.8) gives ~50-100 KB/frame vs 938 KB raw.

  4. Haptic output — Receiver calls navigator.vibrate() on Haptic frames. Spring impact → buzz.

Long-term (neural path)

  1. Train a pixel model — Collect (signal_state, screenshot) pairs from the springs demo. Train a small CNN/NeRF to predict framebuffer from signal state. Replace neuralRender() with real inference.

  2. Latent space streaming — Instead of pixels, stream a compressed latent representation (NeuralLatent). Receiver runs a lightweight decoder model. ~1 KB/frame for HD content.

  3. Voice input — Receiver captures microphone audio, streams as VoiceInput. Source interprets via speech-to-intent model to drive signals.

  4. Multi-source composition — Multiple sources stream to the same relay. Receiver composites layers. Each source owns a region or z-layer of the output.