Wideband autonomous SDR analysis engine forked from sdr-visual-suite
Nevar pievienot vairāk kā 25 tēmas Tēmai ir jāsākas ar burtu vai ciparu, tā var saturēt domu zīmes ('-') un var būt līdz 35 simboliem gara.

2.2KB

GPU Streaming Refactor Plan (2026-03-25)

Goal

Replace the current overlap+trim GPU extractor model with a true stateful per-signal streaming architecture, and build a corrected CPU oracle/reference path for validation.

Non-negotiables

  • No production start-index-only patch.
  • No production overlap-prepend + trim continuity model.
  • Exact integer decimation only in the new streaming production path.
  • Persistent per-signal state must include NCO phase, FIR history, and decimator phase/residue.
  • GPU validation must compare against a corrected CPU oracle, not the legacy CPU fallback.

Work order

  1. Introduce explicit stateful streaming types in gpudemod.
  2. Add a clean CPU oracle implementation and monolithic-vs-chunked tests.
  3. Add per-signal state ownership in batch runner.
  4. Implement new streaming extractor semantics in Go using NEW IQ samples only.
  5. Replace legacy GPU-path assumptions (rounding decimation, overlap-prepend, trim-defined validity) in the new path.
  6. Add production telemetry that proves state continuity (phase_count, history_len, n_out, reference error).
  7. Keep legacy path isolated only for temporary comparison if needed.

Initial files in scope

  • internal/demod/gpudemod/batch.go
  • internal/demod/gpudemod/batch_runner.go
  • internal/demod/gpudemod/batch_runner_windows.go
  • internal/demod/gpudemod/kernels.cu
  • internal/demod/gpudemod/native/exports.cu
  • cmd/sdrd/helpers.go

Immediate implementation strategy

Phase 1

  • Create explicit streaming state structs in Go.
  • Add CPU oracle/reference path with exact semantics and tests.
  • Introduce exact integer-decimation checks.

Phase 2

  • Rework batch runner to own persistent per-signal state.
  • Add config-hash-based resets.
  • Stop modeling continuity via overlap tail in the new path.

Phase 3

  • Introduce a real streaming GPU entry path that consumes NEW shifted samples plus carried state.
  • Move to a stateful polyphase decimator model.

Validation expectations

  • CPU oracle monolithic == CPU oracle chunked within tolerance.
  • GPU streaming output == CPU oracle chunked within tolerance.
  • Former periodic block-boundary clicks gone in real-world testing.