diff --git a/cmd/sdrd/pipeline_runtime.go b/cmd/sdrd/pipeline_runtime.go index 8b182e2..5b37088 100644 --- a/cmd/sdrd/pipeline_runtime.go +++ b/cmd/sdrd/pipeline_runtime.go @@ -508,8 +508,16 @@ func (rt *dspRuntime) captureSpectrum(srcMgr *sourceManager, rec *recorder.Manag } } if rt.iqEnabled { + // IQBalance must NOT modify allIQ in-place: allIQ goes to the extraction + // pipeline and any in-place modification creates a phase/amplitude + // discontinuity at the survIQ boundary (len-FFTSize) that the polyphase + // extractor then sees as paired click artifacts in the FM discriminator. + detailIsSurv := sameIQBuffer(detailIQ, survIQ) + survIQ = append([]complex64(nil), survIQ...) dsp.IQBalance(survIQ) - if !sameIQBuffer(detailIQ, survIQ) { + if detailIsSurv { + detailIQ = survIQ + } else { detailIQ = append([]complex64(nil), detailIQ...) dsp.IQBalance(detailIQ) } diff --git a/docs/audio-click-debug-notes-2026-03-24.md b/docs/audio-click-debug-notes-2026-03-24.md index f4ab63f..44a12ba 100644 --- a/docs/audio-click-debug-notes-2026-03-24.md +++ b/docs/audio-click-debug-notes-2026-03-24.md @@ -10,9 +10,14 @@ Goal: preserve the reasoning, experiments, false leads, and current best underst ## High-level outcome so far -We do **not** yet have the final root cause. +**SOLVED** — the persistent audio clicking issue is now resolved. -But we now know substantially more about what the clicks are **not**, and we identified at least one real bug plus several strong behavioral constraints in the pipeline. +Final result: +- live listening test confirmed the clicks are gone +- the final fix set consists of three independent root-cause fixes plus two secondary fixes +- the CUDA DLL did **not** need a rebuild for the final fix + +This document now serves as the investigation log plus final resolution record. --- @@ -1001,13 +1006,72 @@ This was created specifically so the same reviewer payload can be consumed by to --- -## Meta note +## Final resolution — 2026-03-25 + +Status: **SOLVED** + +The final fix set that resolved the audible clicks consisted of **three root-cause fixes** and **two secondary fixes**: + +### Root causes fixed + +1. **IQBalance in-place corruption of shared `allIQ` tail** + - File: `cmd/sdrd/pipeline_runtime.go` + - The surveillance slice (`survIQ`) was an alias of the tail of `allIQ`. + - `dsp.IQBalance(survIQ)` therefore modified the shared `allIQ` buffer in-place. + - The same `allIQ` buffer was then passed into the streaming extractor, creating a discontinuity where the IQ-balanced tail met unbalanced samples. + - Fix: copy `survIQ` before applying IQBalance so extraction sees an unmodified `allIQ` buffer. + +2. **`StreamingConfigHash` forced full extractor state reset every frame** + - File: `internal/demod/gpudemod/streaming_types.go` + - Floating-point jitter in smoothed center frequency caused `offsetHz` / `bandwidth` hash churn. + - That reset extractor history, NCO phase, and decimation phase every frame. + - Fix: hash only structural parameters (`signalID`, `outRate`, `numTaps`, `sampleRate`). + +3. **Non-WFM exact-decimation failure killed the entire streaming batch** + - File: `cmd/sdrd/streaming_refactor.go` + - Hardcoded `200000` output rate was not an exact divisor of `4096000`, so one non-WFM signal could reject the whole batch and silently force fallback to legacy extraction. + - Fix: use nearest exact integer-divisor output rate and keep fallthrough logging visible. + +### Secondary issues fixed + +1. **FM discriminator block-boundary gap** + - File: `internal/recorder/streamer.go` + - The cross-boundary phase step between consecutive IQ blocks was missing. + - Fix: carry the last IQ sample into the next discriminator block. + +2. **Missing 15 kHz lowpass on WFM mono/plain paths** + - File: `internal/recorder/streamer.go` + - Mono fallback / plain WFM paths sent raw discriminator output (pilot/subcarrier/RDS energy) directly into the resampler. + - Fix: add a stateful 15 kHz LPF before resampling on those paths. + +### Final verification summary -This investigation already disproved several plausible explanations. That is progress. +- Before major fixes: + - persistent loud clicking on all signals/modes + - `intra_click_rate` about `110/sec` + - extractor/audio boundary telemetry showed large discontinuities +- After config-hash fix: + - hard clicks disappeared + - large discontinuities dropped sharply + - fine click noise still remained +- After the final `IQBalance` aliasing fix: + - operator listening test confirmed clicks were eliminated + +### Files involved in the final fix set + +- `cmd/sdrd/helpers.go` +- `cmd/sdrd/streaming_refactor.go` +- `cmd/sdrd/pipeline_runtime.go` +- `internal/demod/gpudemod/streaming_types.go` +- `internal/demod/gpudemod/stream_state.go` +- `internal/recorder/streamer.go` + +### Important architectural note + +The CUDA streaming polyphase kernel itself was **not** the root cause. +The actual bugs were in the Go-side orchestration around path selection, extractor reset semantics, and mutation of the shared IQ buffer before extraction. + +## Meta note -The most important thing not to forget is: -- the overlap prepend bug was real, but not sufficient -- the click is already present in demod audio -- whole-process CPU saturation is not the main explanation -- excessive debug instrumentation can itself create misleading secondary problems -- the 2026-03-25 extractor telemetry strongly suggests the remaining root cause is upstream of the final trim stage +This investigation disproved several plausible explanations before landing the final answer. +That mattered, because the eventual root cause was not a single simple DSP bug but a combination of path fallthrough, state-reset churn, and shared-buffer mutation. diff --git a/docs/telemetry-debug-runbook.md b/docs/telemetry-debug-runbook.md index 4363b5e..4b14c87 100644 --- a/docs/telemetry-debug-runbook.md +++ b/docs/telemetry-debug-runbook.md @@ -53,3 +53,48 @@ Persisted JSONL files rotate in `persist_dir` (default: `debug/telemetry`). - `GET /api/debug/telemetry/history?since=&prefix=stage.` - `GET /api/debug/telemetry/history?since=&prefix=streamer.` 6. If IQ boundary issues persist, temporarily set `heavy_enabled=true` (keep sampling coarse with `heavy_sample_every` > 1), rerun, then inspect `iq.*` metrics and `audio.*` anomalies by `signal_id`/`session_id`. + +## 2026-03-25 audio click incident — final resolved summary + +Status: **SOLVED** + +The March 2026 live-audio click investigation ultimately converged on a combination of three real root causes plus two secondary fixes: + +### Root causes + +1. **Shared `allIQ` corruption by `IQBalance` aliasing** + - `cmd/sdrd/pipeline_runtime.go` + - `survIQ` aliased the tail of `allIQ` + - `dsp.IQBalance(survIQ)` modified `allIQ` in-place + - extractor then saw a corrupted boundary inside the shared buffer + - final fix: copy `survIQ` before `IQBalance` + +2. **Per-frame extractor reset due to `StreamingConfigHash` jitter** + - `internal/demod/gpudemod/streaming_types.go` + - smoothed tuning values changed slightly every frame + - offset/bandwidth in the hash caused repeated state resets + - final fix: hash only structural parameters + +3. **Streaming path batch rejection for non-WFM exact-decimation mismatch** + - `cmd/sdrd/streaming_refactor.go` + - one non-WFM signal could reject the whole batch and silently force fallback to the legacy path + - final fix: choose nearest exact integer-divisor output rate and keep fallback logging visible + +### Secondary fixes + +- FM discriminator cross-block carry in `internal/recorder/streamer.go` +- WFM mono/plain-path 15 kHz audio lowpass in `internal/recorder/streamer.go` + +### Verification notes + +- major discontinuities dropped sharply after the config-hash fix +- remaining fine clicks were eliminated only after the `IQBalance` aliasing fix in `pipeline_runtime.go` +- final confirmation was by operator listening test, backed by prior telemetry and WAV analysis + +### Practical lesson + +When the same captured `allIQ` buffer feeds both: +- surveillance/detail analysis +- and extraction/streaming + +then surveillance-side DSP helpers must not mutate a shared sub-slice in-place unless that mutation is intentionally part of the extraction contract.