Root cause: CUDA FreqShift kernel casts accumulated phase to float32
via sincosf((float)phase). After ~385k samples the phase reaches
~4M radians where float32 loses 0.03-0.1 rad precision, producing
sin/cos errors up to 0.1 at frame boundaries → audible clicks.
Fix: reduce phase to [-π,π) before float cast using
phase = phase - rint(phase/2π)*2π (fast GPU intrinsic, no perf impact).
Same fix applied to SSB product kernel.
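The precision loss and the reduction step can be reproduced on the CPU.
The following is a minimal Go sketch, not the kernel code: reducePhase
and sinError are hypothetical helper names, the phase value is an
illustrative assumption, and math.RoundToEven stands in for the CUDA
rint call.

```go
package main

import (
	"fmt"
	"math"
)

// reducePhase wraps a phase in radians into roughly [-π, π] using the
// round-to-nearest form quoted above: phase - rint(phase/2π)*2π.
func reducePhase(phase float64) float64 {
	const twoPi = 2 * math.Pi
	return phase - math.RoundToEven(phase/twoPi)*twoPi
}

// sinError returns the error introduced by rounding the phase to
// float32 before taking the sine, measured against a float64 reference.
func sinError(phase float64) float64 {
	viaFloat32 := math.Sin(float64(float32(phase)))
	return math.Abs(viaFloat32 - math.Sin(phase))
}

func main() {
	phase := 4.0e6 + 0.1 // hypothetical accumulated phase, in radians

	// Near 4M radians adjacent float32 values are 0.25 rad apart,
	// so the cast alone can move the phase by a visible amount.
	fmt.Printf("error, direct cast:    %.3g\n", sinError(phase))

	// After reduction the value is small and float32 resolves it finely.
	fmt.Printf("error, reduced first:  %.3g\n", sinError(reducePhase(phase)))

	// math.Remainder performs the same wrap on the Go side.
	fmt.Printf("math.Remainder result: %.6f\n", math.Remainder(phase, 2*math.Pi))
}
```

The reduction has to happen while the phase is still float64; reducing
after the cast would only wrap the already rounded value.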
Additional fixes in this commit:
- Go-side phase normalization (math.Remainder) after each frame update
in both GPU and CPU extraction paths. Prevents float64 drift over
hours of continuous operation.
- Overlap trim: floor→ceil for non-divisible decimation factors.
512/20=25 (floor) trimmed only 500 of 512 overlap samples → 12 leaked.
Now (512+19)/20=26 trims 520, cleanly removing all overlap.
Affects NFM (decim=20); WFM (decim=8, 512%8=0) was already clean.
- Click detector rewritten: second-derivative transient detector
replaces first-derivative delta scanner. Old detector flagged
hundreds of false positives per frame on normal FM audio.
New detector computes |2b-a-c| (discrete acceleration) over
consecutive samples a, b, c, which is near-zero for smooth signals
and large only for true impulse transients. Threshold 0.15.
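The trim arithmetic and the second-difference test can be sketched in
Go as below. This is an illustration under assumptions, not the code in
helpers.go or streamer.go: trimFloor, trimCeil and countTransients are
hypothetical names, and the sketch scans a single slice without the
cross-frame state that the prevAudioL field presumably carries.

```go
package main

import "fmt"

// trimFloor and trimCeil return how many decimated output samples to
// drop for a given input-rate overlap length and decimation factor.
func trimFloor(overlap, decim int) int { return overlap / decim }
func trimCeil(overlap, decim int) int  { return (overlap + decim - 1) / decim }

// countTransients counts interior samples whose discrete second
// difference |2b - a - c| exceeds threshold.
func countTransients(audio []float32, threshold float32) int {
	count := 0
	for i := 1; i+1 < len(audio); i++ {
		d := 2*audio[i] - audio[i-1] - audio[i+1]
		if d < 0 {
			d = -d
		}
		if d > threshold {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(trimFloor(512, 20), trimCeil(512, 20)) // 25 26 (NFM, decim=20)
	fmt.Println(trimFloor(512, 8), trimCeil(512, 8))   // 64 64 (WFM, decim=8)

	ramp := []float32{0.0, 0.1, 0.2, 0.3, 0.4}  // smooth: second difference ~0
	click := []float32{0.0, 0.1, 0.9, 0.3, 0.4} // one-sample impulse at index 2
	fmt.Println(countTransients(ramp, 0.15), countTransients(click, 0.15)) // 0 3
}
```

A single impulse trips the test at the impulse and at both neighbours,
so a caller that wants one event per click would need to merge adjacent
hits.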
Files changed:
native/exports.cu - phase reduction in freq_shift + ssb_product kernels
kernels.cu - same (Linux CGO build)
cmd/sdrd/helpers.go - phase normalize + ceil trim (GPU + CPU paths)
recorder/streamer.go - transient detector + prevAudioL field
Requires DLL rebuild: .\build-gpudemod-dll.ps1
Three root causes identified for 4-5 clicks/sec in live audio streaming:
1. Buffer bloat in captureSpectrum (PRIMARY)
allIQ reads drained the entire SDR buffer (196k-1M+ samples), causing
extraction times to vary wildly. Large frames took >150ms to process,
starving the next frame and creating a positive feedback loop.
feed_gap warnings (152-218ms) directly correlated with audible clicks.
Fix: cap allIQ to 2 frame intervals (~682k samples) after reading.
Full buffer is still drained and ingested into the ring buffer;
only the extraction/streaming path is capped.
2. Stateless decimation in processSnippet
dsp.Decimate() restarted at index 0 every frame. When snippet length
was not divisible by the decimation factor (e.g. 512kHz/3≈170.7kHz,
snippet % 3 != 0), a sample timing discontinuity occurred at each
frame boundary.
Fix: new dsp.DecimateStateful() preserves the decimation phase index
across calls. Session field preDemodDecimPhase added to streamSession
with proper snapshot/restore for segment splits.
3. Resampler bandwidth too narrow (10.8kHz instead of 15kHz)
Polyphase resampler prototype cutoff fc=0.45/max(L,M) limited audio
to 10.8kHz, cutting off FM stereo content above that.
Fix: increase fc to 0.90/max(L,M), passing up to 22.8kHz.
Kaiser window (β=6) maintains -60dB sidelobe suppression.
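The idea behind fix 2 can be sketched in Go as follows. This is a
minimal illustration under assumptions: decimateStateful and its
signature are hypothetical and may differ from the real
dsp.DecimateStateful, it works on plain ints rather than IQ samples,
and it only selects samples without any filtering.

```go
package main

import "fmt"

// decimateStateful keeps every factor-th sample of in, starting at
// index phase, and returns the kept samples together with the phase to
// pass to the next call. Hypothetical signature for illustration only.
func decimateStateful(in []int, factor, phase int) ([]int, int) {
	out := make([]int, 0, len(in)/factor+1)
	i := phase
	for ; i < len(in); i += factor {
		out = append(out, in[i])
	}
	// i now points past the end of this frame; the overshoot is the
	// number of samples to skip at the start of the next frame.
	return out, i - len(in)
}

func main() {
	// Ten consecutive samples split into frames of 4, 4 and 2 with a
	// decimation factor of 3, so no frame length is divisible by 3.
	frames := [][]int{{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9}}

	phase := 0
	var kept []int
	for _, frame := range frames {
		var out []int
		out, phase = decimateStateful(frame, 3, phase)
		kept = append(kept, out...)
	}
	fmt.Println(kept, phase) // [0 3 6 9] 2
}
```

Restarting at index 0 for every frame would instead keep samples
0, 3, 4, 7 and 8, which is the uneven spacing at frame boundaries
described in item 2.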
Files changed:
cmd/sdrd/pipeline_runtime.go - allIQ cap after buffer read
internal/dsp/fir.go - DecimateStateful()
internal/dsp/decimate_test.go - 5 tests for stateful decimation
internal/dsp/resample.go - fc 0.45 → 0.90
internal/recorder/streamer.go - preDemodDecimPhase field + usage