Root cause: the CUDA FreqShift kernel casts the accumulated phase to
float32 via sincosf((float)phase). After ~385k samples the phase reaches
~4M radians, where float32 loses 0.03-0.1 rad of precision, producing
sin/cos errors of up to 0.1 at frame boundaries → audible clicks.
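To put a number on the cast alone, here is a small Go sketch. The helper names are illustrative and not from the repo. Between 2^21 and 2^22 (about 2.1M to 4.2M) adjacent float32 values are 0.25 apart. A phase near 4M rad can therefore be off by up to 0.125 rad before sincosf even runs.

```go
package main

import (
	"fmt"
	"math"
)

// castError is the rounding error introduced by converting a float64 phase
// to float32, i.e. what a sincosf((float)phase) call receives.
func castError(phase float64) float64 {
	return math.Abs(float64(float32(phase)) - phase)
}

// float32Spacing is the distance from float32(phase) to the next larger
// float32 value (one ULP at that magnitude).
func float32Spacing(phase float64) float64 {
	f := float32(phase)
	return float64(math.Nextafter32(f, float32(math.Inf(1))) - f)
}

func main() {
	const frac = 0.1234567 // arbitrary fractional part added to each phase
	for _, p := range []float64{1e3, 4e5, 4e6} {
		fmt.Printf("phase=%-8.0f ulp=%-10.3g castErr=%.3g rad\n",
			p, float32Spacing(p), castError(p+frac))
	}
}
```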
Fix: reduce the phase to [-π, π] before the float cast using
phase = phase - rint(phase/(2π))*2π (fast GPU intrinsic, no perf impact).
Same fix applied to SSB product kernel.
Additional fixes in this commit:
- Go-side phase normalization (math.Remainder) after each frame update
  in both the GPU and CPU extraction paths. Keeps the float64 phase
  bounded so it does not drift over hours of continuous operation.
- Overlap trim: floor → ceil when the decimation factor does not divide
  the 512-sample overlap. With decim=20, floor(512/20)=25 output samples
  trimmed only 500 of the 512 overlap input samples, so 12 leaked.
  Now (512+19)/20=26 trims 520, removing all of the overlap.
  Affects NFM (decim=20); WFM (decim=8, 512%8=0) was already clean.
- Click detector rewritten: a second-derivative transient detector
  replaces the first-derivative delta scanner. The old detector flagged
  hundreds of false positives per frame on normal FM audio. For each
  run of consecutive samples a, b, c, the new detector computes |2b-a-c|
  (the discrete second difference, i.e. acceleration). This is near zero
  for smooth signals and large only for true impulse transients.
  Threshold: 0.15.
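The floor-vs-ceil overlap trim arithmetic, as a Go sketch. `trimFloor` and `trimCeil` are hypothetical helper names. The sketch only reproduces the integer math for a 512-sample overlap.

```go
package main

import "fmt"

// trimFloor is the old behavior: only whole output samples that fit inside
// the overlap, leaving overlap%decim input samples untrimmed.
func trimFloor(overlap, decim int) int { return overlap / decim }

// trimCeil is the fixed behavior: ceiling division, so the trimmed output
// samples cover every overlap input sample.
func trimCeil(overlap, decim int) int { return (overlap + decim - 1) / decim }

func main() {
	const overlap = 512
	for _, decim := range []int{8, 20} {
		f, c := trimFloor(overlap, decim), trimCeil(overlap, decim)
		fmt.Printf("decim=%2d floor=%d (leaks %d) ceil=%d (covers %d)\n",
			decim, f, overlap-f*decim, c, c*decim)
	}
}
```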
Files changed:
native/exports.cu - phase reduction in freq_shift + ssb_product kernels
kernels.cu - same (Linux CGO build)
cmd/sdrd/helpers.go - phase normalize + ceil trim (GPU + CPU paths)
recorder/streamer.go - transient detector + prevAudioL field
Requires DLL rebuild: .\build-gpudemod-dll.ps1