The decoder took ~70s for a 20-minute recording. Profiling revealed the
bottleneck was not the 6400-candidate cycle-offset search, but the
cepstrum filter's naive O(N²) DCT calling math.Cos() in the inner loop:
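55458 STFT frames × 2 passes × 256² = 7.27 billion math.Cos() calls.
At a nominal ~20ns per call that would be ~145 seconds, more than the
measured 70s total, so the effective per-call cost was evidently at
most ~10ns; either way, math.Cos() dominated total runtime.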
Fixes:
1. Precomputed cosine table: compute 256×256 = 65536 cosine values
once, then use table lookups in the inner loop. Eliminates all
math.Cos() calls from the per-frame processing.
2. Parallel cycle-offset search: 5 goroutines (one per rep offset),
each searching 1280 cycle offsets independently. The rep offsets
are fully independent — no shared state, no synchronization needed
until the final result merge.
3. Precomputed center-frame lists: instead of checking f%timeRep for
every frame in every candidate test, precompute which frames are
center frames for each rep offset. Eliminates per-frame branching.
4. Float64 PN chip arrays: convert int8 PN chips to float64 once at
startup. Eliminates int8→float64 conversion in the hot inner loop
(204 conversions × 11000 frames × 6400 candidates = 14.4 billion
avoided conversions).
Performance (20-minute recording, 55458 STFT frames):
Before: 70s (math.Cos dominated)
After: 11.5s (6x faster)
Unit test (round-trip): 20s → 1.4s (14x faster)
Note: a coarse/fine search (testing every 10th group offset, then
refining) was attempted but abandoned: the chi-squared metric peak is
too narrow, so the coarse step missed the true peak and produced false
positives. The full 6400-candidate brute-force search is kept for
correctness; the speedup comes entirely from eliminating per-operation
overhead, not from reducing the number of operations.
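The following sketch shows fix 2 together with the brute-force decision above. It assumes a scoring callback where a higher value is better, standing in for the chi-squared metric. `searchParallel`, `result`, and `score` are hypothetical. Every one of the 5 × 1280 = 6400 candidates is still evaluated; only the wall-clock time changes.

```go
package main

import (
	"fmt"
	"sync"
)

// result is the best candidate found by one worker.
type result struct {
	rep, cycle int
	score      float64
}

// searchParallel runs one goroutine per rep offset. Each worker scans every
// cycle offset for its rep and keeps a local best. Workers share no mutable
// state and each writes only its own slot in results, so no locks are needed.
func searchParallel(numReps, cyclesPerRep int, score func(rep, cycle int) float64) result {
	results := make([]result, numReps)
	var wg sync.WaitGroup
	for rep := 0; rep < numReps; rep++ {
		wg.Add(1)
		go func(rep int) {
			defer wg.Done()
			best := result{rep: rep, cycle: -1}
			for c := 0; c < cyclesPerRep; c++ {
				if s := score(rep, c); best.cycle < 0 || s > best.score {
					best = result{rep: rep, cycle: c, score: s}
				}
			}
			results[rep] = best
		}(rep)
	}
	wg.Wait() // all per-rep results are visible after this point

	// Final merge: the only step that looks across rep offsets.
	best := results[0]
	for _, r := range results[1:] {
		if r.score > best.score {
			best = r
		}
	}
	return best
}

func main() {
	// Toy metric with a single narrow peak at rep=3, cycle=777.
	score := func(rep, cycle int) float64 {
		if rep == 3 && cycle == 777 {
			return 100
		}
		return float64((rep*31+cycle*17)%50) / 10
	}
	best := searchParallel(5, 1280, score)
	fmt.Println(best.rep, best.cycle, best.score)
}
```

The slice is read only after `wg.Wait()`, which establishes the needed happens-before ordering. Because the merge walks reps in order, ties resolve deterministically, and the result matches a serial scan.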
The StereoLimiter previously used instantaneous attack — gain was reduced
in the same sample that exceeded the ceiling. While this guarantees zero
overshoot, it suppresses transient peaks that the human auditory system
cannot resolve anyway, reducing perceived loudness and causing audible
gain pumping on percussive material.
Changed to a 2ms exponential attack based on psychoacoustic burst masking
research (O. Bonello, "Multiband audio processing and its influence on
the coverage area of FM stereo transmission", JAES 2007):
- The ear-brain system needs ~50ms to resolve distortion in a signal.
For bursts shorter than 5ms, masking thresholds increase by up to
36 dB compared to steady-state (burst masking).
- With 2ms attack, initial transient peaks pass through the limiter
largely unattenuated and are caught by the downstream HardClip. The clip
artifacts last <5ms (the exponential gain reduction reaches 63% of its
target after 2ms, one time constant, and 95% after 6ms, three time
constants), falling within the burst masking window.
- The limiter no longer reacts to micro-transients that were already
inaudible, raising average modulation level without increasing
perceived distortion.
Signal chain interaction:
Audio → Drive → StereoLimiter (2ms attack, 150ms release)
→ HardClip (safety net, catches the <5ms transient peaks)
→ Cleanup LPF → HardClip
→ Stereo Encode → Composite Clipper
The HardClip after the limiter remains as the compliance safety net.
Peak modulation is guaranteed by the clip, not by the limiter. The
limiter's job is average level management; the clipper handles peaks.
Release time reduced from 200ms to 150ms for slightly faster recovery
on sustained passages without audible pumping.
Add an STFT watermark path inspired by Kirovski & Malvar, including the frequency-domain embedder/decoder, FFT support, and round-trip coverage. Wire the generator and CLI tools to use the new analysis/synthesis flow for watermark experiments on the watermark-rework branch.
Copy RTP payload bytes when parsing packets so buffered packets do not retain slices into the shared UDP read buffer.
Previously ParseRTPPacket() kept Payload as a subslice of the caller-provided buffer. Once the next ReadFromUDP() reused that buffer, any packet still waiting inside the jitter buffer would silently see corrupted payload data. In practice this could surface as clicks, artifacts, or silent audio on reordered AES67/RTP traffic without obvious decoder errors.
Take an owned copy of the payload on parse so jitter-buffered packets remain stable even when the source read buffer is reused.
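This is a minimal sketch of the fix. It assumes RFC 3550 framing: a 12-byte fixed header, a CSRC list, and an optional header extension and padding. `rtpPacket` and `parseRTP` are hypothetical, simplified stand-ins rather than the project's ParseRTPPacket(). The important line is the copy into an owned slice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rtpPacket is a hypothetical, trimmed-down packet type.
type rtpPacket struct {
	PayloadType    uint8
	SequenceNumber uint16
	Timestamp      uint32
	SSRC           uint32
	Payload        []byte // owned copy, never aliases the read buffer
}

// parseRTP parses the fixed header, skips CSRCs and any header extension,
// strips padding, and copies the payload out of buf.
func parseRTP(buf []byte) (*rtpPacket, error) {
	if len(buf) < 12 {
		return nil, errors.New("rtp: packet too short")
	}
	if buf[0]>>6 != 2 {
		return nil, errors.New("rtp: unsupported version")
	}
	offset := 12 + 4*int(buf[0]&0x0f) // skip CSRC identifiers
	if buf[0]&0x10 != 0 {             // header extension present
		if len(buf) < offset+4 {
			return nil, errors.New("rtp: truncated extension header")
		}
		offset += 4 + 4*int(binary.BigEndian.Uint16(buf[offset+2:]))
	}
	end := len(buf)
	if buf[0]&0x20 != 0 { // padding: last byte holds the pad count
		end -= int(buf[end-1])
	}
	if offset > end {
		return nil, errors.New("rtp: header exceeds packet length")
	}
	p := &rtpPacket{
		PayloadType:    buf[1] & 0x7f,
		SequenceNumber: binary.BigEndian.Uint16(buf[2:]),
		Timestamp:      binary.BigEndian.Uint32(buf[4:]),
		SSRC:           binary.BigEndian.Uint32(buf[8:]),
	}
	// The fix: copy instead of p.Payload = buf[offset:end], which would alias
	// the shared UDP read buffer and change when the next read reuses it.
	p.Payload = append([]byte(nil), buf[offset:end]...)
	return p, nil
}

func main() {
	buf := make([]byte, 1500) // shared read buffer, as with ReadFromUDP
	pkt := []byte{0x80, 96, 0, 1, 0, 0, 0, 10, 0, 0, 0, 42, 0xAA, 0xBB, 0xCC}
	n := copy(buf, pkt)
	p, err := parseRTP(buf[:n])
	if err != nil {
		panic(err)
	}
	for i := range buf { // simulate the next read reusing the buffer
		buf[i] = 0
	}
	fmt.Printf("seq=%d payload=% x\n", p.SequenceNumber, p.Payload)
}
```

The copy costs one allocation per packet. A pool of payload buffers could amortize that, at the price of returning each buffer once the jitter buffer releases its packet.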
Prevent the fallback FFmpeg decoder from deadlocking on longer-running streams.
The decoder previously drained stderr with io.ReadAll() before reading PCM from stdout. Once FFmpeg filled the stdout pipe buffer, the process blocked on further stdout writes, never closed stderr, and io.ReadAll(stderr) never returned. That stalled the decoder before readPCM() could even start.
Drain stderr concurrently in its own goroutine so stdin, stdout, and stderr can all make progress in parallel. This matches the expected pipe handling model for long-running FFmpeg processes and keeps the fallback decoder usable for real streams.
Improve reliability in two critical paths:
- make config saves atomic by writing to a temp file in the target directory, syncing it, and renaming it into place so crashes cannot leave a half-written JSON config behind
- serialize runtime state transitions with a dedicated mutex so concurrent state updates from run() and writerLoop() cannot double-record transitions or increment counters twice
Also remove an unreachable nil-check after cloneFrame() to keep the engine loop honest and easier to reason about.
Address a set of production-facing edge cases discovered during bug hunting.
Included fixes:
- make FrameQueue close handling race-safe by replacing the TOCTOU close check with a dedicated close signal channel
- relax tone frequency validation when tone amplitude is zero, and default tone amplitude to 0 to avoid unintended test-tone output
- validate PI codes consistently whenever provided, and require a PI when RDS is enabled
- harden Icecast reconnect backoff against duration overflow
- prevent duplicate hard-reload goroutines from rapid repeated ingest-save requests
- clamp BS.412 power accumulation against negative float drift before sqrt to avoid NaN gain propagation
These changes focus on shutdown safety, config correctness, reconnect robustness, and long-running DSP stability.
Wire tone frequency, tone amplitude, and audio gain through the live control path so the UI's live-update behavior matches the engine.
This changes the generator live params to carry tone and gain values, propagates them through Engine.UpdateConfig and txBridge.UpdateConfig, and extends the control-plane patch types accordingly.
It also refines the control API behavior:
- avoid holding the server config mutex across tx.UpdateConfig()
- report live=true only when a request contains at least one genuinely live-applicable field
Together these fixes align the UI semantics with the actual runtime behavior and remove a lock hazard in the config update path.
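This sketches the two control-API refinements. It assumes patch fields are pointers, so that "not sent" is distinguishable from zero. All names (`configPatch`, `hasLiveField`, `server`, `applyPatch`, `StationName`) are hypothetical. The lock protects only the merge and a snapshot, and tx.UpdateConfig is called after the lock is released.

```go
package main

import (
	"fmt"
	"sync"
)

// configPatch is a hypothetical partial update; a nil field was not sent.
type configPatch struct {
	ToneFreqHz  *float64
	ToneAmp     *float64
	AudioGain   *float64
	StationName *string // stands in for any field that is not live-applicable
}

// hasLiveField reports whether at least one live-applicable field is present.
func (p configPatch) hasLiveField() bool {
	return p.ToneFreqHz != nil || p.ToneAmp != nil || p.AudioGain != nil
}

type liveParams struct{ ToneFreqHz, ToneAmp, AudioGain float64 }

type transmitter interface{ UpdateConfig(liveParams) }

type server struct {
	mu          sync.Mutex
	live        liveParams
	stationName string
	tx          transmitter
}

func setIf(dst, src *float64) {
	if src != nil {
		*dst = *src
	}
}

// applyPatch merges the patch and snapshots the live params under s.mu,
// then releases the lock before calling tx.UpdateConfig.
func (s *server) applyPatch(p configPatch) (live bool) {
	s.mu.Lock()
	setIf(&s.live.ToneFreqHz, p.ToneFreqHz)
	setIf(&s.live.ToneAmp, p.ToneAmp)
	setIf(&s.live.AudioGain, p.AudioGain)
	if p.StationName != nil {
		s.stationName = *p.StationName
	}
	snapshot := s.live
	s.mu.Unlock()

	if !p.hasLiveField() {
		return false // nothing to push to the running transmitter
	}
	s.tx.UpdateConfig(snapshot) // no lock held here
	return true
}

// recordingTx is a fake transmitter that records every update it receives.
type recordingTx struct{ calls []liveParams }

func (r *recordingTx) UpdateConfig(p liveParams) { r.calls = append(r.calls, p) }

func main() {
	rec := &recordingTx{}
	s := &server{tx: rec}
	gain, name := 0.8, "DEMO"
	fmt.Println(s.applyPatch(configPatch{AudioGain: &gain}))
	fmt.Println(s.applyPatch(configPatch{StationName: &name}))
	fmt.Println(len(rec.calls))
}
```

One trade-off is that two concurrent patches can reach tx.UpdateConfig in a different order than they were merged. If ordering matters, a second mutex can be acquired before s.mu is released and held only around the UpdateConfig call. That preserves ordering without holding the config lock across the call.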
rejectBody() returns true when the request body is acceptable and false when a body must be rejected. The TX and fault-reset handlers treated the return value the wrong way around and returned early on valid empty POST requests. This prevented actions like /tx/stop from running in the normal no-body case.
Update the handlers to only abort when rejectBody() reports an actual rejection, so empty POST control actions proceed as intended.