fix: 13 bugs from systematic codebase review (fix25)
feat: add production-grade FMUpsampler (not yet wired)

=== SIGNAL PATH (fixes audible/measurable on HW) ===

[CRITICAL] stereo: fix 38kHz subcarrier 1-sample phase offset
  Encode() called Sample() before capturing Phase(), so the 38kHz
  subcarrier used phase_{n+1} while pilot used phase_n — a constant
  60 deg phase error at 228kHz, degrading stereo separation to ~50%.
  Now captures phase BEFORE tick; stores lastPhase for coherent
  RDS carrier derivation.

[HIGH] rds: phase-lock 57kHz carrier to pilot via StereoEncoder
  RDS encoder had its own free-running 57kHz oscillator instead of
  deriving from 3x pilot. Two independent float64 oscillators drift
  apart over hours. Added NextSampleWithCarrier(carrier float64),
  generator now passes stereoEncoder.RDSCarrier() (sin(3*pilotPhase))
  so all subcarriers share one phase reference. NextSample() remains
  as backward-compat fallback with internal carrier.

[MEDIUM] dsp/fmmod: fix phase wrapping for negative phase
  Wrap condition only triggered on phase > pi. Negative phase from
  negative-biased composite could grow unbounded, losing float64
  precision after extended runtime. Now wraps on |phase| > pi.

[MEDIUM] dsp/oscillator: phase wrapping handles both directions
  Same pattern as fmmod — only wrapped positive overflow. Now wraps
  on phase >= 1 OR phase < 0 for defensive robustness.

=== DAUERLAUF / SBC STABILITY ===

[HIGH] offline/generator: eliminate per-chunk allocation (912KB @ 2.28MHz)
  GenerateFrame() allocated a new []IQSample every call. At Pluto
  rate (2.28MHz, 50ms chunks) that's 114k*8=912KB * 20/sec = 18MB/s
  garbage, causing GC pauses and hardware underruns on Raspi.
  Now pre-allocates frameBuf, reuses across calls. Grow-only policy,
  never shrinks. Safe because driver.Write() is blocking.

[MEDIUM] output/file: batch-write entire frame in one syscall
  Was writing 8 bytes per sample = 114k syscalls per chunk on Pluto.
  Now serializes into a reusable byte buffer, single file.Write().
  Orders of magnitude faster on Raspi SD-card I/O.

[MEDIUM] app/engine: add error backoff in TX loop
  run() tight-looped at 100% CPU on persistent driver errors
  (hardware disconnect, USB reset). Now backs off by chunkDuration
  per error using select with ctx.Done() for clean cancellation.

[MEDIUM] app/engine: replace time.Sleep shutdown with sync.WaitGroup
  Stop() used time.Sleep(2*chunkDuration) hoping run() would exit.
  Race condition if hardware Write() stalls. Now uses wg.Wait() for
  deterministic goroutine join before Flush/Stop.

=== CORRECTNESS / HYGIENE ===

[LOW] rds/normalize: filter to ASCII, prevent mid-rune truncation
  normalizePS truncated at 8 bytes, not 8 characters. UTF-8 input
  (e.g. umlauts) could split mid-rune producing corrupt RDS bitstream.
  Now replaces non-ASCII with space before truncation.

[LOW] offline/generator: increment frame sequence counter
  Sequence was hardcoded to 1 on every frame, useless for debugging.
  Now increments per GenerateFrame() call.

[LOW] control: document that config PATCH doesn't reach running engine
  Added TODO noting that POST /config updates server's copy only;
  running Engine/Generator holds its own snapshot.

[COSMETIC] dsp/preemphasis: remove dead fields y1, a1
  PreEmphasis.y1 and .a1 were never read in Process(). Removed from
  struct, constructor, and Reset(). Fixed misleading doc comment on
  PreEmphasizedSource (claims audio-rate, actually composite-rate).

=== NEW: FMUpsampler (not wired, for future split-rate path) ===

  dsp/fmupsample.go — production-grade phase-domain FM upsampler.
  Accumulates FM phase at source rate (228kHz), linearly interpolates
  to device rate (2.28MHz), emits IQ via sin/cos. Zero-allocation
  steady state. Cross-chunk boundary via prevPhase + srcPos carry.
  Symmetric phase wrapping. Virtual index coordinate system places
  prevPhase at vi=0, srcPhases at vi=1..N for seamless boundaries.

  dsp/fmupsample_test.go — 16 tests + 1 benchmark:
  - Zero/DC/varying composite signals
  - Output count for exact and non-integer ratios
  - Phase continuity across chunk boundaries (critical)
  - Non-integer ratio boundary continuity (50 chunks)
  - Phase wrapping over 100 chunks at full deviation
  - Negative deviation (tests symmetric wrapping)
  - Equivalence with FMModulator via frequency comparison
  - Buffer reuse verification (pointer identity)
  - Reset state clearing
  - Tiny chunks (1 sample), alternating composite (stress)
  - Long-run: 72000 chunks simulating 1 hour
  - Benchmark with ReportAllocs

  NOT wired into Engine — will be integrated when split-rate path
  is enabled for Pluto/HackRF on Raspi.

Files changed:
  internal/stereo/encoder.go        - phase coherence fix
  internal/rds/encoder.go           - pilot-locked carrier API
  internal/rds/normalize.go         - ASCII safety
  internal/offline/generator.go     - buffer reuse + sequence + doc
  internal/dsp/fmmod.go             - bidirectional phase wrap
  internal/dsp/oscillator.go        - bidirectional phase wrap
  internal/dsp/preemphasis.go       - dead field removal
  internal/dsp/fmupsample.go        - NEW: production FM upsampler
  internal/dsp/fmupsample_test.go   - NEW: 16 tests + benchmark
  internal/output/file.go           - batch write
  internal/app/engine.go            - WaitGroup + backoff
  internal/control/control.go       - TODO config propagation
