# Audio Ingest Implementation Plan

- Status: Phase-1 fixup snapshot (2026-04-07)
- Owner: Jan
- Scope: `fm-rds-tx`
- Related: `docs/audio-ingest-rework.md`

## Goal

Build a first-class audio ingest subsystem that supports multiple source families without pushing transport-specific logic into the FM TX engine or DSP path.

This plan starts with a pragmatic integration strategy:

- keep the existing TX/DSP pipeline stable
- introduce a new `internal/ingest` runtime in front of it
- preserve `audio.StreamSource` as the immediate TX-facing sink for now
- bring **Icecast ingest into Phase 1**, alongside stdin/raw HTTP ingest
- treat **native decoding as a first-class goal from the start**, not a late add-on

The key architectural principle is:

> Source-family specifics live in source adapters. Shared buffering, health, lifecycle, conversion, and handoff to TX live in a common ingest runtime.

## Actual Phase-1 status (2026-04-07)

Implemented:

- `internal/ingest` runtime in front of `audio.StreamSource`
- ingest source factory and config mapping for `stdin`, `http-raw`, `icecast`
- stdin and HTTP raw adapters feeding shared runtime
- runtime and source stats exposed via `/runtime` as `ingest.*`
- Icecast source adapter with reconnect/backoff and decoder preference modes (`auto`, `native`, `ffmpeg`)
- decoder registry and explicit ffmpeg fallback decoder implementation

Still open on purpose:

- native `mp3`, `oggvorbis`, `aac` decoder packages are placeholders (`ErrUnsupported`)
- real decode path for Icecast is currently ffmpeg fallback
- no AoIP/SRT ingest integration into shared runtime yet
- no multi-source orchestration/failover policy yet

---

## Non-goals for the first implementation wave

The first wave should **not** attempt to solve everything at once.

Out of scope initially:

- full multi-source orchestration with seamless hot failover
- exhaustive native decoding support for every compressed format and edge case
- replacing the existing `offline.Generator` source contract
- redesigning the TX runtime state machine
- broad UI redesign
- a universal media graph framework

We want a clean, incremental path, not a big-bang rewrite.

---

## Current state of the codebase

The repository already has most of the TX-side hooks needed for a proper ingest subsystem:

- `cmd/fmrtx/main.go`
  - creates `audio.StreamSource`
  - wires it into the engine via `engine.SetStreamSource(...)`
  - starts stdin and `/audio/stream` ingest paths directly
- `internal/app/engine.go`
  - accepts a stream source via `SetStreamSource(...)`
  - wraps it in `audio.NewStreamResampler(...)`
  - injects it upstream of DSP via `generator.SetExternalSource(...)`
- `internal/audio/stream.go`
  - provides a TX-facing SPSC ring buffer
  - provides a simple `StreamResampler`
  - tracks underruns, overflows, buffering, high watermark
- `internal/offline/generator.go`
  - already cleanly accepts an external audio source
- `aoiprxkit/`
  - already contains useful RTP/AES67/SAP/SRT receive-side primitives and stats

This means the right move is **not** to redesign the FM core, but to formalize the missing ingest layer in front of the existing TX path.

---

## Target architecture

## Layers

### 1. Source adapters

Each adapter owns family-specific behavior, for example:

- process control for ffmpeg-based adapters
- reconnect loops for Icecast
- RTP depacketization and jitter buffering for AoIP
- protocol-specific metadata and health signals

Examples:

- stdin PCM
- HTTP raw PCM
- Icecast stream
- RTP/AES67
- SRT
- future ffmpeg-backed generic URL/file ingest

### 2. Decoder layer

A dedicated decoder layer sits between transport/session adapters and the shared ingest runtime.

Responsibilities:

- decode compressed audio streams into normalized PCM chunks
- keep codec-specific logic out of Icecast and other source adapters
- allow multiple decoder implementations behind a common interface
- prefer native Go decoders where they are stable and good enough
- allow an ffmpeg-backed fallback only as an implementation detail, not as the architecture

Examples:

- MP3
- Ogg/Vorbis
- AAC/ADTS where practical
- later: Opus or other codecs as needed

Initial decoder priority should be:

1. MP3
2. Ogg/Vorbis
3. AAC/ADTS
4. Opus later if a concrete source requirement justifies it

### 3. Shared ingest runtime

A common ingest runtime sits between decoders/source adapters and TX.

Responsibilities:

- source lifecycle
- prebuffering policy
- normalized source state
- family-neutral telemetry
- format conversion into TX-facing audio frames
- writing into the existing `audio.StreamSource`
- later: failover/orchestration

### 4. Existing TX path

The TX side stays mostly unchanged:

- `audio.StreamSource`
- `audio.StreamResampler`
- `Engine.SetStreamSource(...)`
- `offline.Generator.SetExternalSource(...)`
- FM/DSP chain

The TX engine should not know whether input came from stdin, Icecast, SRT, RTP, or something else.

---

## Why Icecast is in Phase 1

Icecast should be introduced early, not postponed.

Reasons:

- it exercises a real long-running network stream rather than one-shot raw pushes
- it forces lifecycle design immediately: connecting, connected, stalled, reconnecting, failed
- it forces buffering and liveness behavior to be designed properly
- it prevents the ingest layer from being accidentally overfit to only raw PCM push workflows
- it reflects an important real-world ingest path for FM rebroadcast/transcoding scenarios
- it forces the project to define a real decoder boundary early

Early Icecast support **should aim for native decoding where practical**.

Initial Icecast strategy should therefore be:

- separate transport/runtime concerns from decoding concerns
- define a decoder interface from the beginning
- prefer native Go decoders for common formats where mature libraries exist
- keep an ffmpeg-backed decoder only as fallback or temporary compatibility path
- keep the ingest runtime and source adapter interfaces clean enough that decoder implementation can evolve without redesigning the whole ingest subsystem

---

## Phase plan

## Phase 1: create the ingest runtime and ship first adapters

### Deliverables

- new `internal/ingest` package
- a decoder abstraction as part of the ingest subsystem
- a shared ingest runtime in front of `audio.StreamSource`
- adapters for:
  - stdin PCM
  - raw HTTP PCM
  - Icecast stream
- decoder boundary with preference/fallback policy in place
- explicit Phase-1 codec prioritization: MP3 first, Ogg/Vorbis second, AAC/ADTS third
- runtime and source stats exposed in control API
- command/config plumbing for selecting an ingest source

### Phase 1 boundary

At the end of Phase 1:

- TX still consumes through `audio.StreamSource`
- DSP path is unchanged
- source families are no longer wired directly into `cmd/fmrtx/main.go`
- Icecast works with reconnect + observable runtime state
- decoder selection/fallback behavior is explicit and test-covered
- native decoder implementations remain a follow-up item

---

## Phase 2: integrate structured network audio families

### Deliverables

- adapters backed by `aoiprxkit`
  - RTP/AES67 ingest
  - SRT ingest
- shared source stats mapped into ingest runtime stats

### Notes

- family-specific jitter/packet handling stays inside adapter/family code
- TX side continues to see normalized stereo frames only

---

## Phase 3: source selection, fallback, and richer policy

### Deliverables

- primary/fallback source model
- failure policy
- source switching policy
- improved operator telemetry
- optional source prioritization and warm standby

This phase should only start once single-source ingest is stable.

---

## New package structure

Proposed initial layout:

```text
internal/
  ingest/
    types.go
    source.go
    runtime.go
    convert.go
    stats.go
    factory.go
    decoder/
      decoder.go
      mp3/
        decoder.go
      aac/
        decoder.go
      oggvorbis/
        decoder.go
      fallback/
        ffmpeg.go
    adapters/
      stdinpcm/
        source.go
      httpraw/
        source.go
      icecast/
        source.go
        reconnect.go
```

Later additions:

```text
internal/ingest/adapters/
  aoip/
  srt/
  ffmpeg/
```

Notes:

- codec-specific logic should live under `internal/ingest/decoder/`
- ffmpeg, if retained at all, should live under an explicit fallback package
- keep source-family code out of `internal/app` and `internal/offline`

---

## Core interfaces

These are design targets, not fixed signatures.

## Normalized ingest-side frame model

The ingest layer needs a family-neutral PCM representation before converting into the TX-facing `audio.Frame` stream.

Proposed shape:

```go
type PCMChunk struct {
	Samples       []int32
	Channels      int
	SampleRateHz  int
	Sequence      uint64
	Timestamp     time.Time
	SourceID      string
	Discontinuity bool
}
```

Rationale:

- expressive enough for RTP/AES67/SRT/decoded Icecast output
- allows transport metadata to be preserved long enough for runtime logic and stats
- avoids forcing all adapters into the same byte-stream assumption

Future extension points if needed:

- `Codec string`
- `ClockDomain string`
- `BitDepth int`
- `PTS time.Duration`

---

## Source descriptor

```go
type SourceDescriptor struct {
	ID           string
	Kind         string
	Family       string
	Transport    string
	Codec        string
	Channels     int
	SampleRateHz int
	Detail       string
}
```

Examples:

- `Kind=stdin-pcm`, `Family=raw`, `Transport=stdin`
- `Kind=http-raw`, `Family=raw`, `Transport=http`
- `Kind=icecast`, `Family=streaming`, `Transport=http`
- later `Kind=aes67`, `Family=aoip`, `Transport=rtp`

---

## Source interface

Two patterns are reasonable:

- channel-based delivery
- sink/callback-based delivery

For the first implementation, channel-based is usually easier to reason about.

```go
type Source interface {
	Descriptor() SourceDescriptor
	Start(ctx context.Context) error
	Stop() error
	Chunks() <-chan PCMChunk
	Errors() <-chan error
	Stats() SourceStats
}
```

Alternative callback model is acceptable if it reduces allocations or simplifies integration.

Important constraint:

- the source adapter owns family-specific I/O
- the ingest runtime owns shared buffering/handoff policy

---

## Shared source stats

```go
type SourceStats struct {
	State           string
	Connected       bool
	LastChunkAt     time.Time
	ChunksIn        uint64
	SamplesIn       uint64
	BufferedSeconds float64
	Overflows       uint64
	Underruns       uint64
	Reconnects      uint64
	Discontinuities uint64
	TransportLoss   uint64
	Reorders        uint64
	JitterDepth     int
	LastError       string
}
```

Not every source will populate every field. That is okay. The common runtime should expose a stable superset and leave unsupported fields at zero/default.

---

## Shared ingest runtime

## Responsibilities

The runtime is the main missing abstraction in the current codebase.

Responsibilities:

- own exactly one active source in Phase 1
- start/stop the source cleanly
- receive normalized `PCMChunk`s
- convert them into TX-facing stereo frames
- write them into `audio.StreamSource`
- enforce prebuffering policy where relevant
- expose common ingest state and health
- detect stalls/reconnects/discontinuities

## Non-responsibilities

The runtime should **not**:

- parse RTP
- manage ffmpeg stderr parsing for generic protocol details
- implement protocol-specific jitter buffering directly
- manipulate FM/DSP runtime states directly

It reports ingest health; TX remains responsible for TX health.

---

## TX-facing sink strategy

For now, keep this path:

- ingest runtime writes into `audio.StreamSource`
- `Engine.SetStreamSource(...)` remains unchanged
- `audio.StreamResampler` remains the final rate adaptation step into composite/DSP rate

This minimizes risk. It also keeps future refactors optional instead of mandatory.

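
The runtime responsibilities and sink strategy above can be sketched roughly as a single pump loop. This is a minimal illustration under stated assumptions, not the project's implementation: `Frame`, `FrameSink`, `memorySink`, `convertChunk`, `pump`, and `runDemo` are hypothetical names, `FrameSink` only stands in for `audio.StreamSource` (whose real API is not shown in this plan), and samples are assumed to be full-scale `int32`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PCMChunk mirrors the subset of the proposed chunk shape used by this sketch.
type PCMChunk struct {
	Samples      []int32 // interleaved; ASSUMPTION: full-scale int32
	Channels     int
	SampleRateHz int
}

// Frame is a hypothetical stand-in for the TX-facing stereo frame.
type Frame struct{ L, R float32 }

// FrameSink is a hypothetical stand-in for audio.StreamSource.
// WriteFrame returns false when the frame could not be accepted.
type FrameSink interface {
	WriteFrame(f Frame) bool
}

// memorySink collects frames in memory so the sketch can run standalone.
type memorySink struct{ frames []Frame }

func (m *memorySink) WriteFrame(f Frame) bool {
	m.frames = append(m.frames, f)
	return true
}

var errChannels = errors.New("ingest: only mono or stereo input is supported")

// normalize maps a full-scale int32 sample into [-1, +1).
func normalize(s int32) float32 { return float32(float64(s) / 2147483648.0) }

// convertChunk duplicates mono to stereo, maps stereo directly to L/R,
// and rejects any other channel count.
func convertChunk(c PCMChunk) ([]Frame, error) {
	switch c.Channels {
	case 1:
		out := make([]Frame, 0, len(c.Samples))
		for _, s := range c.Samples {
			v := normalize(s)
			out = append(out, Frame{L: v, R: v})
		}
		return out, nil
	case 2:
		out := make([]Frame, 0, len(c.Samples)/2)
		for i := 0; i+1 < len(c.Samples); i += 2 {
			out = append(out, Frame{L: normalize(c.Samples[i]), R: normalize(c.Samples[i+1])})
		}
		return out, nil
	default:
		return nil, errChannels
	}
}

// pump drains chunks into the sink until ctx is cancelled or the channel is
// closed, counting frames the sink refused and chunks that failed conversion.
func pump(ctx context.Context, chunks <-chan PCMChunk, sink FrameSink) (dropped, convertErrors uint64) {
	for {
		select {
		case <-ctx.Done():
			return dropped, convertErrors
		case c, ok := <-chunks:
			if !ok {
				return dropped, convertErrors
			}
			frames, err := convertChunk(c)
			if err != nil {
				convertErrors++
				continue
			}
			for _, f := range frames {
				if !sink.WriteFrame(f) {
					dropped++
				}
			}
		}
	}
}

// runDemo feeds one mono, one stereo, and one unsupported chunk through pump.
func runDemo() (frames []Frame, dropped, convertErrors uint64) {
	chunks := make(chan PCMChunk, 3)
	chunks <- PCMChunk{Samples: []int32{1 << 30, -(1 << 30)}, Channels: 1, SampleRateHz: 44100}
	chunks <- PCMChunk{Samples: []int32{1 << 30, 0}, Channels: 2, SampleRateHz: 44100}
	chunks <- PCMChunk{Samples: make([]int32, 6), Channels: 6, SampleRateHz: 48000}
	close(chunks)
	sink := &memorySink{}
	dropped, convertErrors = pump(context.Background(), chunks, sink)
	return sink.frames, dropped, convertErrors
}

func main() {
	frames, dropped, convertErrors := runDemo()
	fmt.Println(len(frames), dropped, convertErrors)
}
```

Keeping conversion and the sink write in one place means adapters only ever produce `PCMChunk`s, and the counters map naturally onto runtime-level fields such as `droppedFrames` and `convertErrors`.
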
---

## Conversion policy

A shared conversion layer is required between `PCMChunk` and `audio.StreamSource`.

## Initial policy

- accept mono or stereo only in Phase 1 if that keeps implementation smaller
- mono input is duplicated to stereo
- stereo input is mapped directly L/R
- channels > 2 are rejected initially unless a simple, explicit downmix policy is added
- normalize to the existing `audio.Sample` range `[-1, +1]`
- clipping should be explicit and measured, not silent and invisible

## Why a dedicated conversion layer matters

Without it, each source adapter will start doing its own ad hoc format mapping. That is exactly what the new ingest subsystem is supposed to prevent.

---

## Icecast adapter design

## Scope for the first Icecast implementation

The first version needs to support a robust operator-visible ingest path **and** establish the decoder boundary correctly.

It does not need to support every codec/container combination from day one, but it should not assume ffmpeg as the architectural default.

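
One way to picture the decoder boundary described here is a small registry that resolves a decoder from a preference mode. The following is only a sketch: `Decoder`, `Constructor`, `Registry`, `Select`, `stubDecoder`, and `demoRegistry` are hypothetical names, and the reading of `auto` as "try native, then fall back to ffmpeg" is an assumption; the plan only names the modes `auto`, `native`, and `ffmpeg` and the `ErrUnsupported` placeholder result.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ErrUnsupported is the sentinel the plan mentions for placeholder decoders.
var ErrUnsupported = errors.New("decoder: unsupported")

// Decoder is a hypothetical decoder boundary: compressed bytes in, PCM out.
type Decoder interface {
	Name() string
	Decode(r io.Reader, emit func(samples []int32, channels, sampleRateHz int) error) error
}

// Constructor builds a decoder or reports why it cannot.
type Constructor func() (Decoder, error)

// Registry maps codec names (for example "mp3") to native constructors and
// holds one optional fallback constructor.
type Registry struct {
	Native   map[string]Constructor
	Fallback Constructor
}

// stubDecoder exists only so the sketch runs; it performs no real decoding.
type stubDecoder struct{ name string }

func (s stubDecoder) Name() string { return s.name }

func (s stubDecoder) Decode(io.Reader, func([]int32, int, int) error) error {
	return ErrUnsupported
}

// Select resolves a decoder for codec according to mode.
// ASSUMPTION: "auto" prefers native and falls back only on ErrUnsupported.
func (r Registry) Select(mode, codec string) (Decoder, error) {
	native := func() (Decoder, error) {
		ctor, ok := r.Native[codec]
		if !ok {
			return nil, fmt.Errorf("codec %q: %w", codec, ErrUnsupported)
		}
		return ctor()
	}
	fallback := func() (Decoder, error) {
		if r.Fallback == nil {
			return nil, fmt.Errorf("no fallback decoder: %w", ErrUnsupported)
		}
		return r.Fallback()
	}
	switch mode {
	case "native":
		return native()
	case "ffmpeg":
		return fallback()
	case "auto":
		d, err := native()
		if errors.Is(err, ErrUnsupported) {
			return fallback()
		}
		return d, err
	default:
		return nil, fmt.Errorf("unknown decoder mode %q", mode)
	}
}

// demoRegistry models a state where the native MP3 decoder is still a
// placeholder and only the fallback is usable.
func demoRegistry() Registry {
	return Registry{
		Native: map[string]Constructor{
			"mp3": func() (Decoder, error) { return nil, ErrUnsupported },
		},
		Fallback: func() (Decoder, error) { return stubDecoder{name: "ffmpeg"}, nil },
	}
}

func main() {
	reg := demoRegistry()
	for _, mode := range []string{"auto", "native", "ffmpeg"} {
		d, err := reg.Select(mode, "mp3")
		if err != nil {
			fmt.Printf("%s: error: %v\n", mode, err)
			continue
		}
		fmt.Printf("%s: using %s\n", mode, d.Name())
	}
}
```

With this shape the adapter never names ffmpeg directly, so swapping a placeholder for a real native decoder changes only the registry contents.
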
## Recommended structure

### Transport/lifecycle layer

Responsibilities:

- connect to Icecast URL over HTTP
- validate response
- track connection state
- reconnect with backoff
- observe stalls / EOF / disconnects
- surface metadata and errors

Implementation guidance:

- prefer a Go library or a thin wrapper around the standard Go HTTP client for Icecast transport/session handling
- do not hand-roll unnecessary low-level protocol machinery when existing libraries or the standard client already cover it well
- keep transport/session concerns isolated from decoder logic and ingest runtime logic

### Decode layer

Preferred initial option:

- use native Go decoders for the first targeted formats where mature libraries exist
- decode compressed stream data into PCM chunks behind a decoder interface
- prioritize MP3 first and Ogg/Vorbis second because they are likely to give the best early return for Icecast support
- evaluate AAC/ADTS next once the decoder boundary and streaming behavior are stable

Fallback option:

- keep an ffmpeg-backed decoder implementation available only as fallback/compatibility path

This keeps the first release practical while preserving architecture.

The key is to avoid letting “ffmpeg exists” collapse the whole ingest abstraction.

Meaning:

- Icecast adapter uses a transport/session client layer plus a decoder interface
- transport/session handling should preferably come from a Go library or a thin wrapper around the standard HTTP client
- decoder choice can be native Go or fallback ffmpeg
- Icecast remains an adapter in `internal/ingest/adapters/icecast`
- runtime still sees a normal source

## Expected Icecast states

At minimum:

- `idle`
- `connecting`
- `buffering`
- `running`
- `stalled`
- `reconnecting`
- `failed`
- `stopped`

These should be visible via runtime stats and eventually UI.

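
The state names and the "reconnect with backoff" responsibility above could be expressed along these lines. This is a sketch only: the `State` type, the `Backoff` helper, and the doubling policy are assumptions for illustration, and the 1 s / 15 s values are example settings rather than confirmed defaults.

```go
package main

import (
	"fmt"
	"time"
)

// State is a hypothetical typed form of the adapter states listed in the plan.
type State string

const (
	StateIdle         State = "idle"
	StateConnecting   State = "connecting"
	StateBuffering    State = "buffering"
	StateRunning      State = "running"
	StateStalled      State = "stalled"
	StateReconnecting State = "reconnecting"
	StateFailed       State = "failed"
	StateStopped      State = "stopped"
)

// Backoff yields reconnect delays that double per failed attempt and are
// capped at Max. ASSUMPTION: plain exponential doubling without jitter.
type Backoff struct {
	Initial  time.Duration
	Max      time.Duration
	attempts int
}

// Next returns the delay to wait before the next reconnect attempt.
func (b *Backoff) Next() time.Duration {
	d := b.Initial
	for i := 0; i < b.attempts && d < b.Max; i++ {
		d *= 2
	}
	if d > b.Max {
		d = b.Max
	}
	b.attempts++
	return d
}

// Reset is meant to be called after a successful connect so the next failure
// starts again from Initial.
func (b *Backoff) Reset() { b.attempts = 0 }

func main() {
	b := Backoff{Initial: time.Second, Max: 15 * time.Second}
	state := StateReconnecting
	for i := 0; i < 6; i++ {
		fmt.Printf("state=%s wait=%s\n", state, b.Next())
	}
	b.Reset()
	state = StateRunning
	fmt.Printf("state=%s next wait after reset=%s\n", state, b.Next())
}
```

Typed state constants keep the strings reported in runtime stats and the values used inside the adapter from drifting apart.
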
---

## stdin PCM adapter

Purpose:

- preserve current CLI-based piping workflows
- move direct ingest logic out of `cmd/fmrtx/main.go`

Responsibilities:

- read S16LE stereo PCM from stdin
- emit `PCMChunk`s or equivalent normalized blocks
- expose simple source stats

This adapter should be intentionally boring.

---

## raw HTTP PCM adapter

Purpose:

- preserve current `/audio/stream` functionality
- move it behind the shared ingest runtime instead of writing directly to `audio.StreamSource`

There are two reasonable implementation paths:

### Option A: keep `/audio/stream` as a push endpoint owned by control server

- control server accepts request body
- forwards PCM blocks into an ingest-owned writer/sink
- ingest runtime still owns buffering/health

### Option B: implement an explicit push source abstraction

- source adapter exposes a writable sink
- control plane writes into that sink

For Phase 1, Option A is probably the fastest path.

But the important part is:

- control server should no longer push directly into TX buffer
- it should push into the ingest subsystem

---

## Runtime stats model

Add a top-level ingest section to `/runtime`.

Proposed shape:

```json
{
  "ingest": {
    "active": {
      "id": "icecast-main",
      "kind": "icecast",
      "state": "running",
      "sampleRateHz": 44100,
      "channels": 2,
      "bufferedSeconds": 1.4,
      "reconnects": 1,
      "lastError": ""
    },
    "runtime": {
      "state": "running",
      "prebuffering": false,
      "lastChunkAt": "...",
      "droppedFrames": 0,
      "convertErrors": 0,
      "writeBlocked": false
    }
  }
}
```

This should sit alongside:

- driver stats
- engine stats
- audio stream stats
- control audit stats

Initially, `audioStream` may remain exposed for debugging, but `ingest` should become the operator-facing abstraction.

---

## Config shape evolution

Do not overload existing `audio.*` forever.

The current `audio` config primarily models file/tone/test input assumptions. Introduce a new config subtree for ingest.

## Proposed shape

```json
{
  "ingest": {
    "kind": "icecast",
    "prebufferMs": 1500,
    "stallTimeoutMs": 3000,
    "reconnect": {
      "enabled": true,
      "initialBackoffMs": 1000,
      "maxBackoffMs": 15000
    },
    "stdin": {
      "sampleRateHz": 44100,
      "channels": 2,
      "format": "s16le"
    },
    "httpRaw": {
      "sampleRateHz": 44100,
      "channels": 2,
      "format": "s16le"
    },
    "icecast": {
      "url": "http://...",
      "decoder": "ffmpeg"
    }
  }
}
```

Notes:

- keep current flags working initially for backward compatibility
- map them internally into the new ingest config
- do not force config migration immediately

---

## CLI evolution

Current flags:

- `--audio-stdin`
- `--audio-rate`
- `--audio-http`

These can stay temporarily, but should become compatibility shims.

Possible future direction:

- `--ingest stdin`
- `--ingest http-raw`
- `--ingest icecast`
- `--icecast-url ...`

The exact CLI can wait, but internal structure should already assume a source factory.

---

## File-by-file implementation plan

## 1. Add new ingest package skeleton

Create:

- `internal/ingest/types.go`
- `internal/ingest/source.go`
- `internal/ingest/runtime.go`
- `internal/ingest/convert.go`
- `internal/ingest/stats.go`
- `internal/ingest/factory.go`

### Acceptance

- package compiles
- no behavior change yet

---

## 2. Implement stdin adapter

Create:

- `internal/ingest/adapters/stdinpcm/source.go`

Responsibilities:

- read stdin PCM
- emit normalized chunks
- report basic stats

### Acceptance

- reproduces current `--audio-stdin` behavior through ingest runtime
- TX still works unchanged downstream

---

## 3. Implement shared ingest runtime with `audio.StreamSource` sink

Runtime should:

- own source start/stop
- convert PCM chunks to `audio.Frame`s
- write into `audio.StreamSource`
- track runtime state and counters

### Acceptance

- stdin path works end-to-end
- engine remains unchanged except wiring
- `/runtime` can expose ingest stats

---

## 4. Rewire `cmd/fmrtx/main.go`

Replace direct source-specific logic with:

- source selection
- ingest runtime creation
- runtime start/stop
- existing engine wiring

### Important

Remove direct writes like:

- stdin goroutine writing directly into `audio.StreamSource`
- HTTP handler writing directly into `audio.StreamSource`

They should now pass through ingest runtime abstractions.

### Acceptance

- codepath is cleaner
- source-family logic no longer lives in main

---

## 5. Rework raw HTTP ingest to target ingest runtime

Modify control layer so `/audio/stream` targets ingest subsystem rather than TX ring directly.

Likely affected file:

- `internal/control/control.go`

### Acceptance

- `/audio/stream` still works
- stats reflect ingest runtime, not just raw ring buffer

---

## 6. Implement decoder layer and Icecast adapter

Create:

- `internal/ingest/decoder/decoder.go`
- `internal/ingest/decoder/mp3/decoder.go`
- `internal/ingest/decoder/aac/decoder.go`
- `internal/ingest/decoder/oggvorbis/decoder.go`
- optional fallback: `internal/ingest/decoder/fallback/ffmpeg.go`
- `internal/ingest/adapters/icecast/source.go`
- `internal/ingest/adapters/icecast/reconnect.go`

### Responsibilities

- decoder interface turns compressed audio into PCM chunks
- native decoder implementations cover the initial target formats where stable libraries exist
- Icecast adapter handles HTTP connect/reconnect/lifecycle
- Icecast transport/session handling should use a Go library or a thin wrapper around the standard HTTP client where appropriate
- Icecast adapter selects and drives a decoder
- emit PCM chunks
- expose state transitions and errors

### Acceptance

- long-running Icecast ingest works
- native decoding is used for the initial supported formats
- disconnect/reconnect is observable and recovers automatically
- fallback path is explicit, not architectural default
- TX path remains stable

---

## 7. Add ingest stats to control API

Likely affected files:

- `internal/control/control.go`
- possibly UI if runtime page surfaces ingest info

### Acceptance

- `/runtime` shows ingest state
- operator can tell whether source is connecting/running/stalled/reconnecting

---

## 8. Introduce ingest config structure

Likely affected file:

- `internal/config/config.go`

### Strategy

- add new config subtree without breaking old flags immediately
- map legacy flag combinations into new config internally

### Acceptance

- existing flows still work
- new ingest configs can select Icecast cleanly

---

## Testing plan

## Unit tests

### `internal/ingest/convert.go`

Test:

- mono to stereo duplication
- stereo pass-through
- unsupported channel counts
- clipping/normalization behavior
- chunk boundary correctness

### stdin adapter

Test:

- reads PCM correctly
- emits expected sample counts
- EOF handling

### ingest runtime

Test:

- source start/stop lifecycle
- writes converted frames into sink
- prebuffer behavior
- stall detection
- source error propagation

### Icecast adapter

Use test HTTP server where possible.

Test:

- connect success
- reconnect after disconnect
- state transitions
- decoder failure handling
- backoff behavior

---

## Integration tests

### TX path with ingest runtime

Test:

- ingest runtime feeding `audio.StreamSource`
- engine consumes without regression
- runtime stats remain coherent

### `/audio/stream`

Test:

- POST still works
- control path now targets ingest layer

### Icecast smoke test

Even if partly gated or environment-specific, define a repeatable smoke path.

---

## Operational telemetry requirements

At minimum, operators should be able to answer these questions:

- what source is active?
- what family is it?
- is it connected?
- how much audio is buffered?
- when did we last receive audio?
- are we reconnecting?
- what was the last ingest error?
- are stalls/discontinuities happening?

If those are not visible, ingest debugging will be painful.

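
As a rough illustration of how the operator questions above map onto the proposed stats fields, the sketch below derives a stall flag and a one-line summary. `Stalled`, `Summary`, and `exampleStats` are hypothetical helpers, the struct is a trimmed copy of the proposed `SourceStats`, and treating a never-received source as "not stalled" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// SourceStats is the subset of the plan's proposed stats used by this sketch.
type SourceStats struct {
	State           string
	Connected       bool
	LastChunkAt     time.Time
	BufferedSeconds float64
	Reconnects      uint64
	Discontinuities uint64
	LastError       string
}

// Stalled reports whether no audio arrived within timeout.
// ASSUMPTION: a zero LastChunkAt (nothing received yet) is not a stall.
func Stalled(s SourceStats, now time.Time, timeout time.Duration) bool {
	if s.LastChunkAt.IsZero() {
		return false
	}
	return now.Sub(s.LastChunkAt) > timeout
}

// Summary renders one line answering the operator questions: which source,
// which family, connected, buffered audio, last audio, reconnects, last error.
func Summary(id, family string, s SourceStats, now time.Time, timeout time.Duration) string {
	lastAudio := "never"
	if !s.LastChunkAt.IsZero() {
		lastAudio = now.Sub(s.LastChunkAt).Round(time.Millisecond).String() + " ago"
	}
	return fmt.Sprintf(
		"source=%s family=%s state=%s connected=%t buffered=%.1fs lastAudio=%s reconnects=%d discontinuities=%d stalled=%t lastError=%q",
		id, family, s.State, s.Connected, s.BufferedSeconds, lastAudio,
		s.Reconnects, s.Discontinuities, Stalled(s, now, timeout), s.LastError,
	)
}

// exampleStats returns fixed sample values so the sketch is deterministic.
func exampleStats() SourceStats {
	return SourceStats{
		State:           "running",
		Connected:       true,
		LastChunkAt:     time.Date(2026, 4, 7, 12, 0, 0, 0, time.UTC),
		BufferedSeconds: 1.4,
		Reconnects:      1,
	}
}

func main() {
	s := exampleStats()
	timeout := 3 * time.Second
	fmt.Println(Summary("icecast-main", "streaming", s, s.LastChunkAt.Add(500*time.Millisecond), timeout))
	fmt.Println(Summary("icecast-main", "streaming", s, s.LastChunkAt.Add(5*time.Second), timeout))
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the helpers, keeps stall detection easy to unit-test.
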
---

## Risks and mitigations

## Risk 1: pushing too much complexity into Phase 1

Mitigation:

- keep one active source only
- preserve `audio.StreamSource`
- avoid failover until the single-source path is stable

## Risk 2: decode strategy pollutes architecture

Mitigation:

- isolate codec logic behind a decoder interface
- prefer native Go decoders for the initial supported formats
- if ffmpeg is retained, keep it in an explicit fallback decoder package
- do not let decode mechanism define runtime abstractions

## Risk 3: duplicated buffering causing latency confusion

Mitigation:

- document each buffering layer clearly
- expose ingest buffered seconds separately from TX ring stats
- keep prebuffer policy explicit

## Risk 4: unclear ownership of resampling

Mitigation:

- keep transport/family decode at native source rate
- keep final TX-facing adaptation centralized near current `StreamResampler`
- do not add ad hoc resamplers in every adapter unless protocol-specific needs require it

## Risk 5: channel/format sprawl too early

Mitigation:

- define a strict Phase 1 acceptance matrix
- only support the combinations we actually test

---

## Recommended Phase 1 acceptance matrix

### stdin PCM

- format: S16LE
- channels: 2
- sample rates: 44100, 48000

### raw HTTP PCM

- format: S16LE
- channels: 2
- sample rates: 44100, 48000

### Icecast

- one known-good stream path
- reconnect behavior verified
- native decoding works for at least MP3 in Phase 1
- ideally native decoding also works for Ogg/Vorbis in Phase 1
- AAC/ADTS can enter Phase 1 only if the chosen decoder and stream behavior are solid enough
- decoded output normalized into stereo frames

Optional but useful:

- mono handling for at least one ingest path

---

## Suggested implementation order

1. add ingest package skeleton
2. implement conversion helpers
3. implement stdin adapter
4. implement ingest runtime writing into `audio.StreamSource`
5. rewire `cmd/fmrtx/main.go` to use runtime for stdin
6. route `/audio/stream` into ingest runtime
7. expose ingest stats in `/runtime`
8. implement decoder layer with native codec support for initial target formats, in this order:
   - MP3
   - Ogg/Vorbis
   - AAC/ADTS if stable enough
9. implement Icecast adapter with reconnect + decoder selection
10. add ingest config subtree and compatibility mapping
11. polish tests, docs, and operator-facing runtime fields

This order gives a narrow vertical slice early, then extends it.

---

## Concrete code touch points

### New files

- `internal/ingest/types.go`
- `internal/ingest/source.go`
- `internal/ingest/runtime.go`
- `internal/ingest/convert.go`
- `internal/ingest/stats.go`
- `internal/ingest/factory.go`
- `internal/ingest/decoder/decoder.go`
- `internal/ingest/decoder/mp3/decoder.go`
- `internal/ingest/decoder/aac/decoder.go`
- `internal/ingest/decoder/oggvorbis/decoder.go`
- optional fallback: `internal/ingest/decoder/fallback/ffmpeg.go`
- `internal/ingest/adapters/stdinpcm/source.go`
- `internal/ingest/adapters/icecast/source.go`
- `internal/ingest/adapters/icecast/reconnect.go`

### Existing files likely to change

- `cmd/fmrtx/main.go`
- `internal/control/control.go`
- `internal/config/config.go`
- possibly `internal/app/engine.go` only for wiring or runtime exposure, not architectural overhaul

### Existing files that should stay mostly untouched

- `internal/offline/generator.go`
- most DSP files
- output/backend implementations

---

## Final design stance

The new ingest subsystem should be treated as a first-class runtime boundary, not as a pile of helper functions.

The repository already has the correct TX-side seam:

- external source
- stream buffer
- final resampler
- engine/DSP separation

So the implementation should respect that and formalize the missing upstream ingest layer.

The most important practical decisions in this plan are:

- **Icecast enters in Phase 1**
- **native decoding is a first-class target from the start**
- fallback decoding is allowed only as an explicit compatibility path, provided the architecture stays clean

That gives us a realistic ingest design early without destabilizing the FM core.