Audio Ingest Implementation Plan
Status: Phase-1 fixup snapshot (2026-04-07)
Owner: Jan
Scope: fm-rds-tx
Related: docs/audio-ingest-rework.md
Goal
Build a first-class audio ingest subsystem that supports multiple source families without pushing transport-specific logic into the FM TX engine or DSP path.
This plan starts with a pragmatic integration strategy:
- keep the existing TX/DSP pipeline stable
- introduce a new internal/ingest runtime in front of it
- preserve audio.StreamSource as the immediate TX-facing sink for now
- bring Icecast ingest into Phase 1, alongside stdin/raw HTTP ingest
- treat native decoding as a first-class goal from the start, not a late add-on
The key architectural principle is:
Source-family specifics live in source adapters. Shared buffering, health, lifecycle, conversion, and handoff to TX live in a common ingest runtime.
Actual Phase-1 status (2026-04-07)
Implemented:
- internal/ingest runtime in front of audio.StreamSource
- ingest source factory and config mapping for stdin, http-raw, icecast
- stdin and HTTP raw adapters feeding shared runtime
- runtime and source stats exposed via /runtime as ingest.*
- Icecast source adapter with reconnect/backoff and decoder preference modes (auto, native, ffmpeg)
- decoder registry and explicit ffmpeg fallback decoder implementation
Still open on purpose:
- native mp3, oggvorbis, aac decoder packages are placeholders (ErrUnsupported)
- real decode path for Icecast is currently ffmpeg fallback
- no AoIP/SRT ingest integration into shared runtime yet
- no multi-source orchestration/failover policy yet
Non-goals for the first implementation wave
The first wave should not attempt to solve everything at once.
Out of scope initially:
- full multi-source orchestration with seamless hot failover
- exhaustive native decoding support for every compressed format and edge case
- replacing the existing offline.Generator source contract
- redesigning the TX runtime state machine
- broad UI redesign
- a universal media graph framework
We want a clean, incremental path, not a big-bang rewrite.
Current state of the codebase
The repository already has most of the TX-side hooks needed for a proper ingest subsystem:
cmd/fmrtx/main.go
- creates audio.StreamSource
- wires it into the engine via engine.SetStreamSource(...)
- starts stdin and /audio/stream ingest paths directly
internal/app/engine.go
- accepts a stream source via SetStreamSource(...)
- wraps it in audio.NewStreamResampler(...)
- injects it upstream of DSP via generator.SetExternalSource(...)
internal/audio/stream.go
- provides a TX-facing SPSC ring buffer
- provides a simple StreamResampler
- tracks underruns, overflows, buffering, high watermark
internal/offline/generator.go
- already cleanly accepts an external audio source
aoiprxkit/
- already contains useful RTP/AES67/SAP/SRT receive-side primitives and stats
This means the right move is not to redesign the FM core, but to formalize the missing ingest layer in front of the existing TX path.
Target architecture
Layers
1. Source adapters
Each adapter owns family-specific behavior, for example:
- process control for ffmpeg-based adapters
- reconnect loops for Icecast
- RTP depacketization and jitter buffering for AoIP
- protocol-specific metadata and health signals
Examples:
- stdin PCM
- HTTP raw PCM
- Icecast stream
- RTP/AES67
- SRT
- future ffmpeg-backed generic URL/file ingest
2. Decoder layer
A dedicated decoder layer sits between transport/session adapters and the shared ingest runtime.
Responsibilities:
- decode compressed audio streams into normalized PCM chunks
- keep codec-specific logic out of Icecast and other source adapters
- allow multiple decoder implementations behind a common interface
- prefer native Go decoders where they are stable and good enough
- allow an ffmpeg-backed fallback only as an implementation detail, not as the architecture
Examples:
- MP3
- Ogg/Vorbis
- AAC/ADTS where practical
- later: Opus or other codecs as needed
Initial decoder priority should be:
- MP3
- Ogg/Vorbis
- AAC/ADTS
- Opus later if a concrete source requirement justifies it
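The decoder boundary and the preference modes (auto, native, ffmpeg) could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the Decoder interface, the Registry type, stubDecoder, and the block-based Decode signature are all hypothetical, and ErrUnsupported is redeclared locally because its real definition is not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupported mirrors the placeholder error named in the status section;
// its real definition in the repository is an assumption here.
var ErrUnsupported = errors.New("decoder: unsupported")

// Decoder is a hypothetical, reduced decoder interface: it turns one block of
// compressed bytes into interleaved PCM samples.
type Decoder interface {
	Name() string
	Decode(compressed []byte) ([]int32, error)
}

// stubDecoder stands in for a real codec; it returns err from every call.
type stubDecoder struct {
	name string
	err  error
}

func (s stubDecoder) Name() string { return s.name }

func (s stubDecoder) Decode(compressed []byte) ([]int32, error) {
	if s.err != nil {
		return nil, s.err
	}
	return make([]int32, len(compressed)), nil // fake output, not real decoding
}

// Registry holds native decoders by codec name plus one optional fallback.
type Registry struct {
	Native   map[string]Decoder
	Fallback Decoder
}

// Candidates lists the decoders to try, in order, for a codec and mode.
func (r Registry) Candidates(codec, mode string) ([]Decoder, error) {
	var out []Decoder
	native, hasNative := r.Native[codec]
	switch mode {
	case "native":
		if hasNative {
			out = append(out, native)
		}
	case "ffmpeg":
		if r.Fallback != nil {
			out = append(out, r.Fallback)
		}
	case "auto": // native first, explicit fallback second
		if hasNative {
			out = append(out, native)
		}
		if r.Fallback != nil {
			out = append(out, r.Fallback)
		}
	default:
		return nil, fmt.Errorf("unknown decoder mode %q", mode)
	}
	if len(out) == 0 {
		return nil, ErrUnsupported
	}
	return out, nil
}

// Decode tries each candidate and moves on only when one reports ErrUnsupported.
func Decode(cands []Decoder, compressed []byte) (string, []int32, error) {
	for _, d := range cands {
		pcm, err := d.Decode(compressed)
		if errors.Is(err, ErrUnsupported) {
			continue
		}
		return d.Name(), pcm, err
	}
	return "", nil, ErrUnsupported
}

func main() {
	reg := Registry{
		Native:   map[string]Decoder{"mp3": stubDecoder{name: "mp3-native", err: ErrUnsupported}},
		Fallback: stubDecoder{name: "ffmpeg"},
	}
	cands, err := reg.Candidates("mp3", "auto")
	if err != nil {
		panic(err)
	}
	used, pcm, err := Decode(cands, []byte{1, 2, 3})
	fmt.Println(used, len(pcm), err)
}
```

With a placeholder native decoder, auto mode falls through to the fallback, which matches the current Phase-1 situation. A streaming decoder would need more care, because a decoder that has already consumed input cannot simply hand the stream to the next candidate.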
3. Shared ingest runtime
A common ingest runtime sits between decoders/source adapters and TX.
Responsibilities:
- source lifecycle
- prebuffering policy
- normalized source state
- family-neutral telemetry
- format conversion into TX-facing audio frames
- writing into the existing audio.StreamSource
- later: failover/orchestration
4. Existing TX path
The TX side stays mostly unchanged:
- audio.StreamSource
- audio.StreamResampler
- Engine.SetStreamSource(...)
- offline.Generator.SetExternalSource(...)
- FM/DSP chain
The TX engine should not know whether input came from stdin, Icecast, SRT, RTP, or something else.
Why Icecast is in Phase 1
Icecast should be introduced early, not postponed.
Reasons:
- it exercises a real long-running network stream rather than one-shot raw pushes
- it forces lifecycle design immediately: connecting, connected, stalled, reconnecting, failed
- it forces buffering and liveness behavior to be designed properly
- it prevents the ingest layer from being accidentally overfit to only raw PCM push workflows
- it reflects an important real-world ingest path for FM rebroadcast/transcoding scenarios
- it forces the project to define a real decoder boundary early
Early Icecast support should aim for native decoding where practical.
Initial Icecast strategy should therefore be:
- separate transport/runtime concerns from decoding concerns
- define a decoder interface from the beginning
- prefer native Go decoders for common formats where mature libraries exist
- keep an ffmpeg-backed decoder only as fallback or temporary compatibility path
- keep the ingest runtime and source adapter interfaces clean enough that decoder implementation can evolve without redesigning the whole ingest subsystem
Phase plan
Phase 1: create the ingest runtime and ship first adapters
Deliverables
- new internal/ingest package
- a decoder abstraction as part of the ingest subsystem
- a shared ingest runtime in front of audio.StreamSource
- adapters for:
  - stdin PCM
  - raw HTTP PCM
  - Icecast stream
- decoder boundary with preference/fallback policy in place
- explicit Phase-1 codec prioritization: MP3 first, Ogg/Vorbis second, AAC/ADTS third
- runtime and source stats exposed in control API
- command/config plumbing for selecting an ingest source
Phase 1 boundary
At the end of Phase 1:
- TX still consumes through audio.StreamSource
- DSP path is unchanged
- source families are no longer wired directly into cmd/fmrtx/main.go
- Icecast works with reconnect + observable runtime state
- decoder selection/fallback behavior is explicit and test-covered
- native decoder implementations remain a follow-up item
Phase 2: integrate structured network audio families
Deliverables
- adapters backed by aoiprxkit
- RTP/AES67 ingest
- SRT ingest
- shared source stats mapped into ingest runtime stats
Notes
- family-specific jitter/packet handling stays inside adapter/family code
- TX side continues to see normalized stereo frames only
Phase 3: source selection, fallback, and richer policy
Deliverables
- primary/fallback source model
- failure policy
- source switching policy
- improved operator telemetry
- optional source prioritization and warm standby
This phase should only start once single-source ingest is stable.
New package structure
Proposed initial layout:
internal/
  ingest/
    types.go
    source.go
    runtime.go
    convert.go
    stats.go
    factory.go
    decoder/
      decoder.go
      mp3/
        decoder.go
      aac/
        decoder.go
      oggvorbis/
        decoder.go
      fallback/
        ffmpeg.go
    adapters/
      stdinpcm/
        source.go
      httpraw/
        source.go
      icecast/
        source.go
        reconnect.go
Later additions:
internal/ingest/adapters/
  aoip/
  srt/
  ffmpeg/
Notes:
- codec-specific logic should live under internal/ingest/decoder/
- ffmpeg, if retained at all, should live under an explicit fallback package
- keep source-family code out of internal/app and internal/offline
Core interfaces
These are design targets, not fixed signatures.
Normalized ingest-side frame model
The ingest layer needs a family-neutral PCM representation before converting into the TX-facing audio.Frame stream.
Proposed shape:
type PCMChunk struct {
    Samples       []int32
    Channels      int
    SampleRateHz  int
    Sequence      uint64
    Timestamp     time.Time
    SourceID      string
    Discontinuity bool
}
Rationale:
- expressive enough for RTP/AES67/SRT/decoded Icecast output
- allows transport metadata to be preserved long enough for runtime logic and stats
- avoids forcing all adapters into the same byte-stream assumption
Future extension points if needed:
- Codec string
- ClockDomain string
- BitDepth int
- PTS time.Duration
Source descriptor
type SourceDescriptor struct {
    ID           string
    Kind         string
    Family       string
    Transport    string
    Codec        string
    Channels     int
    SampleRateHz int
    Detail       string
}
Examples:
- Kind=stdin-pcm, Family=raw, Transport=stdin
- Kind=http-raw, Family=raw, Transport=http
- Kind=icecast, Family=streaming, Transport=http
- later Kind=aes67, Family=aoip, Transport=rtp
Source interface
Two patterns are reasonable:
- channel-based delivery
- sink/callback-based delivery
For the first implementation, channel-based is usually easier to reason about.
type Source interface {
    Descriptor() SourceDescriptor
    Start(ctx context.Context) error
    Stop() error
    Chunks() <-chan PCMChunk
    Errors() <-chan error
    Stats() SourceStats
}
An alternative callback model is acceptable if it reduces allocations or simplifies integration.
Important constraint:
- the source adapter owns family-specific I/O
- the ingest runtime owns shared buffering/handoff policy
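To make the channel-based pattern concrete, here is a minimal sketch of a runtime-side loop that drains a source. It assumes only the Chunks() and Errors() halves of the proposed interface; Pump, ChunkSource, fakeSource, newFakeSource, and errDemo are hypothetical names used for illustration. The done channel would typically be ctx.Done().

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PCMChunk is reduced to the fields this sketch needs.
type PCMChunk struct {
	Samples      []int32
	Channels     int
	SampleRateHz int
}

// ChunkSource is the channel-facing subset of the proposed Source interface.
type ChunkSource interface {
	Chunks() <-chan PCMChunk
	Errors() <-chan error
}

// errDemo is a made-up error used by the demo source.
var errDemo = errors.New("demo: transient read error")

// fakeSource is a hypothetical in-memory source used only for the demo.
type fakeSource struct {
	chunks chan PCMChunk
	errs   chan error
}

func (f *fakeSource) Chunks() <-chan PCMChunk { return f.chunks }
func (f *fakeSource) Errors() <-chan error    { return f.errs }

// newFakeSource preloads n stereo chunks and at most one error, then closes
// the chunk channel to signal end of stream.
func newFakeSource(n int, err error) *fakeSource {
	f := &fakeSource{chunks: make(chan PCMChunk, n), errs: make(chan error, 1)}
	for i := 0; i < n; i++ {
		f.chunks <- PCMChunk{Samples: []int32{0, 0}, Channels: 2, SampleRateHz: 44100}
	}
	close(f.chunks)
	if err != nil {
		f.errs <- err
	}
	return f
}

// Pump drains src until its chunk channel closes or done fires. It returns
// the number of chunks handled and the last source error seen (nil if none).
func Pump(done <-chan struct{}, src ChunkSource, handle func(PCMChunk)) (uint64, error) {
	chunks, errs := src.Chunks(), src.Errors()
	var n uint64
	var lastErr error
	for chunks != nil {
		select {
		case <-done:
			return n, lastErr
		case c, ok := <-chunks:
			if !ok {
				chunks = nil // source finished
				continue
			}
			handle(c)
			n++
		case err, ok := <-errs:
			if !ok {
				errs = nil // a nil channel never becomes ready again
				continue
			}
			lastErr = err
		}
	}
	// Pick up errors that were queued before the chunk channel closed.
	for {
		select {
		case err, ok := <-errs:
			if !ok {
				return n, lastErr
			}
			lastErr = err
		default:
			return n, lastErr
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	samples := 0
	n, err := Pump(ctx.Done(), newFakeSource(3, errDemo), func(c PCMChunk) {
		samples += len(c.Samples)
	})
	fmt.Println(n, samples, err)
}
```

Setting a closed channel variable to nil keeps the select from spinning on a closed channel, which is the usual pitfall with paired data and error channels.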
Shared source stats
type SourceStats struct {
    State           string
    Connected       bool
    LastChunkAt     time.Time
    ChunksIn        uint64
    SamplesIn       uint64
    BufferedSeconds float64
    Overflows       uint64
    Underruns       uint64
    Reconnects      uint64
    Discontinuities uint64
    TransportLoss   uint64
    Reorders        uint64
    JitterDepth     int
    LastError       string
}
Not every source will populate every field.
That is okay.
The common runtime should expose a stable superset and leave unsupported fields at zero/default.
Shared ingest runtime
Responsibilities
The runtime is the main missing abstraction in the current codebase.
Responsibilities:
- own exactly one active source in Phase 1
- start/stop the source cleanly
- receive normalized PCMChunks
- convert them into TX-facing stereo frames
- write them into audio.StreamSource
- enforce prebuffering policy where relevant
- expose common ingest state and health
- detect stalls/reconnects/discontinuities
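The prebuffering and stall checks reduce to small pieces of arithmetic. The sketch below assumes the two timing values are expressed in milliseconds, as prebufferMs and stallTimeoutMs are in the proposed config; the Policy type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Policy carries the two timing knobs from the proposed ingest config.
type Policy struct {
	PrebufferMs    int
	StallTimeoutMs int
}

// PrebufferFrames returns how many frames must be buffered at the given
// sample rate before audio is released downstream.
func (p Policy) PrebufferFrames(sampleRateHz int) int {
	return p.PrebufferMs * sampleRateHz / 1000
}

// Prebuffering reports whether the runtime should still hold back audio.
func (p Policy) Prebuffering(bufferedFrames, sampleRateHz int) bool {
	return bufferedFrames < p.PrebufferFrames(sampleRateHz)
}

// Stalled reports whether the time since the last chunk exceeds the stall
// timeout. The caller would typically pass time.Since(lastChunkAt).
func (p Policy) Stalled(sinceLastChunk time.Duration) bool {
	return sinceLastChunk > time.Duration(p.StallTimeoutMs)*time.Millisecond
}

func main() {
	p := Policy{PrebufferMs: 1500, StallTimeoutMs: 3000}
	fmt.Println(p.PrebufferFrames(44100))     // frames needed at 44.1 kHz
	fmt.Println(p.Prebuffering(44100, 44100)) // one second buffered so far
	fmt.Println(p.Stalled(4 * time.Second))
}
```

At 44100 Hz a 1500 ms prebuffer corresponds to 66150 frames, so one second of buffered audio is still below the threshold.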
Non-responsibilities
The runtime should not:
- parse RTP
- manage ffmpeg stderr parsing for generic protocol details
- implement protocol-specific jitter buffering directly
- manipulate FM/DSP runtime states directly
It reports ingest health; TX remains responsible for TX health.
TX-facing sink strategy
For now, keep this path:
- ingest runtime writes into audio.StreamSource
- Engine.SetStreamSource(...) remains unchanged
- audio.StreamResampler remains the final rate adaptation step into composite/DSP rate
This minimizes risk.
It also keeps future refactors optional instead of mandatory.
Conversion policy
A shared conversion layer is required between PCMChunk and audio.StreamSource.
Initial policy
- accept mono or stereo only in Phase 1 if that keeps implementation smaller
- mono input is duplicated to stereo
- stereo input is mapped directly L/R
- channels > 2 are rejected initially unless a simple, explicit downmix policy is added
- normalize to the existing audio.Sample range [-1, +1]
- clipping should be explicit and measured, not silent and invisible
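The policy above could look like the following sketch. Frame is a stand-in for the TX-facing frame type, whose real layout is not shown in this plan, and ToStereo, ErrChannels, and ErrLayout are hypothetical names. The integer full-scale value is passed in explicitly because the plan does not fix how wide the int32 samples are; it is assumed to be positive.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is a stand-in for the TX-facing stereo frame (assumed layout).
type Frame struct{ L, R float32 }

var (
	ErrChannels = errors.New("convert: only mono or stereo input is supported")
	ErrLayout   = errors.New("convert: sample count is not a multiple of the channel count")
)

// ToStereo converts interleaved integer samples into stereo frames in
// [-1, +1]. fullScale is the integer magnitude that maps to 1.0, for example
// 32768 for 16-bit audio. It also returns how many sample values were clipped.
func ToStereo(samples []int32, channels int, fullScale float64) ([]Frame, int, error) {
	if channels != 1 && channels != 2 {
		return nil, 0, ErrChannels
	}
	if len(samples)%channels != 0 {
		return nil, 0, ErrLayout
	}
	clipped := 0
	norm := func(s int32) float32 {
		v := float64(s) / fullScale
		if v > 1 {
			v, clipped = 1, clipped+1
		} else if v < -1 {
			v, clipped = -1, clipped+1
		}
		return float32(v)
	}
	frames := make([]Frame, 0, len(samples)/channels)
	for i := 0; i < len(samples); i += channels {
		l := norm(samples[i])
		r := l // mono input is duplicated to both channels
		if channels == 2 {
			r = norm(samples[i+1])
		}
		frames = append(frames, Frame{L: l, R: r})
	}
	return frames, clipped, nil
}

func main() {
	frames, clipped, err := ToStereo([]int32{16384, -32768, 40000}, 1, 32768)
	fmt.Println(frames, clipped, err)
}
```

Returning the clip count lets the runtime surface clipping as a counter instead of hiding it inside the conversion.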
Why a dedicated conversion layer matters
Without it, each source adapter will start doing its own ad hoc format mapping.
That is exactly what the new ingest subsystem is supposed to prevent.
Icecast adapter design
Scope for the first Icecast implementation
The first version needs to support a robust operator-visible ingest path and establish the decoder boundary correctly.
It does not need to support every codec/container combination from day one, but it should not assume ffmpeg as the architectural default.
Recommended structure
Transport/lifecycle layer
Responsibilities:
- connect to Icecast URL over HTTP
- validate response
- track connection state
- reconnect with backoff
- observe stalls / EOF / disconnects
- surface metadata and errors
Implementation guidance:
- prefer a Go library or a thin wrapper around the standard Go HTTP client for Icecast transport/session handling
- do not hand-roll unnecessary low-level protocol machinery when existing libraries or the standard client already cover it well
- keep transport/session concerns isolated from decoder logic and ingest runtime logic
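For the reconnect-with-backoff responsibility, one possible delay schedule is sketched below. The plan only names an initial and a maximum backoff; doubling the delay on every failed attempt is an assumption made here, and Backoff, NewBackoff, Next, and Reset are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff yields reconnect delays that double after each failure, up to Max.
type Backoff struct {
	Initial time.Duration
	Max     time.Duration
	attempt int
}

// NewBackoff builds a Backoff from millisecond values such as
// initialBackoffMs and maxBackoffMs.
func NewBackoff(initialMs, maxMs int) *Backoff {
	return &Backoff{
		Initial: time.Duration(initialMs) * time.Millisecond,
		Max:     time.Duration(maxMs) * time.Millisecond,
	}
}

// Next returns the delay to wait before the next reconnect attempt.
func (b *Backoff) Next() time.Duration {
	d := b.Initial
	// Stop doubling once the cap is reached so the value cannot overflow.
	for i := 0; i < b.attempt && d < b.Max; i++ {
		d *= 2
	}
	if d > b.Max {
		d = b.Max
	}
	b.attempt++
	return d
}

// Reset is meant to be called once a connection is healthy again.
func (b *Backoff) Reset() { b.attempt = 0 }

func main() {
	b := NewBackoff(1000, 15000)
	for i := 0; i < 6; i++ {
		fmt.Println(b.Next())
	}
	b.Reset()
	fmt.Println(b.Next())
}
```

A production version would probably add jitter so that several receivers do not reconnect in lockstep; that is left out to keep the sketch short.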
Decode layer
Preferred initial option:
- use native Go decoders for the first targeted formats where mature libraries exist
- decode compressed stream data into PCM chunks behind a decoder interface
- prioritize MP3 first and Ogg/Vorbis second because they are likely to give the best early return for Icecast support
- evaluate AAC/ADTS next once the decoder boundary and streaming behavior are stable
Fallback option:
- keep an ffmpeg-backed decoder implementation available only as fallback/compatibility path
This keeps the first release practical while preserving architecture.
The key is to avoid letting “ffmpeg exists” collapse the whole ingest abstraction.
Meaning:
- Icecast adapter uses a transport/session client layer plus a decoder interface
- transport/session handling should preferably come from a Go library or a thin wrapper around the standard HTTP client
- decoder choice can be native Go or fallback ffmpeg
- Icecast remains an adapter in internal/ingest/adapters/icecast
- runtime still sees a normal source
Expected Icecast states
At minimum:
- idle
- connecting
- buffering
- running
- stalled
- reconnecting
- failed
- stopped
These should be visible via runtime stats and eventually UI.
stdin PCM adapter
Purpose:
- preserve current CLI-based piping workflows
- move direct ingest logic out of
cmd/fmrtx/main.go
Responsibilities:
- read S16LE stereo PCM from stdin
- emit PCMChunks or equivalent normalized blocks
- expose simple source stats
This adapter should be intentionally boring.
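The core of such an adapter is the byte-to-sample step. The sketch below decodes S16LE stereo bytes into interleaved samples and keeps a trailing partial frame for the next read; DecodeS16LE and bytesPerFrame are hypothetical names, and a bytes.Reader stands in for os.Stdin.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const bytesPerFrame = 4 // stereo: 2 channels x 2 bytes per sample

// DecodeS16LE converts the whole stereo frames at the start of buf into
// interleaved samples and returns the undecoded tail (a partial frame, if any).
func DecodeS16LE(buf []byte) (samples []int32, rest []byte) {
	whole := len(buf) - len(buf)%bytesPerFrame
	samples = make([]int32, whole/2)
	for i := range samples {
		// Reinterpret the unsigned 16-bit value as signed before widening.
		samples[i] = int32(int16(binary.LittleEndian.Uint16(buf[2*i:])))
	}
	return samples, buf[whole:]
}

func main() {
	// Stand-in for os.Stdin: two frames followed by a truncated third frame.
	in := bytes.NewReader([]byte{0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80, 0xFF, 0x7F, 0x34, 0x12})
	buf := make([]byte, 1024*bytesPerFrame)
	n, err := io.ReadFull(in, buf)
	if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
		panic(err)
	}
	samples, rest := DecodeS16LE(buf[:n])
	fmt.Println(samples, len(rest))
}
```

Carrying the leftover bytes into the next read keeps channel alignment intact when a read ends in the middle of a frame.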
raw HTTP PCM adapter
Purpose:
- preserve current /audio/stream functionality
- move it behind the shared ingest runtime instead of writing directly to audio.StreamSource
There are two reasonable implementation paths:
Option A: keep /audio/stream as a push endpoint owned by control server
- control server accepts request body
- forwards PCM blocks into an ingest-owned writer/sink
- ingest runtime still owns buffering/health
Option B: implement an explicit push source abstraction
- source adapter exposes a writable sink
- control plane writes into that sink
For Phase 1, Option A is probably the fastest path.
But the important part is:
- control server should no longer push directly into TX buffer
- it should push into the ingest subsystem
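Option A can be sketched as a handler that copies the request body into an ingest-owned writer. This is illustrative only: StreamHandler, countingSink, and push are hypothetical names, the 204 response code is an assumption, and the sink here just counts bytes instead of feeding a real runtime.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// countingSink is a hypothetical ingest-owned writer; it only counts bytes.
type countingSink struct {
	mu sync.Mutex
	n  int64
}

func (s *countingSink) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.n += int64(len(p))
	return len(p), nil
}

func (s *countingSink) Bytes() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.n
}

// StreamHandler forwards POSTed PCM into sink instead of the TX buffer.
func StreamHandler(sink io.Writer) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		if _, err := io.Copy(sink, r.Body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent) // assumed success code
	})
}

// push sends one in-memory request to h and returns the HTTP status code.
func push(h http.Handler, method string, body []byte) int {
	req := httptest.NewRequest(method, "/audio/stream", bytes.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	sink := &countingSink{}
	h := StreamHandler(sink)
	fmt.Println(push(h, http.MethodPost, make([]byte, 4096)), sink.Bytes())
	fmt.Println(push(h, http.MethodGet, nil), sink.Bytes())
}
```

Because the handler depends only on io.Writer, the control server never needs to know about the TX ring buffer, which is the separation this section asks for.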
Runtime stats model
Add a top-level ingest section to /runtime.
Proposed shape:
{
  "ingest": {
    "active": {
      "id": "icecast-main",
      "kind": "icecast",
      "state": "running",
      "sampleRateHz": 44100,
      "channels": 2,
      "bufferedSeconds": 1.4,
      "reconnects": 1,
      "lastError": ""
    },
    "runtime": {
      "state": "running",
      "prebuffering": false,
      "lastChunkAt": "...",
      "droppedFrames": 0,
      "convertErrors": 0,
      "writeBlocked": false
    }
  }
}
This should sit alongside:
- driver stats
- engine stats
- audio stream stats
- control audit stats
Initially, audioStream may remain exposed for debugging, but ingest should become the operator-facing abstraction.
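The proposed JSON shape maps onto Go structs fairly directly. The type names below (RuntimePayload, IngestStats, ActiveSource, RuntimeStats) are hypothetical, and representing lastChunkAt as a time.Time is an assumption, since the example above leaves its value elided.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ActiveSource mirrors the "active" object of the proposed shape.
type ActiveSource struct {
	ID              string  `json:"id"`
	Kind            string  `json:"kind"`
	State           string  `json:"state"`
	SampleRateHz    int     `json:"sampleRateHz"`
	Channels        int     `json:"channels"`
	BufferedSeconds float64 `json:"bufferedSeconds"`
	Reconnects      uint64  `json:"reconnects"`
	LastError       string  `json:"lastError"`
}

// RuntimeStats mirrors the "runtime" object of the proposed shape.
type RuntimeStats struct {
	State         string    `json:"state"`
	Prebuffering  bool      `json:"prebuffering"`
	LastChunkAt   time.Time `json:"lastChunkAt"` // assumed to be a timestamp
	DroppedFrames uint64    `json:"droppedFrames"`
	ConvertErrors uint64    `json:"convertErrors"`
	WriteBlocked  bool      `json:"writeBlocked"`
}

// IngestStats is the value of the top-level "ingest" key.
type IngestStats struct {
	Active  ActiveSource `json:"active"`
	Runtime RuntimeStats `json:"runtime"`
}

// RuntimePayload holds only the ingest part of the /runtime response.
type RuntimePayload struct {
	Ingest IngestStats `json:"ingest"`
}

// Encode renders the payload as indented JSON.
func Encode(p RuntimePayload) ([]byte, error) {
	return json.MarshalIndent(p, "", "  ")
}

// Decode parses JSON produced by Encode.
func Decode(data []byte) (RuntimePayload, error) {
	var p RuntimePayload
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	p := RuntimePayload{Ingest: IngestStats{
		Active: ActiveSource{
			ID: "icecast-main", Kind: "icecast", State: "running",
			SampleRateHz: 44100, Channels: 2, BufferedSeconds: 1.4, Reconnects: 1,
		},
		Runtime: RuntimeStats{
			State:       "running",
			LastChunkAt: time.Date(2026, 4, 7, 12, 0, 0, 0, time.UTC),
		},
	}}
	out, err := Encode(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping lastError without omitempty means the field is always present, as in the example, so consumers can rely on a stable set of keys.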
Config shape evolution
Do not overload existing audio.* forever.
The current audio config primarily models file/tone/test input assumptions.
Introduce a new config subtree for ingest.
Proposed shape
{
  "ingest": {
    "kind": "icecast",
    "prebufferMs": 1500,
    "stallTimeoutMs": 3000,
    "reconnect": {
      "enabled": true,
      "initialBackoffMs": 1000,
      "maxBackoffMs": 15000
    },
    "stdin": {
      "sampleRateHz": 44100,
      "channels": 2,
      "format": "s16le"
    },
    "httpRaw": {
      "sampleRateHz": 44100,
      "channels": 2,
      "format": "s16le"
    },
    "icecast": {
      "url": "http://...",
      "decoder": "ffmpeg"
    }
  }
}
Notes:
- keep current flags working initially for backward compatibility
- map them internally into the new ingest config
- do not force config migration immediately
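A minimal sketch of loading and validating the proposed subtree is shown below. The struct and function names are hypothetical. The validation rules (a URL is required for icecast, and an empty decoder defaults to auto) are assumptions, as is the example URL; the kind values and decoder modes come from the Phase-1 status notes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type ReconnectConfig struct {
	Enabled          bool `json:"enabled"`
	InitialBackoffMs int  `json:"initialBackoffMs"`
	MaxBackoffMs     int  `json:"maxBackoffMs"`
}

type PCMConfig struct {
	SampleRateHz int    `json:"sampleRateHz"`
	Channels     int    `json:"channels"`
	Format       string `json:"format"`
}

type IcecastConfig struct {
	URL     string `json:"url"`
	Decoder string `json:"decoder"`
}

type IngestConfig struct {
	Kind           string          `json:"kind"`
	PrebufferMs    int             `json:"prebufferMs"`
	StallTimeoutMs int             `json:"stallTimeoutMs"`
	Reconnect      ReconnectConfig `json:"reconnect"`
	Stdin          PCMConfig       `json:"stdin"`
	HTTPRaw        PCMConfig       `json:"httpRaw"`
	Icecast        IcecastConfig   `json:"icecast"`
}

// configFile holds only the part of the config file this sketch cares about.
type configFile struct {
	Ingest IngestConfig `json:"ingest"`
}

// ParseIngest decodes the ingest subtree and applies basic validation.
func ParseIngest(data []byte) (IngestConfig, error) {
	var f configFile
	if err := json.Unmarshal(data, &f); err != nil {
		return IngestConfig{}, err
	}
	c := f.Ingest
	switch c.Kind {
	case "stdin", "http-raw":
		// No kind-specific checks in this sketch.
	case "icecast":
		if c.Icecast.URL == "" {
			return c, errors.New("ingest.icecast.url is required") // assumed rule
		}
		switch c.Icecast.Decoder {
		case "":
			c.Icecast.Decoder = "auto" // assumed default
		case "auto", "native", "ffmpeg":
		default:
			return c, fmt.Errorf("unknown ingest.icecast.decoder %q", c.Icecast.Decoder)
		}
	default:
		return c, fmt.Errorf("unknown ingest.kind %q", c.Kind)
	}
	return c, nil
}

func main() {
	// example.invalid is a placeholder host, not a real stream.
	raw := `{"ingest":{"kind":"icecast","prebufferMs":1500,
		"icecast":{"url":"http://example.invalid/stream","decoder":"ffmpeg"}}}`
	cfg, err := ParseIngest([]byte(raw))
	fmt.Println(cfg.Kind, cfg.PrebufferMs, cfg.Icecast.Decoder, err)
}
```

Legacy flags could be mapped by filling an IngestConfig value directly and skipping the JSON step, which keeps a single validated structure as the input to the source factory.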
CLI evolution
Current flags:
- --audio-stdin
- --audio-rate
- --audio-http
These can stay temporarily, but should become compatibility shims.
Possible future direction:
- --ingest stdin
- --ingest http-raw
- --ingest icecast
- --icecast-url ...
The exact CLI can wait, but internal structure should already assume a source factory.
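A source factory keyed by kind could be as small as the sketch below. Everything here is hypothetical: the reduced Source interface, the Constructor signature, and the stub sources are placeholders for the real adapters, and real constructors would take configuration.

```go
package main

import (
	"fmt"
	"sort"
)

// Source is reduced to a single method for this sketch.
type Source interface {
	Kind() string
}

// namedSource is a stub standing in for a real adapter.
type namedSource string

func (n namedSource) Kind() string { return string(n) }

// Constructor builds a source; real constructors would take configuration.
type Constructor func() (Source, error)

// Factory maps an ingest kind to the constructor of its adapter.
type Factory struct {
	ctors map[string]Constructor
}

func NewFactory() *Factory {
	return &Factory{ctors: map[string]Constructor{}}
}

// Register associates a kind with a constructor, replacing any earlier one.
func (f *Factory) Register(kind string, c Constructor) {
	f.ctors[kind] = c
}

// Kinds returns the registered kinds in sorted order.
func (f *Factory) Kinds() []string {
	kinds := make([]string, 0, len(f.ctors))
	for k := range f.ctors {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

// New builds the source registered for kind.
func (f *Factory) New(kind string) (Source, error) {
	c, ok := f.ctors[kind]
	if !ok {
		return nil, fmt.Errorf("ingest: unknown source kind %q (known: %v)", kind, f.Kinds())
	}
	return c()
}

// defaultFactory registers stubs for the three Phase-1 kinds.
func defaultFactory() *Factory {
	f := NewFactory()
	for _, kind := range []string{"stdin", "http-raw", "icecast"} {
		k := kind // capture a per-iteration copy for the closure
		f.Register(k, func() (Source, error) { return namedSource(k), nil })
	}
	return f
}

func main() {
	f := defaultFactory()
	src, err := f.New("icecast")
	fmt.Println(src.Kind(), err)
	_, err = f.New("srt")
	fmt.Println(err)
}
```

With this shape, both a future --ingest flag and the legacy flags only need to produce a kind string, and later families register themselves without touching main.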
File-by-file implementation plan
1. Add new ingest package skeleton
Create:
- internal/ingest/types.go
- internal/ingest/source.go
- internal/ingest/runtime.go
- internal/ingest/convert.go
- internal/ingest/stats.go
- internal/ingest/factory.go
Acceptance
- package compiles
- no behavior change yet
2. Implement stdin adapter
Create:
- internal/ingest/adapters/stdinpcm/source.go
Responsibilities:
- read stdin PCM
- emit normalized chunks
- report basic stats
Acceptance
- reproduces current --audio-stdin behavior through ingest runtime
- TX still works unchanged downstream
3. Implement shared ingest runtime with audio.StreamSource sink
Runtime should:
- own source start/stop
- convert PCM chunks to audio.Frames
- write into audio.StreamSource
- track runtime state and counters
Acceptance
- stdin path works end-to-end
- engine remains unchanged except wiring
- /runtime can expose ingest stats
4. Rewire cmd/fmrtx/main.go
Replace direct source-specific logic with:
- source selection
- ingest runtime creation
- runtime start/stop
- existing engine wiring
Important
Remove direct writes like:
- stdin goroutine writing directly into audio.StreamSource
- HTTP handler writing directly into audio.StreamSource
They should now pass through ingest runtime abstractions.
Acceptance
- codepath is cleaner
- source-family logic no longer lives in main
5. Rework raw HTTP ingest to target ingest runtime
Modify control layer so /audio/stream targets ingest subsystem rather than TX ring directly.
Likely affected file:
- internal/control/control.go
Acceptance
- /audio/stream still works
- stats reflect ingest runtime, not just raw ring buffer
6. Implement decoder layer and Icecast adapter
Create:
- internal/ingest/decoder/decoder.go
- internal/ingest/decoder/mp3/decoder.go
- internal/ingest/decoder/aac/decoder.go
- internal/ingest/decoder/oggvorbis/decoder.go
- optional fallback: internal/ingest/decoder/fallback/ffmpeg.go
- internal/ingest/adapters/icecast/source.go
- internal/ingest/adapters/icecast/reconnect.go
Responsibilities
- decoder interface turns compressed audio into PCM chunks
- native decoder implementations cover the initial target formats where stable libraries exist
- Icecast adapter handles HTTP connect/reconnect/lifecycle
- Icecast transport/session handling should use a Go library or a thin wrapper around the standard HTTP client where appropriate
- Icecast adapter selects and drives a decoder
- emit PCM chunks
- expose state transitions and errors
Acceptance
- long-running Icecast ingest works
- native decoding is used for the initial supported formats
- disconnect/reconnect is observable and recovers automatically
- fallback path is explicit, not architectural default
- TX path remains stable
7. Add ingest stats to control API
Likely affected files:
- internal/control/control.go
- possibly UI if runtime page surfaces ingest info
Acceptance
- /runtime shows ingest state
- operator can tell whether source is connecting/running/stalled/reconnecting
8. Introduce ingest config structure
Likely affected file:
- internal/config/config.go
Strategy
- add new config subtree without breaking old flags immediately
- map legacy flag combinations into new config internally
Acceptance
- existing flows still work
- new ingest configs can select Icecast cleanly
Testing plan
Unit tests
internal/ingest/convert.go
Test:
- mono to stereo duplication
- stereo pass-through
- unsupported channel counts
- clipping/normalization behavior
- chunk boundary correctness
stdin adapter
Test:
- reads PCM correctly
- emits expected sample counts
- EOF handling
ingest runtime
Test:
- source start/stop lifecycle
- writes converted frames into sink
- prebuffer behavior
- stall detection
- source error propagation
Icecast adapter
Use test HTTP server where possible.
Test:
- connect success
- reconnect after disconnect
- state transitions
- decoder failure handling
- backoff behavior
Integration tests
TX path with ingest runtime
Test:
- ingest runtime feeding audio.StreamSource
- engine consumes without regression
- runtime stats remain coherent
/audio/stream
Test:
- POST still works
- control path now targets ingest layer
Icecast smoke test
Even if partly gated or environment-specific, define a repeatable smoke path.
Operational telemetry requirements
At minimum, operators should be able to answer these questions:
- what source is active?
- what family is it?
- is it connected?
- how much audio is buffered?
- when did we last receive audio?
- are we reconnecting?
- what was the last ingest error?
- are stalls/discontinuities happening?
If those are not visible, ingest debugging will be painful.
Risks and mitigations
Risk 1: pushing too much complexity into Phase 1
Mitigation:
- keep one active source only
- preserve audio.StreamSource
- avoid failover until the single-source path is stable
Risk 2: decode strategy pollutes architecture
Mitigation:
- isolate codec logic behind a decoder interface
- prefer native Go decoders for the initial supported formats
- if ffmpeg is retained, keep it in an explicit fallback decoder package
- do not let decode mechanism define runtime abstractions
Risk 3: duplicated buffering causing latency confusion
Mitigation:
- document each buffering layer clearly
- expose ingest buffered seconds separately from TX ring stats
- keep prebuffer policy explicit
Risk 4: unclear ownership of resampling
Mitigation:
- keep transport/family decode at native source rate
- keep final TX-facing adaptation centralized near current StreamResampler
- do not add ad hoc resamplers in every adapter unless protocol-specific needs require it
Risk 5: channel/format sprawl too early
Mitigation:
- define a strict Phase 1 acceptance matrix
- only support the combinations we actually test
Recommended Phase 1 acceptance matrix
stdin PCM
- format: S16LE
- channels: 2
- sample rates: 44100, 48000
raw HTTP PCM
- format: S16LE
- channels: 2
- sample rates: 44100, 48000
Icecast
- one known-good stream path
- reconnect behavior verified
- native decoding works for at least MP3 in Phase 1
- ideally native decoding also works for Ogg/Vorbis in Phase 1
- AAC/ADTS can enter Phase 1 only if the chosen decoder and stream behavior are solid enough
- decoded output normalized into stereo frames
Optional but useful:
- mono handling for at least one ingest path
Suggested implementation order
- add ingest package skeleton
- implement conversion helpers
- implement stdin adapter
- implement ingest runtime writing into audio.StreamSource
- rewire cmd/fmrtx/main.go to use runtime for stdin
- route /audio/stream into ingest runtime
- expose ingest stats in /runtime
- implement decoder layer with native codec support for initial target formats, in this order:
  - MP3
  - Ogg/Vorbis
  - AAC/ADTS if stable enough
- implement Icecast adapter with reconnect + decoder selection
- add ingest config subtree and compatibility mapping
- polish tests, docs, and operator-facing runtime fields
This order gives a narrow vertical slice early, then extends it.
Concrete code touch points
New files
- internal/ingest/types.go
- internal/ingest/source.go
- internal/ingest/runtime.go
- internal/ingest/convert.go
- internal/ingest/stats.go
- internal/ingest/factory.go
- internal/ingest/decoder/decoder.go
- internal/ingest/decoder/mp3/decoder.go
- internal/ingest/decoder/aac/decoder.go
- internal/ingest/decoder/oggvorbis/decoder.go
- optional fallback: internal/ingest/decoder/fallback/ffmpeg.go
- internal/ingest/adapters/stdinpcm/source.go
- internal/ingest/adapters/icecast/source.go
- internal/ingest/adapters/icecast/reconnect.go
Existing files likely to change
- cmd/fmrtx/main.go
- internal/control/control.go
- internal/config/config.go
- possibly internal/app/engine.go only for wiring or runtime exposure, not architectural overhaul
Existing files that should stay mostly untouched
- internal/offline/generator.go
- most DSP files
- output/backend implementations
Final design stance
The new ingest subsystem should be treated as a first-class runtime boundary, not as a pile of helper functions.
The repository already has the correct TX-side seam:
- external source
- stream buffer
- final resampler
- engine/DSP separation
So the implementation should respect that and formalize the missing upstream ingest layer.
The most important practical decisions in this plan are:
- Icecast enters in Phase 1
- native decoding is a first-class target from the start
- fallback decoding is allowed only as an explicit compatibility path, provided the architecture stays clean
That gives us a realistic ingest design early without destabilizing the FM core.