|
|
|
@@ -0,0 +1,31 @@ |
|
|
|
# gpudemod |
|
|
|
|
|
|
|
Phase 1 CUDA demod scaffolding. |
|
|
|
|
|
|
|
## Current state |
|
|
|
|
|
|
|
- Standard Go builds use `gpudemod_stub.go` (`!cufft`). |
|
|
|
- `cufft` builds allocate GPU buffers and cross the CGO/CUDA launch boundary. |
|
|
|
- If CUDA launch wrappers are not backed by compiled kernels yet, the code falls back to CPU DSP. |
|
|
|
- The shifted IQ path is already wired so a successful GPU freq-shift result can be copied back and reused immediately. |
|
|
|
|
|
|
|
## First real kernel |
|
|
|
|
|
|
|
`kernels.cu` contains the first candidate implementation: |
|
|
|
- `gpud_freq_shift_kernel` |
|
|
|
|
|
|
|
This is **not compiled automatically yet** in the current environment because the machine currently lacks a CUDA compiler toolchain in PATH (`nvcc` not found). |
|
|
|
|
|
|
|
## Next machine-side step |
|
|
|
|
|
|
|
On a CUDA-capable dev machine with toolchain installed: |
|
|
|
|
|
|
|
1. Compile `kernels.cu` into an object file |
|
|
|
2. Link it into the `cufft` build |
|
|
|
3. Replace `gpud_launch_freq_shift(...)` stub body with the real kernel launch |
|
|
|
4. Validate copied-back shifted IQ against `dsp.FreqShift` |
|
|
|
5. Only then move the next stage (FM discriminator) onto the GPU |
|
|
|
|
|
|
|
## Why this is still useful |
|
|
|
|
|
|
|
The runtime/buffer/recorder/fallback structure is already in place, so once kernel compilation is available, real acceleration can be inserted without another architecture rewrite. |