Просмотр исходного кода

docs: add initial CUDA demod kernel source

master
Jan Svabenik 2 дней назад
Родитель
Сommit
768fb0ee3e
3 измененных файлов: 54 добавлений и 0 удалений
  1. +31
    -0
      internal/demod/gpudemod/README.md
  2. +2
    -0
      internal/demod/gpudemod/doc.go
  3. +21
    -0
      internal/demod/gpudemod/kernels.cu

+ 31
- 0
internal/demod/gpudemod/README.md Просмотреть файл

@@ -0,0 +1,31 @@
# gpudemod

Phase 1 CUDA demod scaffolding.

## Current state

- Standard Go builds use `gpudemod_stub.go` (`!cufft`).
- `cufft` builds allocate GPU buffers and cross the CGO/CUDA launch boundary.
- If CUDA launch wrappers are not backed by compiled kernels yet, the code falls back to CPU DSP.
- The shifted IQ path is already wired so a successful GPU freq-shift result can be copied back and reused immediately.

## First real kernel

`kernels.cu` contains the first candidate implementation:
- `gpud_freq_shift_kernel`

This is **not compiled automatically yet** in the current environment because the machine currently lacks a CUDA compiler toolchain in PATH (`nvcc` not found).

## Next machine-side step

On a CUDA-capable dev machine with toolchain installed:

1. Compile `kernels.cu` into an object file
2. Link it into the `cufft` build
3. Replace `gpud_launch_freq_shift(...)` stub body with the real kernel launch
4. Validate copied-back shifted IQ against `dsp.FreqShift`
5. Only then move the next stage (FM discriminator) onto the GPU

## Why this is still useful

The runtime/buffer/recorder/fallback structure is already in place, so once kernel compilation is available, real acceleration can be inserted without another architecture rewrite.

+ 2
- 0
internal/demod/gpudemod/doc.go Просмотреть файл

@@ -5,6 +5,8 @@
// - cufft builds allocate GPU buffers and cross the CGO/CUDA launch boundary.
// - If/when a CUDA freq-shift launch succeeds, the shifted IQ is copied back and
// reused by the remaining CPU-side FIR/decimate/NFM pipeline.
// - kernels.cu contains the first real candidate kernel source, ready for
// toolchain-backed integration on a CUDA build machine.
//
// This keeps Phase 1 incremental and verifiable while later phases replace the
// placeholder launch wrappers with real kernels.


+ 21
- 0
internal/demod/gpudemod/kernels.cu Просмотреть файл

@@ -0,0 +1,21 @@
#include <cuda_runtime.h>
#include <math.h>

extern "C" __global__ void gpud_freq_shift_kernel(
const float2* __restrict__ in,
float2* __restrict__ out,
int n,
double phase_inc,
double phase_start
) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx >= n) return;

double phase = phase_start + phase_inc * (double)idx;
float si, co;
sincosf((float)phase, &si, &co);

float2 v = in[idx];
out[idx].x = v.x * co - v.y * si;
out[idx].y = v.x * si + v.y * co;
}

Загрузка…
Отмена
Сохранить