Optimizing SDR performance on multi-core processors: introducing the Cloud-SDR way

Extracting a sub-band of interest from a wide-band SDR device is a classic signal processing task, and it is the core of what the SDRNode software does. This channelization process is exactly what your favorite SDR console program does: it extracts a narrow band and demodulates it before sending the audio to your loudspeaker.

There are several methods to perform the channelization:

  1. Use a set of cascaded optimized low-pass filters followed by a decimation stage to reduce sample rate,
  2. Use FFT based convolution methods like Overlap-Add or Overlap-Save.
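
As a point of comparison, the first method can be sketched as follows. This is only a minimal illustration, not the SDRNode implementation: the function names, the 31-tap Hamming-windowed filter, and the fixed decimate-by-2 stages are all assumptions chosen for brevity. Note that the mixer here runs at the full input rate, once per user.

```python
import numpy as np

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff in cycles/sample (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    return 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)

def cascaded_channelizer(x, center_hz, fs_hz, stages=3):
    """Mix center_hz down to zero IF, then low-pass and decimate by 2 per stage.

    Hypothetical helper for illustration; returns (samples, output_rate_hz).
    """
    n = np.arange(len(x))
    # Mixer runs at the full input sample rate
    y = x * np.exp(-2j * np.pi * center_hz / fs_hz * n)
    taps = lowpass_taps(31, 0.2)  # keeps roughly the lower 40% of the band
    for _ in range(stages):
        y = np.convolve(y, taps, mode="same")[::2]  # filter, then drop every other sample
    return y, fs_hz / 2 ** stages
```

With `stages=3` the output rate is one eighth of the input rate. Every additional user repeats the whole chain on the full-rate input, which is the cost the FFT-based approach tries to avoid.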

For Cloud-SDR, the main concern is to stay efficient when multiple users want data from the same SDR input but with different bandwidths or different compression schemes. In this case, multi-core processors help by spreading the processing across different threads.

But not all algorithms can be efficiently distributed on multi-core architectures…

The following figure illustrates the Multi-User Multi-Filter Overlap-Save algorithm that was designed and developed specifically for Cloud-SDR. This DSP technique is unusual because the mixing and decimation stages are combined into one single step at the end of the processing, where the sample rate is low.

[Figure: Multi-user Overlap-Save technique in Cloud-SDR]

In this method we have:

  • One unique direct FFT per incoming block,
  • A set of convolutions with user-specific filters, each computed as an element-wise product in the frequency domain,
  • One inverse FFT (IFFT) per user,
  • A combined post-mix and decimate stage per user. The mix stage is required to “zero IF” the output stream, but as we are heavily over-sampled at this stage, this can be done efficiently.
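
The steps listed above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the Cloud-SDR code: the function names, the windowed-sinc band-pass design, the FFT size, and the requirement that the block hop be a multiple of each decimation factor are all choices made here to keep the example short. The block spectrum is multiplied bin by bin with each user's filter response, and the mixer is only evaluated on the samples that survive decimation.

```python
import numpy as np

def bandpass_taps(num_taps, center_hz, bandwidth_hz, fs_hz):
    """Hamming-windowed sinc low-pass shifted to center_hz (complex taps)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    cutoff = bandwidth_hz / 2 / fs_hz  # cycles/sample
    lowpass = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return lowpass * np.exp(2j * np.pi * center_hz / fs_hz * n)

def multi_user_overlap_save(x, users, fs_hz, fft_size=1024, num_taps=129):
    """users: list of (center_hz, bandwidth_hz, decimation) tuples (hypothetical layout).

    Returns one zero-IF, decimated stream per user. A trailing partial
    block of the input is ignored.
    """
    overlap = num_taps - 1
    hop = fft_size - overlap  # valid output samples per block
    assert all(hop % dec == 0 for _, _, dec in users), "hop must be a multiple of each decimation"
    # Each user's filter is transformed once, ahead of time
    H = [np.fft.fft(bandpass_taps(num_taps, fc, bw, fs_hz), fft_size)
         for fc, bw, _ in users]
    outputs = [[] for _ in users]
    padded = np.concatenate([np.zeros(overlap, dtype=complex), x])
    for start in range(0, len(padded) - fft_size + 1, hop):
        X = np.fft.fft(padded[start:start + fft_size])  # one forward FFT shared by all users
        for u, (fc, _, dec) in enumerate(users):
            y = np.fft.ifft(X * H[u])[overlap:]  # one IFFT per user, keep the valid samples
            kept = start + np.arange(0, hop, dec)  # input indices of the samples we keep
            # Combined post-mix and decimate: oscillator evaluated at the low rate only
            outputs[u].append(y[::dec] * np.exp(-2j * np.pi * fc / fs_hz * kept))
    return [np.concatenate(o) for o in outputs]
```

For instance, `multi_user_overlap_save(x, [(100e3, 50e3, 8), (-200e3, 25e3, 16)], 1e6)` would serve two users from the same input with a single forward FFT per block.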

Note that the exact structure varies over time: new connections add branches, and disconnecting users removes them. This is done in real time while streaming, without impact on the remaining users. The SDR hardware device is only turned off when there is no active branch.

A typical Overlap-Save requires 2 FFTs (one forward and one inverse), so N users would need 2*N FFTs. The chosen technique shares the forward FFT between all users, and the gain is significant: we now need only 1+N FFTs.

For example, with 5 users:

  • Classical methods would use 10 FFTs,
  • Our method requires 6 FFTs.
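
The same arithmetic for other user counts can be checked with a few lines of Python. The function name is made up for this example; the formulas are the 2*N and 1+N counts given above.

```python
def fft_counts(num_users):
    """Return (classical, shared) FFT counts per input block for num_users users."""
    classical = 2 * num_users  # one forward and one inverse FFT per user
    shared = 1 + num_users     # one shared forward FFT plus one inverse FFT per user
    return classical, shared

for users in (1, 2, 5, 20):
    classical, shared = fft_counts(users)
    saving = 100 * (classical - shared) / classical
    print(f"{users:2d} users: {classical:2d} vs {shared:2d} FFTs ({saving:.0f}% fewer)")
```

The saving is (N-1)/(2N), so it is zero for a single user and approaches 50% as the number of users grows.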

The good news with the FFT is that it can be parallelized over multi-core processors. GPUs, for example, are extremely good at running this kind of DSP algorithm with very high throughput.

You can improve SDRNode performance on a multi-core machine by specifying how many processor cores should be used for the FFT steps.

To do this, edit your sdrnode.conf file and add the following:

[cpu_trimming]
FFT_THREADS=8

This will use 8 cores for the FFT. Note that a restart of the SDRNode service is required. The confirmation that the setup has been taken into account will be found in the sdrnode.log file with the following:

[13-02-2017 19:22:16.162] Multithreaded FFT on - using 8 concurrent threads.
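
If you want to automate this check, a small script along the following lines could compare the configured value with what the log reports. It is only a sketch: it assumes the relevant part of sdrnode.conf can be read as INI text and that the log message has exactly the wording shown above. The helper names are invented for this example and are not part of SDRNode.

```python
import configparser
import re

# Assumption: the log message always uses the wording shown in the article
LOG_PATTERN = re.compile(r"Multithreaded FFT on - using (\d+) concurrent threads")

def configured_fft_threads(conf_text):
    """Return FFT_THREADS from the [cpu_trimming] section, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint("cpu_trimming", "FFT_THREADS", fallback=None)

def logged_fft_threads(log_lines):
    """Return the thread count from the last matching log line, or None."""
    count = None
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match:
            count = int(match.group(1))
    return count
```

After a restart, the two values should match; if the log shows no matching line, the setting was probably not picked up.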