Compile Time Frequency Splitting

About a year ago, I watched a presentation given at ADC 21 by Matthew Robbetts that discusses using the standard template library to generate DSP structures at compile time. If you have the time, it’s an interesting watch. The talk is centered primarily around embedded devices, but it got me thinking about how similar strategies could be applied to an audio plugin (related: Compile-time signal chains in c++). This seemed like a cool way to accomplish two things: create a multi-band audio plugin and write some cool template code. The end product is Fumigate, Vain Audio’s newest VST3 plugin. Fumigate is a multi-band stereo expansion effect plugin that uses C++ template metaprogramming to generate an N-way frequency splitter along with N band effect processors.

 

The Goal

The goal from the beginning was to design and write the plugin in such a way that the number of processing bands would depend entirely on one compile-time integer. Changing the total band count would result in the generation of a different signal flow, different parameters, and different UI elements.

[Figures: TotalBandCount = 5 and TotalBandCount = 16]

Having more than four or five bands, generated at compile time or otherwise, is objectively not useful in this type of plugin. Since manually initializing an array of four processors is trivial (and increasing from four to five processors is even simpler), why bother templatizing this at all? Because why not.

Some things can’t be done at compile time. juce::Identifiers, for example, are run-time objects, so creating and accessing parameter ids would have to be done at run time. The signal flow and processors would be the main focus of the compile-time effort.

Individual Band Processors

The individual band processors are made up of basic effect processors held within templated container processors. Each container class performs another basic task like mixing or enabling. The container processors take two template arguments: the contained Processor type and a pointer to one of the juce::Identifier members of the BandParams struct. This allowed us to create mixers and enablers that could be controlled by any parameter in the BandParams struct. The stripped-down example below shows how parameter identifiers are templated in our band processors:

				
template <typename Processor, const juce::Identifier BandParams::*Parameter>
class ExampleTemplateProcessor
{
public:
    explicit ExampleTemplateProcessor(int bandNumber)
        : m_processor(bandNumber)
    {
        // Params::Band(int) returns a reference to a BandParams struct;
        // .*Parameter selects the member named by the template argument.
        const juce::Identifier parameterId = Params::Band(bandNumber).*Parameter;
    }

private:
    Processor m_processor;
};
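
For anyone who has not used a pointer to a data member as a template argument before, here is a minimal sketch of the same idea that builds without JUCE. DemoBandParams, DemoBand and ParameterIdFor are hypothetical names invented for this example, and std::string stands in for juce::Identifier. It is not the plugin's code, only an illustration of the mechanism.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the BandParams struct; std::string replaces
// juce::Identifier so the sketch has no JUCE dependency.
struct DemoBandParams
{
    std::string Mix;
    std::string Enable;
};

// Hypothetical lookup that builds the parameter ids for one band at run time.
inline DemoBandParams DemoBand(int bandNumber)
{
    const std::string prefix = "band" + std::to_string(bandNumber);
    return { prefix + "_mix", prefix + "_enable" };
}

// Which member to read is fixed at compile time by the template argument.
// The id itself only exists once a band number is supplied at run time.
template <std::string DemoBandParams::*Parameter>
std::string ParameterIdFor(int bandNumber)
{
    return DemoBand(bandNumber).*Parameter;
}
```

Calling ParameterIdFor<&DemoBandParams::Mix>(2) reads the Mix member of band 2, while ParameterIdFor<&DemoBandParams::Enable>(2) reads Enable from the same struct, with no run-time branching on which member is wanted.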

				
			

These band processors are then chained together to create more complex processors. The final band processor is pretty straightforward albeit a little ugly:

				
class MainBandProcessor
{
    EnabledProcessor<MixProcessor<MidSideProcessor, &BandParams::MidSideMix>, &BandParams::MidSideEnable> m_midSideProcessor;
    EnabledProcessor<MixProcessor<HaasProcessor,    &BandParams::MidSideMix>, &BandParams::MidSideEnable> m_haasProcessor;
    EnabledProcessor<MixProcessor<ChorusProcessor,  &BandParams::ChorusMix>,  &BandParams::ChorusEnable>  m_chorusProcessor;
};

using FumigateBandProcessor = SoloProcessor<BandCountBypassProcessor<Enableable<MuteProcessor<Mixable<MainBandProcessor>>>>>;

				
			

Note: m_midSideProcessor and m_haasProcessor are mixed and enabled by the same parameters, so they should probably be combined.
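
To make the nesting a bit more concrete, here is a rough sketch of how container processors like these can wrap one another. Everything in it (DoubleProcessor, DemoMix, DemoEnabled, the process(float) interface and the public mix and enabled members) is made up for illustration. In the real plugin the mix and enable values come from the parameters selected by the template arguments rather than from plain members.

```cpp
#include <cassert>

// Hypothetical innermost effect: doubles the sample.
struct DoubleProcessor
{
    explicit DoubleProcessor(int /*bandNumber*/) {}
    float process(float x) const { return 2.0f * x; }
};

// Hypothetical wet/dry container around any processor with process(float).
template <typename Processor>
class DemoMix
{
public:
    explicit DemoMix(int bandNumber) : m_processor(bandNumber) {}

    float mix = 1.0f; // 0 = fully dry, 1 = fully wet

    float process(float x) const
    {
        return (1.0f - mix) * x + mix * m_processor.process(x);
    }

private:
    Processor m_processor;
};

// Hypothetical on/off container: passes the input through when disabled.
template <typename Processor>
class DemoEnabled
{
public:
    explicit DemoEnabled(int bandNumber) : m_processor(bandNumber) {}

    bool enabled = true;

    float process(float x) const { return enabled ? m_processor.process(x) : x; }
    Processor& inner() { return m_processor; }

private:
    Processor m_processor;
};

// Each layer forwards the band number inward and adds one behaviour.
using DemoBandProcessor = DemoEnabled<DemoMix<DoubleProcessor>>;
```

Because every layer has the same constructor and process signature, the layers can be stacked in any order by changing only the type alias.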

Frequency Splitter Architecture

The entire signal flow can be separated into four parts: splitting, all passing, band processing, and summing.

The first stage is implemented with a cascading set of Linkwitz-Riley filters. Each filter strips the high-frequency information out of the signal and sends it to an Endpoint in the AllpassBank. The remaining low-frequency information is passed along to the next filter. Both outputs from the last filter are written into the AllpassBank. The user can select a number of processing bands, which determines which of the filters receives the incoming signal first. Single band is special-cased and bypasses the entire frequency splitter.

The next stage is the AllpassBank. The AllpassBank is a series of allpass filters that correct the phase of the signals in each processing band. Each of the Endpoints written to by the Linkwitz-Riley filters is fed through a number of allpass filters that depends on which band the output came from. The correct number of allpass filters is determined by a partial sum beginning at 3. After allpassing, the bands are each processed with their respective band processors before being summed together and returned to the DAW.
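
The routing of the splitting and summing stages can be sketched roughly as follows. This is not the plugin's code: DemoCrossover is a simple one-pole low-pass whose high output is just the input minus the low output, standing in for a real Linkwitz-Riley filter, and DemoSplitter and SumBands are hypothetical names. The sketch also skips the AllpassBank, which this stand-in filter does not need because its two outputs add back up to the input by construction.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in crossover (NOT a Linkwitz-Riley filter): a one-pole low-pass
// whose high band is defined as input minus low band.
struct DemoCrossover
{
    float coeff = 0.5f; // smoothing coefficient in (0, 1]
    float state = 0.0f;

    // Returns the low band and writes the complementary high band to `high`.
    float split(float input, float& high)
    {
        state += coeff * (input - state);
        high = input - state;
        return state;
    }
};

template <std::size_t BandCount>
class DemoSplitter
{
    static_assert(BandCount >= 2, "a single band bypasses the splitter");

public:
    DemoSplitter()
    {
        // Arbitrary demo values: each later filter gets a lower cutoff.
        for (std::size_t i = 0; i < m_filters.size(); ++i)
            m_filters[i].coeff = 1.0f / static_cast<float>(i + 2);
    }

    // Bands come out ordered from highest to lowest frequency.
    std::array<float, BandCount> process(float input)
    {
        std::array<float, BandCount> bands{};
        float remaining = input;
        for (std::size_t i = 0; i + 1 < BandCount; ++i)
            remaining = m_filters[i].split(remaining, bands[i]);
        bands[BandCount - 1] = remaining; // low output of the last filter
        return bands;
    }

private:
    std::array<DemoCrossover, BandCount - 1> m_filters{};
};

// Summing stage: add the bands back together.
template <std::size_t BandCount>
float SumBands(const std::array<float, BandCount>& bands)
{
    float sum = 0.0f;
    for (float band : bands)
        sum += band;
    return sum;
}
```

Note that N bands need only N - 1 filters, since the last filter contributes both of its outputs.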

Implementation

Making the above structure generate at compile time took some doing. Most of the code that creates the processor arrays is partially cannibalized from a Stack Overflow post. The code below initializes an array of any object which takes its index as its first constructor argument.

				
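#include <array>
#include <cstddef>
#include <tuple>
#include <utility>

// Assumption: the helpers are static members of a struct (the call site uses
// TemplateHelpers::CreateArray), so each may be defined after its caller.
struct TemplateHelpers
{
    // Packs the shared constructor arguments into a tuple of references.
    template <typename Processor, std::size_t Size, typename... T>
    static constexpr auto CreateArray(T&... args)
    {
        const auto tuple = std::forward_as_tuple(args...);
        return CreateArrayImpl<Processor>(tuple, std::make_index_sequence<Size>{});
    }

    // Expands the index pack: one ConstructFromTuple call per array element.
    template <typename Processor, typename TupleType, std::size_t... I>
    static constexpr auto CreateArrayImpl(const TupleType& tuple, std::index_sequence<I...>)
    {
        return std::array<Processor, sizeof...(I)>{ ConstructFromTuple<Processor>(static_cast<int>(I), tuple)... };
    }

    // Unpacks the tuple by position into the Processor constructor.
    // (std::get<T> by type would not compile if two arguments shared a type.)
    template <typename Processor, typename... T>
    static constexpr Processor ConstructFromTuple(int band, const std::tuple<T...>& tuple)
    {
        return std::apply([band](auto&... args) { return Processor{ band, args... }; }, tuple);
    }
};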
				
			

The top function, CreateArray, first packs each of the T &... args into a single tuple to be forwarded along. Packing the arguments into a tuple matters because it leaves room for a second set of variadic template arguments in a single template function. That second variadic argument is an index sequence.

				
					std::make_index_sequence<5> => std::index_sequence<0, 1, 2, 3, 4>
				
			

The sequence <0, 1, 2, 3, 4> inside the std::index_sequence matches against the std::size_t ... I template argument in CreateArrayImpl. This allows the indexes to be expanded and passed along as a regular int argument to each call to ConstructFromTuple. ConstructFromTuple unpacks the tuple initially created in CreateArray into the constructor for each Processor. This processor is then constructed in place in the array (shoutout RVO, or more precisely C++17's guaranteed copy elision).
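
If the pack matching is hard to picture, this stripped-down sketch shows only that step. Indexes and IndexesImpl are hypothetical names used for this example. Instead of constructing processors, each index is simply written into an array of int.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// The pack I... is deduced from the index_sequence argument, then expanded
// once per array element.
template <std::size_t... I>
constexpr std::array<int, sizeof...(I)> IndexesImpl(std::index_sequence<I...>)
{
    return { static_cast<int>(I)... };
}

// Generates the sequence 0 .. Size-1 and hands it to the helper above.
template <std::size_t Size>
constexpr std::array<int, Size> Indexes()
{
    return IndexesImpl(std::make_index_sequence<Size>{});
}

// Everything above is usable at compile time.
static_assert(std::get<4>(Indexes<5>()) == 4, "last of five indexes");
```

Swapping static_cast<int>(I) for a constructor call that takes the index is all it takes to get from here to an array of processors.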

Initializing arrays with CreateArray looks like this:

				
std::array<FumigateBandProcessor, StaticParams::TotalBandCount> bandProcessors{
    TemplateHelpers::CreateArray<FumigateBandProcessor, StaticParams::TotalBandCount>(parameterAccess)
};

// ... translates to:

std::array<FumigateBandProcessor, 5> bandProcessors{
     FumigateBandProcessor{ 0, parameterAccess },
     FumigateBandProcessor{ 1, parameterAccess },
     FumigateBandProcessor{ 2, parameterAccess },
     FumigateBandProcessor{ 3, parameterAccess },
     FumigateBandProcessor{ 4, parameterAccess },
};
				
			

Not very different but slightly easier on the eyes.

A Thought

Theoretically, packing the objects you’re using into the same area of memory should increase performance. If this is true, the template code I suffered through writing could have an actual benefit for the plugin. After completing the template code, I created a branch where each Processor is held in a std::unique_ptr, as if it were created dynamically. These are benchmarks for the frequency splitter processor:

Compile time generation: 163923 ns
Run time generation: 164958 ns

These results were consistent. The version of the frequency splitter in which all of the data structures are near each other in memory is around 0.6% faster.
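
For reference, the difference between the two layouts looks roughly like this. DemoProcessor, InlineBank, HeapBank and the two helper functions are hypothetical names for illustration only. The benchmark branch itself is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical processor with a little bit of state.
struct DemoProcessor
{
    float state[4] = {};
};

// Compile-time version: the processors are stored inline, back to back.
using InlineBank = std::array<DemoProcessor, 5>;

// Run-time version: the array only holds pointers, and each processor is
// its own heap allocation that may land anywhere in memory.
using HeapBank = std::array<std::unique_ptr<DemoProcessor>, 5>;

inline HeapBank MakeHeapBank()
{
    HeapBank bank;
    for (auto& processor : bank)
        processor = std::make_unique<DemoProcessor>();
    return bank;
}

// Byte distance between two neighbouring processors in the inline bank.
inline std::ptrdiff_t InlineStride(const InlineBank& bank)
{
    return reinterpret_cast<const char*>(&bank[1])
         - reinterpret_cast<const char*>(&bank[0]);
}
```

In the inline bank, neighbouring processors are exactly sizeof(DemoProcessor) bytes apart. The heap bank makes no such promise about where each processor ends up.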

A monumental victory.