Sometimes, we find ourselves in a situation where we have a function that needs to accept a number of inputs and produce a variable number of outputs per input. In the particular case where we have hundreds of inputs and a handful of outputs per input, a reliable approach likely to last a good while is to apply two complementary techniques: sort by type and linear processing.
So, for some MultiOp
function, we'll seperately allocate two regions of memory: one region for the outputs per input, and another for integers to count the outputs per input. We'll then process the outputs in a completely linear fashion with some help from the counters.
Let's illustrate this concretely with some code:
Output* outputs = (Output*) Allocate(max_outputs); // In most practical cases, max_outputs can be computed before processing begins
uint32_t* output_counts = (uint32_t*)Allocate(input_count);
MultiOp(output_counts, outputs, inputs, input_count);
// Future-self will thank you for this cursor
Output* output_cursor = outputs;
// Iterate the headers, one per input
for (uint32_t icount = 0; icount < input_count; ++icount)
{
// Process the outputs for this input
uint32_t output_count = output_counts[icount];
for (uint32_t ioutput = 0; ioutput < output_count; ++ioutput)
{
// Do something (preferably awesome) with inputs[icount] and output_cursor[ioutput]
}
// Advance cursor
output_cursor += output_count;
}
This approach has a few nice properties:
It also has a few downsides:
outputs
can get largemax_outputs
can sometimes be well above what "typical_outputs
" would beIn most cases, though, the downsides are well worth the upsides, especially compared to the live alternatives, and it's pretty easy to implement and get right to boot!
-- Vitor Menezes (Associate Engine Programmer)