Implementing a Synthesizer in an FPGA

Depending on how similar the two circuits are, yes, that could be possible.
At 256 words deep, an M9K can store 31 coefficients multiple times over, so that would not be an issue, but switching between the two circuits might get complicated if they are not almost identical. If you don’t unroll the loop, you probably have enough hardware multipliers and M9Ks to implement the two circuits separately.
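As a quick sanity check on that capacity claim (assuming one M9K configured 256 words deep and 31 coefficients per filter):

```python
M9K_DEPTH_WORDS = 256  # one common M9K configuration: 256 words deep
NUM_COEFFS = 31        # coefficients per filter, as in this design

# How many complete coefficient sets fit in a single M9K block
copies = M9K_DEPTH_WORDS // NUM_COEFFS
print(copies)  # 8
```

So eight full coefficient sets fit in a single block, with room to spare.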

Another architecture option is to design your own ALU, optimized for this problem (maybe with two or three parallel floating point multipliers and some floating point adders?) and write some “software” to sequence the events properly.
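As a rough illustration of what such a sequenced ALU might look like, here is a hypothetical sketch in Python; the instruction names, register file, and two-multiplier "MUL2" operation are all invented for illustration, not a real instruction set:

```python
# Hypothetical microcode sequencer: each instruction drives two parallel
# multipliers ("MUL2") or one adder ("ADD") per simulated cycle.
def run_program(program, regs):
    for op in program:
        if op[0] == "MUL2":  # two multiplies issued in the same cycle
            _, d0, a0, b0, d1, a1, b1 = op
            regs[d0] = regs[a0] * regs[b0]
            regs[d1] = regs[a1] * regs[b1]
        elif op[0] == "ADD":
            _, d, a, b = op
            regs[d] = regs[a] + regs[b]
    return regs

# Example "software": compute r4 = r0*r1 + r2*r3 in two microinstructions.
regs = {"r0": 2.0, "r1": 3.0, "r2": 4.0, "r3": 0.5,
        "t0": 0.0, "t1": 0.0, "r4": 0.0}
prog = [("MUL2", "t0", "r0", "r1", "t1", "r2", "r3"),
        ("ADD", "r4", "t0", "t1")]
run_program(prog, regs)
print(regs["r4"])  # 8.0
```

The point of the design is that the multipliers stay busy in parallel while a small program decides what they multiply each cycle.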

It all depends on what you enjoy most.

Personally I’d go with the two independent circuits like the one described in my previous post, although I’ve always wanted to implement a FORTH engine in an FPGA (and yes, I know, it’s been done).


Thanks for your insights and suggestions!

In my code I use floating-point arithmetic. I haven’t found the need for doubles, and ultimately everything is output as integers anyway (D/A conversion), so the maximum numerical resolution needed is limited.
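For reference, a minimal sketch of the float-to-integer step I mean; the bit depth, clamping, and scaling convention here are illustrative assumptions, not my actual DAC interface:

```python
def to_dac(sample, bits=16):
    # Clamp a float sample to [-1.0, 1.0], then scale to a signed integer
    # code for a `bits`-bit DAC (symmetric scaling is an assumption here).
    sample = max(-1.0, min(1.0, sample))
    return int(round(sample * (2 ** (bits - 1) - 1)))

print(to_dac(1.0))   # 32767 (full-scale positive at 16 bits)
print(to_dac(-2.0))  # -32767 (out-of-range input is clamped first)
```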

I’ve looked into the matter of choosing precision a bit by watching some tutorials on YouTube, but my gut feeling is that it takes some experience to master this. Starting small with something like a simple all-pole filter is the obvious choice, I guess.
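For what it’s worth, the all-pole filter I have in mind is simple enough to sketch as a direct-form recurrence, using the usual convention y[n] = x[n] - sum(a_k * y[n-k]) (the coefficient values below are just an example):

```python
def allpole(x, a):
    # Direct-form all-pole IIR filter:
    #   y[n] = x[n] - a[0]*y[n-1] - a[1]*y[n-2] - ...
    y = []
    hist = [0.0] * len(a)  # past outputs y[n-1], y[n-2], ...
    for xn in x:
        yn = xn - sum(ak * yk for ak, yk in zip(a, hist))
        hist = [yn] + hist[:-1]  # shift the delay line
        y.append(yn)
    return y

# Impulse response of a one-pole filter with pole at 0.5:
print(allpole([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

The feedback through `hist` is exactly the part that makes precision choices interesting, since quantization error recirculates.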

I take it that pipelined processing is similar to processing a buffer of data sequentially (block processing) and waiting for the next buffer? Or is it closer to stream processing?

The M9K memories sound like the perfect solution to store temporary data.

The suggestion in your follow-up post of making an ALU dedicated to the problem sounds really interesting. It’s probably not something I should start with, but it’s something I’d like to try at some point in the future.


I think block processing is favored with traditional processors because access to external memory is orders of magnitude slower than register and cache access.

If you choose the size of the block just right and the size of the process just right, you can have all the data in the data cache and your inner processing loop entirely in the instruction cache.

Additionally, access to sequential blocks of external memory can be significantly faster than random access, so reading a block of data, processing it, and writing back a block of data is much more efficient for traditional computers.
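A block-processing loop of the kind described above, sketched in Python (the block size and per-sample function are placeholders):

```python
def process_blocks(src, block_size, fn):
    # Read one block sequentially, process it while it is "in cache",
    # then move on; this mimics the block-processing pattern on a CPU.
    out = []
    for i in range(0, len(src), block_size):
        block = src[i:i + block_size]     # sequential read (cache-friendly)
        out.extend(fn(s) for s in block)  # tight inner loop over the block
    return out

print(process_blocks([1, 2, 3, 4, 5], 2, lambda s: s * 2))  # [2, 4, 6, 8, 10]
```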

But block processing kind of implies gaps in the pipeline between blocks, which is sub-optimal in an FPGA. Once data is in the FPGA there is no memory-access penalty or cache-hit bonus, so keeping the pipeline full is how you get the most performance.

Block processing is still often used with FPGAs, as the FPGA often needs to get its data to and from external memory, but inside an FPGA, back-to-back block processing and stream processing are essentially the same thing. As long as the pipeline is always full, the hardware is being used at maximum efficiency.
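The stream-processing view can be sketched with generators, where each stage consumes samples one at a time, the way a pipeline stage in the fabric would; the stage names and operations here are just illustrative:

```python
# Each generator is one pipeline stage; samples flow through with no
# block boundaries and no gaps, as long as upstream keeps producing.
def gain(stream, g):
    for s in stream:
        yield s * g

def offset(stream, o):
    for s in stream:
        yield s + o

samples = iter([1.0, 2.0, 3.0])
pipeline = offset(gain(samples, 2.0), 1.0)  # two chained stages
print(list(pipeline))  # [3.0, 5.0, 7.0]
```

The chained generators pull one sample at a time, which is the software analogue of a full hardware pipeline: every stage does useful work on every "cycle".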
