FPGA architecture

This week we settled the FPGA architecture in order to choose the right FPGA. Fortunately, it seems that this Cyclone 3 will do the job. The main issues to tackle were the I/O count, the clock domains, and the internal memory. Before going into details, let’s define some terms:

  • We call an image a whole cylinder (i.e. a whole turn)
  • We call a slice a 2D image displayed by the LED panel at a given position

We plan to use 256 slices per turn. Those slices are embedded into a single image sent by the SBC, but this part is quite flexible: the layout of the slices, the frequency and the blanking can all be tuned. This is actually very important, because it allows us to rotate the display just by changing the order of the slices in the image sent by the SBC. It also means that we don’t need to store a whole image in the FPGA, which would have been impossible with the Cyclone 3. Instead we just need to store a few slices to cross the two clock domains (RGB clock and driver clock). So let’s describe our architecture.

General Architecture

Modules’ role and architecture

Parallel RGB

The image sent by the SoM contains all 128 slices:

The RGB logic module will write the pixels sent by the SBC into the RAM.
Since we will only use 16 bits per LED we drop the least significant bits and just store 16 bits per pixel.

In the RAM we store a whole slice in each µBlock, so the RGB logic module has to decode the right address for each pixel.
For instance, the first pixels, 1 to 80, must be written at addresses 0 to 79 (if we start at 0), but the following pixels, 81 to 160,
must be written at addresses 80*48 to 80*48+79, as they are part of the second slice.
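
The address decoding described above can be sketched in Python as a quick model of what the RGB logic module has to compute. This is only an illustration under assumptions: the function name `ram_address`, the `slices_per_row` parameter and the grid layout of slices inside the image are hypothetical (the real layout is tunable, as said earlier).

```python
SLICE_WIDTH = 80    # pixels per slice line (from the text)
SLICE_HEIGHT = 48   # lines per slice (from the text)
SLICE_SIZE = SLICE_WIDTH * SLICE_HEIGHT  # RAM words used by one slice


def ram_address(x: int, y: int, slices_per_row: int) -> int:
    """Map 0-based image coordinates (x, y) to a RAM address.

    Assumption: slices are tiled in the image as a grid that is
    `slices_per_row` slices wide, and each slice is stored contiguously
    in RAM, line after line.
    """
    slice_col, col = divmod(x, SLICE_WIDTH)    # which slice, which column in it
    slice_row, row = divmod(y, SLICE_HEIGHT)   # which slice row, which line in it
    slice_index = slice_row * slices_per_row + slice_col
    return slice_index * SLICE_SIZE + row * SLICE_WIDTH + col
```

With this model, the first 80 pixels of the first image line land at addresses 0 to 79, and the next 80 pixels land at 80*48 to 80*48+79, which matches the example above.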

Double frame buffer

From the drivers’ perspective a slice looks like this:

In poker mode, the drivers need the bits of all 16 LEDs at once, so we don’t really send a pixel: we send a column of 16 bits
of the previous image (in red). When the drivers have sent those 16 bits we have to send the next ones, so the frame buffer is read
30 columns at a time! This makes it impossible to write the next slice without destroying relevant data, so we need a second buffer.

The double frame buffer contains two buffers of 80*48 pixels each. One stores the current slice and sends it to the driver controllers while
the other reads the next slice from the RAM. When the first buffer has sent all its data to the driver controllers, the module just switches the roles of the two
buffers. Whenever this module receives a new position from the encoder, it starts to fill the idle buffer with the next slice.
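
As a rough software model of this ping-pong scheme (not the HDL itself), the role switch could look like the sketch below. The class and method names are hypothetical and only serve the illustration.

```python
class DoubleFrameBuffer:
    """Software model of two slice buffers whose roles are swapped."""

    def __init__(self, size: int = 80 * 48):
        self.size = size
        self.buffers = [[0] * size, [0] * size]
        self.front = 0  # index of the buffer read by the driver controllers

    def fill_back(self, slice_pixels):
        """Load the next slice into the buffer the drivers are NOT reading."""
        if len(slice_pixels) != self.size:
            raise ValueError("a slice must contain exactly `size` pixels")
        self.buffers[1 - self.front][:] = slice_pixels

    def swap(self):
        """Switch roles once the front buffer has been fully sent."""
        self.front = 1 - self.front

    def read_front(self):
        """Return the slice currently exposed to the driver controllers."""
        return self.buffers[self.front]
```

The point of the model is that `fill_back` never touches the data returned by `read_front` until `swap` is called, which is exactly why a single buffer would not be enough.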

The drivers take exactly 512 cycles (driver_clk, up to 33MHz) to send the data for their 16 LEDs, so with 8-way multiplexing the frame buffer will be read in 512*8 = 4096 cycles. The second buffer will be filled in 80*48 = 3840 cycles < 4096, so we won’t have any timing issue if the frame buffer uses the same clock as the driver controllers.
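
The cycle budget above can be checked with a few lines of Python. The constant names are ours, and the fill time assumes that one pixel is written into the buffer per driver_clk cycle.

```python
DRIVER_CLK_HZ = 33_000_000  # upper bound quoted in the text
CYCLES_PER_PASS = 512       # cycles for the drivers to send 16 LEDs
MULTIPLEXING = 8            # 8-way multiplexing
SLICE_PIXELS = 80 * 48      # size of one frame buffer

# Cycles needed to read one whole slice out of the front buffer.
read_cycles = CYCLES_PER_PASS * MULTIPLEXING

# Cycles needed to fill the back buffer (assumes one pixel per cycle).
fill_cycles = SLICE_PIXELS

# Spare cycles left once the back buffer is full.
margin_cycles = read_cycles - fill_cycles

# Duration of one slice read at the maximum driver clock, in microseconds.
read_time_us = read_cycles / DRIVER_CLK_HZ * 1e6

print(read_cycles, fill_cycles, margin_cycles, round(read_time_us, 1))
```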

Another solution is to have a double frame buffer of 128 pixels for each driver, and to use an arbiter to handle the RAM accesses
(one simple solution is just to let the buffers read one after another, or to broadcast each pixel to every frame buffer and let them keep the
ones they are interested in).

Driver controller and driver logic

Each driver controller follows these steps:
– send 1 bit on each rising clock edge for 432 cycles
– wait for 80 cycles (until the 512th cycle)
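
The two steps above amount to a 512-cycle period split into a send phase and a wait phase. A minimal Python sketch, with a hypothetical `driver_phase` helper that is not part of the design:

```python
SEND_CYCLES = 432                    # one bit shifted out per cycle
WAIT_CYCLES = 80                     # idle until the end of the period
PERIOD = SEND_CYCLES + WAIT_CYCLES   # 512 cycles in total


def driver_phase(cycle: int) -> str:
    """Return the phase of a driver controller at a given clock cycle.

    `cycle` counts rising clock edges from 0; the pattern repeats
    every PERIOD cycles.
    """
    return "send" if cycle % PERIOD < SEND_CYCLES else "wait"
```

In hardware this would typically be a counter compared against 432, reset when it reaches 512.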

The driver logic will send the correct LAT commands to latch the data in the driver.

The driver logic is also used to configure the drivers (at reset or when the UART tells it to do so).
It does so by sending the correct LAT commands and replacing the pixels of the frame buffer with
the configuration data (sent by UART, or stored inside the module for the reset).

Encoder logic

The encoder sends the new position through SSI3:
– 1 bit per cycle (up to the 16th bit)
– then a 1-bit error flag
– then we need to wait for at least 20µs before the next transaction

At 2MHz, this means that we can get a new position about every 28µs (17 bits at 0.5µs each, plus the 20µs pause).
We want to use 256 positions at 30 fps, which means that we have
1/(256*30) ≈ 130µs between two positions, so we don’t have any timing issue.
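
The same arithmetic, written out in Python so it can be replayed with other clock rates or frame rates. The constant names are ours, and the model assumes one bit per SSI clock period with no other overhead.

```python
SSI_CLOCK_HZ = 2_000_000   # encoder clock
POSITION_BITS = 16         # position word
ERROR_BITS = 1             # error flag
PAUSE_S = 20e-6            # minimum idle time between transactions

POSITIONS_PER_TURN = 256
TURNS_PER_SECOND = 30      # 30 fps

# Duration of one complete encoder transaction (about 28.5 µs).
transaction_s = (POSITION_BITS + ERROR_BITS) / SSI_CLOCK_HZ + PAUSE_S

# Time available between two consecutive positions (about 130 µs).
slot_s = 1 / (POSITIONS_PER_TURN * TURNS_PER_SECOND)

# How many full encoder readings fit between two positions.
reads_per_slot = int(slot_s // transaction_s)

print(f"{transaction_s * 1e6:.1f} µs per read, {slot_s * 1e6:.1f} µs per position")
```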

I/O count

The Cyclone 3 we chose claims to have 128 I/O, and we need 80. However, many I/O are specific (PLL…) or used for configuration, so the real count is the following:

  • 32 VREFIO, used for reference voltages; since we only use single-ended I/O we don’t need a reference voltage and can use them as regular I/O
  • 6 simple I/O, which don’t have any particular function
  • 8 resistor reference I/O, which we can use as regular I/O too
  • 8 PLL I/O, which are only for output clocks
  • 58 DIFFIO, most of which can be used as regular I/O, except the ones needed to configure the FPGA (we will probably use STAPL, which needs the four JTAG pins)

Therefore we have more than 100 I/O available.
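
As a sanity check, the tally can be redone in Python. This is only a sketch: it assumes that the four JTAG pins are the only DIFFIO pins lost to configuration, and it counts the PLL pins separately since they can only drive output clocks.

```python
# Pin counts per category, as listed above.
pins = {
    "VREFIO": 32,
    "simple": 6,
    "resistor_reference": 8,
    "PLL_output": 8,
    "DIFFIO": 58,
}
JTAG_PINS = 4     # assumption: the only pins reserved for configuration
REQUIRED_IO = 80  # what the design needs

# Pins usable as ordinary inputs or outputs.
general_purpose = (
    sum(count for name, count in pins.items() if name != "PLL_output") - JTAG_PINS
)

# Adding the PLL pins, which are usable as clock outputs only.
with_clock_outputs = general_purpose + pins["PLL_output"]

# Headroom over the 80 I/O we need, not counting the PLL outputs.
spare = general_purpose - REQUIRED_IO

print(general_purpose, with_clock_outputs, spare)
```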
