[SpiROSE] Yummy voxels

Howdy!

This week, we finally settled the LED count on our panel, though it might still change due to PCB routing constraints. Anyway, we ended up with an 83×46 display.

I also polished the renderer and got the voxelization working. It runs in real time on the GPU, in a single pass. I will now describe how this voxelization works, and show some results.

Voxelization

In my previous post, I mentioned a paper that shows a technique to voxelize an OpenGL scene in a single pass. To simplify, I will explain it with a target “resolution” of 8×8×8, with the scene in a cube going from (-1, -1, -1) to (1, 1, 1) (in OpenGL units). We represent the voxels using the bits of a texture: here we need an 8×8 texture with 8 bits per pixel (thus grayscale). Each pixel represents a column, and each bit of that pixel represents a voxel: a set bit means there is a voxel, an unset bit means there is none. The least significant bit is the lowest voxel along the z axis, the most significant bit the highest. To know whether there is a voxel at OpenGL coordinates (x, y, z), we map each coordinate to an integer index from 0 to 7 and look at the z-th bit of the (x, y) pixel.
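As a concrete illustration, here is a minimal GLSL helper of the kind a visualization pass could use to test whether a voxel is set. It is only a sketch of the encoding described above; the names (uVoxels, voxelAt) are illustrative, not the actual SpiROSE code.

    // 8x8 single-channel texture, one byte per voxel column.
    uniform sampler2D uVoxels;

    // Is there a voxel at OpenGL coordinates p, with p in [-1, 1]^3?
    bool voxelAt(vec3 p) {
        // Map x and y from [-1, 1] to texture coordinates in [0, 1];
        // the sampler picks the right one of the 8x8 columns.
        vec2 uv = (p.xy + 1.0) * 0.5;
        // Recover the stored byte, i.e. the column of 8 voxels.
        int column = int(texture(uVoxels, uv).r * 255.0 + 0.5);
        // Map z from [-1, 1] to a bit index from 0 to 7 (bit 0 = lowest slab).
        int bit = clamp(int((p.z + 1.0) * 0.5 * 8.0), 0, 7);
        return ((column >> bit) & 1) != 0;
    }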

Now, voxelization. For this, we need a fragment shader. For starters, a fragment shader is a little program that runs on the GPU for each drawn pixel (a fragment) produced by the rasterization of a triangle, and outputs the final color of said fragment. There can be several fragments for the same pixel: this happens when several triangles end up on top of each other. The shader has access to several properties, including the fragment's position in camera space. By using an orthographic projection from the bottom (with the appropriate clipping planes), our xyz coordinates are left unchanged and are the same in both camera space and world space.

To get the fragment color, we map the z coordinate of the fragment from [-1, 1] to an integer index from 0 to 7. This gives us the bit corresponding to the fragment's voxel. We then set that bit, together with all bits below it, and that bit mask is the final color of the fragment.
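A hedged sketch of what such a fragment shader can look like for the 8-slab case (the variable names and the single-channel target are my assumptions for the example, not the project's actual shader; the vertex shader is assumed to simply pass the untransformed z through as vZ):

    #version 330 core

    in float vZ;            // world-space z of the fragment, in [-1, 1]
    out vec4 fragColor;     // rendered to an 8-bit single-channel target

    void main() {
        // Map z from [-1, 1] to an integer slab index from 0 to 7.
        int slab = clamp(int((vZ + 1.0) * 0.5 * 8.0), 0, 7);
        // Set the slab's bit and every bit below it, e.g. slab 3 -> 0b00001111.
        int mask = (1 << (slab + 1)) - 1;
        // Write the mask as a normalized 8-bit value; the XOR combination of
        // overlapping fragments, described below, does the rest.
        fragColor = vec4(float(mask) / 255.0, 0.0, 0.0, 1.0);
    }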

Courtesy of the aforementioned paper.

Now, we tell OpenGL how to combine our fragments. This is done through the XOR blending mode (on desktop OpenGL, this is the GL_XOR logic op rather than a regular blend function): when two fragments land on the same pixel, OpenGL applies a bitwise xor between their values and keeps the result. Only the bits lying between the two fragments remain set. For example, fragments at slabs 2 and 5 contribute the masks 0b00000111 and 0b00111111; xoring them leaves 0b00111000, exactly the voxels enclosed between the two surfaces. If the mesh is watertight, the bits alternate between set and unset at each fragment encountered along a column, so we get the same result as a scanline algorithm, without any costly loops.

Now, on to the realtime rendering. This time, the voxelization is done at a 32×32×32 resolution. To get additional bits per pixel, I simply use all four channels of each pixel: red holds the bottom 8 layers, then green, then blue, and alpha holds the top 8.
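The fragment shader sketch above extends naturally to this case; here is one possible (again, purely illustrative) way to build the four per-channel masks:

    #version 330 core

    in float vZ;
    out vec4 fragColor;   // RGBA8 target: R = slabs 0-7, G = 8-15, B = 16-23, A = 24-31

    void main() {
        int slab = clamp(int((vZ + 1.0) * 0.5 * 32.0), 0, 31);
        // A channel entirely below the fragment's slab gets all 8 bits set,
        // the channel containing the slab gets a partial mask, and channels
        // above it stay at zero.
        float mask[4];
        for (int c = 0; c < 4; ++c) {
            int low = c * 8;   // first slab stored in channel c
            if (slab < low)
                mask[c] = 0.0;
            else if (slab >= low + 7)
                mask[c] = 1.0;
            else
                mask[c] = float((1 << (slab - low + 1)) - 1) / 255.0;
        }
        fragColor = vec4(mask[0], mask[1], mask[2], mask[3]);
    }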

Voxelized suzanne w/ pizza transform

Voxelized suzanne w/o pizza transform

You may notice that the first one is crying. This is due to the Suzanne mesh being lame: it is not watertight around the eyes, which produces glitches; I managed to avoid them in the second one.

Also, notice that the bottom-right view is the direct output of the voxelization pass. For these screenshots, a second pass was needed to visualize the result, as the raw output is hard for our eyes to parse.

Back to the pizza

You may have noticed that I posted a screenshot with the pizza transform applied (it then gets reversed in the second, visualization pass). Here is a screenshot showing its benefit.

Thanks to the colors, you can make out the radii that the voxels trace from the center. Each of them corresponds exactly to one refresh of our rotating panel: each refresh is a “radius slice”, which maps to a pixel column in our voxel image.

 

The outside voxels may look extremely stretched, but this is because the transformed geometry was rendered to a 32×32 texture, giving a resolution of 32 voxels along the radius and 32 along the perimeter. That is equivalent to having only 32 refreshes of our rotating panel, which is obviously way too low.
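To make the idea more concrete, here is a rough sketch of the per-vertex part of such a polar remapping (purely illustrative: the variable names and the exact normalization are assumptions, and, as explained below, the real transform needs a geometry shader to split triangles that cross the angular seam or the center):

    #version 330 core

    layout(location = 0) in vec3 aPosition;   // scene vertex, roughly in [-1, 1]^3
    out float vZ;                             // forwarded to the voxelizing fragment shader

    void main() {
        float angle  = atan(aPosition.y, aPosition.x);   // in [-pi, pi]
        float radius = length(aPosition.xy);             // in [0, 1] inside the cylinder
        // Angle becomes the horizontal texture axis (one column per panel
        // refresh), radius becomes the vertical one.
        float u = angle / 3.14159265;
        float v = radius * 2.0 - 1.0;
        vZ = aPosition.z;
        gl_Position = vec4(u, v, 0.0, 1.0);
    }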

However, as interesting as this transform is, it does require geometry shaders, which only entered core OpenGL ES with version 3.2. That drastically limits our SBC choice. Still, some SoCs do support the geometry shader extension on lower versions of GL ES, since it is a very useful feature, and they can offer it without all the bells and whistles of GL ES 3.2. Note that none of the voxelization described above requires any modern OpenGL feature: even GL ES 1.0 hardware can do it. For reference, the authors of the paper were rocking commodity 2008 GPUs.

Data streaming

The very first requirement of this project was to be able to stream a video from a computer to SpiROSE. Vague as that requirement is, there are quite a few steps between having a 3D video and having data the FPGA can understand. Plain video streaming is also neither the only solution nor the most interesting one. Several use cases come to mind:

  • We have a 2D video on a computer (Big Buck Bunny, for a change). When streaming it to the display, we somehow need to project it: wrapping it around a cylinder, laying it horizontally on a single layer, or putting it vertically on some plane (the easiest). This would still require some software on the SBC, whose job would be to translate this 2D stream into something suitable for the FPGA.
  • We have a 3D scene. Many different things can be streamed, at many different stages of the rendering pipeline:
    • Streaming the inputs. This essentially means sending the scene/mesh/… to the SBC, which renders it in 3D and generates the images for the FPGA. The computer then does nothing except forwarding user input to manipulate the render. A typical application would be a game, with SpiROSE as an arcade machine.
    • Streaming the cuts. On the PC, the 3D scene would be arranged: all the usual 3D transforms applied (translations, rotations, …). Then, n slices would be cut along a vertical plane, each representing one refresh of the panel. This gives us a set of n 2D outlines. The resulting cut geometry would be filled and triangulated, then sent to the SBC, which would rasterize it and forward the result to the FPGA.
    • Streaming the end render. The computer would do all the heavy lifting and generate an image stream that the FPGA can understand. Compress it, stream it, run gstreamer on the SBC, and you're done!

Each of those has its advantages and drawbacks. The 2D option is limited, but trivial to use. Among the 3D scene options, the first is the easiest on bandwidth. However, we are limited to whatever is programmed into the SBC, just like an arcade machine is locked to a single game; then again, this can also be an advantage, since SpiROSE can run on its own while still being interactive.

The second option looks really nice. However, the cutting step is CPU-only, since the resulting geometry has to be sent to the SBC: that makes it impossible to run on a GPU, and hard to run on the SBC itself. On the other hand, it is really light on bandwidth and on onboard computation. But it also rules out streaming any kind of bitmap (i.e. 2D video).

The last option is really nice, since we can record a video of the output and simply stream it, as with the first 2D option. However, bandwidth is a real concern, and compression might end up … messy, to say the least. The issue is the hardware decoder of an SBC, which cannot push more than 60 frames per second; this means we cannot encode each panel refresh as its own video frame, and have to multiplex them into a single video frame. Video codecs, however, really don't like discontinuities, and 256 seemingly independent streams tiled on a single frame is too much for them: either the final size ends up larger than the raw video, or everything gets blurred out. Moreover, realtime H264/H265 compression is not a good idea, since those codecs rely heavily on frames referencing past and future frames. For proper compression, we'd add roughly one second of delay, which is way too much for, say, a game.
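To put a rough number on the bandwidth concern: assuming 24-bit color and 30 volumetric frames per second (my assumptions for the sake of the estimate, not fixed project figures), one SpiROSE frame is 83 × 46 LEDs × 256 refreshes × 3 bytes ≈ 2.9 MB, which gives about 88 MB/s, i.e. roughly 700 Mbit/s of raw data to move or compress.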

So, we still have to decide which route to take (well, 2D video is kind of mandatory anyway).

SBC / FPGA communication

Last week, I spoke about HDMI -> parallel RGB bridges. These chips have one big issue: very little information is available about them. It is pretty hard to tell whether such a chip will output bursts of data as an HDMI frame comes in, or buffer it to output a slower, steadier stream. This matters, because routing 24 traces at 168 MHz is not exactly fun. This is why we are exploring two routes:

  • SBC with integrated RGB output (aka MIPI-DPI). Since the SoC generates the signal itself, it is much easier to control its timing. For example, the i.MX6 SoC is very flexible in this regard (it is the only one I have had time to analyse, as this kind of information is hard to find).
  • Some kind of memory interface (GPMC or similar), the same way ROSEace did it. However, those interfaces are getting harder and harder to find: only the Gumstix SBCs have one, and it is too slow. The only other SoCs I found still offering a similar interface are the i.MX6 series, with their EIM (External Interface Module). The problem is that this kind of interface is becoming obsolete and is being replaced by PCIe, which is out of the question for us.

TODO

Next week, I’ll continue analysing SoCs to find one with a flexible RGB interface, one that can keep the signals from being too fast (hello, signal integrity).

I will also keep working on the renderer, where I’ll interlace the voxelized output to get a mosaic of panel refreshes: a single video frame will be a whole SpiROSE frame, embedding 256 LED frames.

See you next week 🙂
