[SpiROSE] Routing voxels

Howdy!

Good to see you again after the holidays, Christmas, New Year’s Eve, alcohol, …

I haven’t written here in a while, so I’ll do a large combined post of the week before the holidays, the holidays and this week (i.e. the week after the holidays).

Routing the rotative base

Before the holidays, I did most of the place and route of the rotative base. Obviously, it ended up being heavily modified around the board-to-board connectors. The pinouts (and even the shape!) of those connectors were modified several times, due to space constraints.

In the end, this is our rotative base PCB: a four-layer, 20×20 cm board that we now call “The Shuriken” due to its odd shape.

Right now, it is at the fab, though with some delay due to a bug on the board house’s website.

Voxelization

Now that the renderer has a working PoC, it was time to wrap it in something more flexible and reusable.

Introducing libSpiROSE, which allows you to turn any OpenGL scene into voxel information usable by SpiROSE. It can output either to the screen (which will be piped to the FPGA through the RGB interface) or to a PNG file.

For example, this is actual code that voxelizes a cube and then dumps it to a PNG:


#include <iostream>
#include <spirose/spirose.h>
#include <glm/gtx/transform.hpp>

#define RES_W 80
#define RES_H 48
#define RES_C 128

int main(int argc, char *argv[]) {
    glfwInit();
    GLFWwindow *window = spirose::createWindow(RES_W, RES_H, RES_C);
    spirose::Context context(RES_W, RES_H, RES_C);

    // The 8 corners of a unit cube...
    float vertices[] = {1.f, 1.f, 0.f, 0.f, 1.f, 0.f, 1.f, 0.f,
                        0.f, 0.f, 0.f, 0.f, 1.f, 1.f, 1.f, 0.f,
                        1.f, 1.f, 0.f, 0.f, 1.f, 1.f, 0.f, 1.f};
    // ... and the 12 triangles (36 indices) that tie them together
    int indices[] = {3, 2, 6, 2, 6, 7, 6, 7, 4, 7, 4, 2, 4, 2, 0, 2, 0, 3,
                     0, 3, 1, 3, 1, 6, 1, 6, 5, 6, 5, 4, 5, 4, 1, 4, 1, 0};
    spirose::Object cube(vertices, 8, indices, sizeof(indices) / sizeof(int));
    // Center the cube around (0, 0, 0)
    cube.matrixModel = glm::translate(glm::vec3(-.5f));

    // Voxelize the cube
    context.clearVoxels();
    context.voxelize(cube);

    // Render voxels as slices
    context.clearScreen();
    context.synthesize(glm::vec4(1.f));

    context.dumpPNG("cube.png");
    return 0;
}

This leads to this:

Not impressive, but the code to generate it is all contained above.

Now, add an MVP matrix and a few rotations to the cube, swap context.synthesize with context.visualize, and you’ll get this:
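
For illustration, here is roughly what that tweak looks like (a sketch only: the exact visualize signature is my assumption, not the actual libSpiROSE API):

// Hypothetical variation of the example above: spin the cube and show the
// 3D visualisation instead of the sliced output.
cube.matrixModel = glm::rotate(glm::radians(30.f), glm::vec3(1.f, 1.f, 0.f))
                 * glm::translate(glm::vec3(-.5f));
context.clearVoxels();
context.voxelize(cube);
context.clearScreen();
context.visualize(glm::vec4(1.f));   // signature assumed analogous to synthesize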

Next week

Our first PCB arrived from the fab this week, so we’ll assemble it next week. However, those are the LED panels, sooooooo that’s going to take a long time.

We will also focus on building actual demos, now that we have libSpiROSE in a usable state.

See you next week!

[SpiROSE] OpenGL ES and mainboard

Howdy :]

OpenGL ES

The first half of this week was dedicated to porting the renderer to OpenGL ES, which I managed to pull off. In the end, the port was really easy, as the desktop version used OpenGL 3.3, which is very close to OpenGL ES.

The renderer PoC running on the wandboard

You may notice there is no 3D visualisation in the middle. This is expected, as this view relied on geometry shaders, which are not available on this GPU. Still, I did not waste time re-implementing it, as the only things that matter are the white slices in the bottom left-hand corner of the window.

However, some issues remain, and they only happen on the Wandboard with the Vivante drivers. Any other GL ES environment works just fine (for example, the VMware Mesa driver, which now supports OpenGL ES 3.2).

Mainly, I have uniforms that are not found for one shader stage: the location lookup steadily returns -1, which means “no uniform found” in GL terminology. The strange thing is, only this specific shader is problematic.
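
For context, the failing lookup is nothing fancy (generic GL code; the uniform name here is just an example):

// On the Vivante driver this returns -1 for uniforms that are declared *and*
// used by the shader; every other driver returns a valid location.
GLint location = glGetUniformLocation(program, "matrixVP");
if (location == -1) {
    // "no uniform found": normally optimised out, here a driver bug
}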

Furthermore, the keen-eyed amongst you may have noticed a small glitch on the above picture. The top-right hand slice is missing a pixel column, effectively cutting poor Suzanne’s ear.

What is infuriating is that neither the glitch nor the missing uniforms happen on my Arch Linux VM, where the VMware OpenGL ES driver is consistent and reliable.

Mainboard

This week I also seriously attacked the mainboard and finished up the schematics. As a quick recap, this board is what we previously called the “rotative base”. Its main features are:

  • SBC
  • FPGA
  • Hall Effect sensors
  • Power supplies:
    • 5V (SBC)
    • 4V (LEDs)
    • 3V3 (Drivers, buffers)
    • 3V (FPGA IO)
    • 2V5 (FPGA IO and FPGA PLL)
    • 1V2 (FPGA core)
  • Board-to-board connector to link to the LED panels

Phew!

PSU

I made the PSU with some help from Alexis on the switching ones. You may have noticed that we may have gone overboard with the supply rails (man, 7 different voltages on a single board!). However, each has a purpose, and I’ll only bother explaining the two odd ones:

  • 4V: the LEDs are driven to ground by the drivers. Said drivers dissipate the excess voltage themselves. Thus, to avoid overheating, we chose a supply voltage for the LEDs as close to their forward voltage as possible.
  • 3V: the FPGA I/Os can work up to 3V3. However, they are poorly protected, and some overshoot on a 3V3 signal might kill the IO pin (in short, the protection diode used by Altera is too weak). That’s why the banks that output any signal use 3V, which is still readable by the buffers and the drivers while being much more tolerant to overshoot and other signal integrity problems.

Now, on to its architecture. A picture is worth a thousand words:

As for power, well… With overdrive, at the maximum power of the drivers, we can chew through 22A of current per panel. That’s 44A of blinky goodness. Oooops. Fortunately, people have already had this problem, and TI happens to have a wonderful buck converter module (PTH12040WAD) that can deliver up to 50A. Fantastic!

Place and route

This turned out to be much harder than I expected. It is quite a mess, for one simple reason: the LED panels are identical, but mounted back to back. This means our board-to-board connectors are parallel but mirrored, and since most of the signals that go to one also go to the other, most of the signals have to cross. Oooopsie!

However, this is coming together. Placement is mostly done, except for a few components that can be tucked anywhere, and most submodules are internally placed and routed. Think of the buck converters, which, once grouped and routed, can be moved around almost like a single component.

The board is quite large, mostly because of the SoM, which takes up a lot of space. The smallest bounding circle I could manage has a diameter of 175mm.

Next week

Simple: moar routing. Right now, this looks more like a giant spaghetti monster than a PCB…

[SpiROSE] Supplying power voxels … Wait watt?

Okay, this may not be obvious from the title, but I mainly worked on two parts of the project this week: a new voxelization algorithm and the power supply.

Voxels

To my great surprise, OpenGL ES GPUs do not support integer operations. This includes logic operations, especially between fragments. Do you see where I am going? If you recall my post regarding the voxelization algorithm, I rely on XOR-ing fragments together to voxelize a column. The OpenGL ES standard does not define the glLogicOp function, which is the one I need to XOR my fragments. Bummer.

However, there is a solution. We can emulate a bitwise XOR on n bits by doing n one-bit XORs. But how can we do XOR without XOR? If we look at a truth table, we see that XOR is equivalent to an addition where we keep only the LSB: indeed, the LSB of 1+1 is 0. Lucky for us, OpenGL has a feature, the blend mode, that allows us to do just that. Now, for every bit of our output voxel texture, we do an additive blend of the fragments.
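
As a rough sketch, the blend state boils down to plain additive blending (standard GL calls; the one-channel-per-layer layout is the one described below):

glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);   // dest = dest + src
glBlendFunc(GL_ONE, GL_ONE);    // no scaling on either operand
// After the pass, a voxel is "inside" iff its channel value is odd:
// the LSB of the sum plays the role of the XORed bit.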

But there is a huge downside to this method: it now requires a whole byte of data for each layer (or, to be accurate, a whole color channel per layer), where the XOR method required a single bit. This means we now need several output textures for the whole scene, while a single one was enough for up to 32 layers (32 bpp) previously. Fortunately, even older GL ES supports multiple render targets (i.e. writing to several textures in a single pass), but with limitations. Our Wandboard is limited to 16 output textures, which gives 64 total layers (1 layer / channel, 4 layers / texture).
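
For reference, a hedged sketch of the multi-render-target setup (not the actual libSpiROSE code; it assumes the voxel textures are already attached to the bound FBO):

#include <vector>

GLint maxTargets;                               // 16 on the Wandboard
glGetIntegerv(GL_MAX_DRAW_BUFFERS, &maxTargets);
std::vector<GLenum> buffers;
for (GLint i = 0; i < maxTargets; ++i)
    buffers.push_back(GL_COLOR_ATTACHMENT0 + i);
// One pass now writes to every attachment: 16 textures x 4 channels = 64 layers.
glDrawBuffers(static_cast<GLsizei>(buffers.size()), buffers.data());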

Anyways, we now have this version ready to rock, where the result is basically indistinguishable from the XOR version.

XOR-less version. Notice that there are now 8 voxel textures on the bottom left, each storing 4 voxel layers.

XOR version, for reference

 

As another achievement, I got OpenGL ES to work reliably on the SBC, which proved tricky, especially with the broken package for the library I use. I use GLFW for context creation (awesome library!), but the version that ships with Ubuntu 16.04 (the official Linux flavour provided for this board) is utterly broken, with wrong includes, … So much for an LTS distribution!

Furthermore, porting this app to OpenGL ES on the Wandboard is in progress and looking good so far. Except I have no output. OpenGL magic, I guess. Anyhow, OpenGL ES is much more finicky than its desktop counterpart (especially Nvidia’s), which makes porting very tedious. Oh, and working remotely does not make debugging graphics apps any easier…

Power supply

We estimated the total worst-case-scenario-if-everything-blows-up current consumption. We are looking at:

  • LEDs: 44A
  • FPGA: 700mA
  • LED drivers: 840mA
  • Clock buffers: 120mA
  • SoM: 2A

Some of those figures are more empirical than anything: the FPGA number comes from Altera’s Excel calculator spreadsheet thingy, and the SoM number comes from an overkill stress test (CPU+GPU+WiFi).

To make the drivers drop the least possible power, LEDs will be powered from a 4V rail, not too far off from their 3.4V forward voltage. Beefy buck DC-DC converters are needed!

The next big hog is the SoM, which will get its own dedicated buck DC-DC converter, supplying 5V directly.

All the rest will be fed from a 3V3 buck converter, whose output will then be stepped down to the other voltages for the remaining components (3V, 2V5, 1V2).

A single 12V supply will feed all this mess through the various DC-DC converters. We are looking at 190-odd watts (the LEDs alone account for 44 A × 4 V ≈ 176 W, with the SoM, FPGA, drivers, buffers and converter losses making up the rest).

What’s next?

Next week, I’ll finish the PSU schematics and we’ll be able to route the main rotative PCB, containing PSU, SoM and FPGA (to name a few).

I will also continue to port that damn renderer to GL ES.

Thanks for reading 😀

[SpiROSE] Oh My Schematic

Not a lot to talk about, except that Adrien and I finally began the schematics for the rotative base. This PCB is the one hosting the SoM, FPGA and power supplies. We also determined the exact signal count going from this PCB to the PCB hosting the LEDs and their drivers.

Speaking of power supplies, we’ll need beefy ones. As a worst-case calculation with extreme currents for the LEDs, a single colour channel can eat up to 100mA. That’s 0.1 × 3 × 80 × 48 / 8 = 144A of current. The /8 comes from the 8-way multiplexing we are doing with the LEDs.

This week: finishing the schematics and fixing OpenGL.

[SpiROSE] SBC testing and test LED panel

This week was kinda quiet as far as the project went, due to the Athens week. However, this did not stop me from doing some actual testing on the SBC and the LEDs.

RGB on the SBC

First off, since we’ll be using the parallel RGB interface to transfer data to the FPGA, testing the flexibility of this bus was a must. Parallel buses get tricky once you get above 25 MHz. However, RGB is a simple display interface with basic control signals. On top of the 24 data bits, you have PCLK (pixel clock), HSYNC and VSYNC. The latter two are used to frame the actual image, with blanking intervals at the end of each line and after the whole image. Because RGB is sort of a digital VGA, it uses the same timings, which are inherited from the CRT days. Back then, you needed long blankings to let the electron beam travel to the next line, wasting some precious bandwidth. For example, an 800 px wide image would have 224 wasted pixels per line on blanking.

However, an FPGA doesn’t care about electron beams, so reducing those blankings to their minimum is essential. As it turns out, the IPU of the i.MX 6 is very flexible and allows us to set those blankings arbitrarily. A 1-px horizontal blanking and a 1-line vertical blanking are feasible, and this has been confirmed by measuring the output of the RGB interface.

Measuring RGB sync signals using a poor man’s frequency meter

The measurement was done using 3 STM32F103 boards (Chinese clones of the Maple Mini, worth $2 a pop on eBay).

But what kind of frequencies can we expect? Well, our display being an 80×48 LED matrix, refreshing 256 times per turn at 30 turns per second, the total pixel rate is 80 × 48 × 256 × 30 ≈ 29.5 Mpixels/s; thus a pixel clock a tad above 29.5 MHz (remember that cycles are wasted after each line, hence a slight overhead). Ouch, too high. But wait! Our LED matrix spans a whole slice of the cylinder along the diameter, not the radius. This means that in a half-turn we have already covered all of our display space, and the second half of the turn reuses the same pixel data. Great, our bandwidth is reduced to a “mere” 14.75 MHz, which is much easier to route and work with.

But how to set those blankings exactly? I’m glad you asked! The Linux drivers for the i.MX 6 are well made, and once the LCD/RGB output has been enabled in U-Boot, a custom modeline in xrandr does the trick. In our case, a resolution of 1024×480 exactly matches our pixel count, giving a total of 1025×481 clocks on the RGB bus (counting minimum blankings). As for frequencies, at 30 fps this gives a pixel clock of 1025 × 481 × 30 ≈ 14.79 MHz, which shows that, with extra-reduced blankings, the overhead is negligible for our application.

The corresponding measurements are the following:

From top to bottom : PCLK, VSYNC, HSYNC

Here is the modeline for this specific resolution:

Modeline "1024x480Z@30" 14.79 1024 1024 1025 1025 480 480 481 481

A few xrandr commands and you’re all set!

LEDs

As FPGA development began and time advanced, it became critical to validate the LEDs and check that our drivers work. With the help of Alexis, we soldered sample drivers to breakout boards (QFN-56 is no trivial task) in order to test them.

However, what is a driver without any LEDs? Since 50 LEDs were ordered, I designed a very simple PCB that could be fabricated at school to test both the LEDs themselves and the multiplexing we’ll be doing. With a single driver controlling 16×8 RGB LEDs, 50 units would not cut it, so I only put down a full column (16 LEDs) and a full row (8 multiplexed LEDs), with the required control logic (8 MOSFETs, here AO3401s I had lying on my desk). It is only an L shape, but it allows us to test and develop the driver driver (no, that’s not a typo). Oh, and lots of pin headers for the wiring.

Bare board with small SMD parts soldered

I can’t say it enough, but kudos to Alexis for soldering all those components. For reference, the LEDs are 1.13×1.13 mm and the resistors are standard 0603.

Soldered PCB, wired up to the driver (not the blue board, which is yet another STM32 board)

Regarding brightness, I dare you to look straight at a single one of those LEDs for more than a few seconds. We did tests at extremely low brightness in an attempt to simulate the effective brightness once the panel rotates. Even at 1.5% brightness, a single LED was easy to see in a brightly lit room. Furthermore, considering the density we have, I have no real concerns about brightness at the moment.

What to do next?

Next week, I’ll try to build and run OpenGL apps on the SBC, and confirm whether the voxelization algorithm can be run with OpenGL ES. It appears to be impossible without desktop OpenGL. If you recall my post about the algorithm, I use bitwise operations to combine different fragments. However, this is a feature exclusive to desktop OpenGL. The rationale behind it (according to Khronos) is that most embedded GPUs lack a proper integer manipulation unit, so it is not part of the GL ES spec (for example, the glLogicOp function, which is used to set the bitwise operation to logical XOR, is missing).

Thus I’ll either try to find a workaround, or totally redesign the renderer, mainly based on Alexandre’s ideas.

I will also benchmark the SBC in terms of CPU power, to estimate whether a 100% CPU implementation of the current algorithm is possible (spoiler: I doubt it).

[SpiROSE] Moar render

SoM – Finally the final choice

This week, we finally chose an appropriate SoM among all those I found. Well, actually, only one matched the requirements:

  • Onboard Wifi
  • GPU
  • Parallel RGB and/or fast GPMC-like interface
  • No “contact us” bullcrap to get one

As it turns out, those criteria are so specific that only the Wandboard matched our requirements. The WiFi requirement kicked out the FireFly boards, the GPU removed all FPGA SoCs, parallel RGB removed the RK3399-based boards, fast GPMC removed all the Gumstix ones, and the “contact us” threw away Variscite and Theobroma.

Phew. Thus it was ordered on Friday. We expect it to arrive as soon as possible.

Rendering – pipeline mostly done

I also refined the rendering side of things, by generating the image that would be sent by the SBC to the FPGA.

In the above picture, the bottom left corner still has the voxel texture, but also the end goal: the white things. Those are the 32 slices of the voxelized Suzanne along a vertical plane.

Each of those slices (micro images) is effectively one refresh of the rotating panel. Thus, the FPGA will only have to cherry-pick the proper subpart of the image to refresh the LED panel.

This sliced version is generated in a pixel shader from the voxel texture (the blue thing on the bottom left).
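
To make the idea concrete, here is a CPU-side sketch of what that pixel shader computes (the exact layout is assumed here, not taken from the actual SpiROSE shader):

#include <cstdint>

// Output pixel -> (slice, row, layer) -> voxel bit. The voxel texture packs
// one column per texel and one z layer per bit (32 layers in an RGBA8 texel).
bool slicePixel(const uint32_t voxels[32][32],   // [y][x], 32 layers per texel
                int outX, int outY, int tilesPerRow) {
    int slice = (outY / 32) * tilesPerRow + (outX / 32);  // which refresh/slice
    int y     = outX % 32;                                // position in the slice
    int z     = outY % 32;                                // layer (height)
    return (voxels[y][slice] >> z) & 1;                   // white if voxel set
}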

Next week

We’ll (hopefully) get to play on the SBC. Despite having an Athens week, I’ll try to port this renderer to OpenGL ES, which may not be trivial.

[SpiROSE] Yummy voxels

Howdy!

This week, we finally fixed the LED count on our panel, though it might get modified due to PCB routing constraints. Anyways, we ended up with an 83×46 display.

I also polished the renderer and got the voxelization working. It runs in real time on the GPU, in a single pass. I will now describe how this voxelization works, and show some results.

Voxelization

In my previous post, I mentioned a paper that shows a technique to voxelize an OpenGL scene in a single pass. To simplify, I will explain it with a desired “resolution” of 8×8×8, with the scene in a cube from (-1, -1, -1) to (1, 1, 1) (in OpenGL units). We represent the voxels using the bits of a texture: here we need an 8×8 texture with 8 bits per pixel (thus grayscale). Each pixel represents a column, and each bit represents a voxel: a set bit means there is a voxel, an unset one means there is none. The least significant bit represents the lowest voxel on the z axis, while the most significant one represents the highest. To know whether we have a voxel at OpenGL coordinates (x, y, z), we map each coordinate to an integer between 0 and 7 and look at the z-th bit of the (x, y) pixel.
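
In code, the lookup described above boils down to this (a purely illustrative CPU-side version):

#include <algorithm>
#include <cstdint>

// voxels[y][x] is one grayscale byte; bit z of that byte tells whether the
// voxel at grid position (x, y, z) is set.
bool voxelAt(const uint8_t voxels[8][8], float x, float y, float z) {
    // Map an OpenGL coordinate in [-1, 1] to a grid index in [0, 7].
    auto grid = [](float c) { return std::min(int((c + 1.f) * 0.5f * 8.f), 7); };
    return (voxels[grid(y)][grid(x)] >> grid(z)) & 1;
}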

Now, voxelization. For this, we need a fragment shader. For starters, a fragment shader is a little program that runs on the GPU for each drawn pixel (a fragment) after rasterization of a triangle, and outputs the final color of said fragment. For the same pixel there can be multiple fragments: when several triangles end up on top of each other. This shader has access to several properties, including the fragment’s position in camera space. By using an orthographic projection from the bottom (with the appropriate clipping planes), our xyz coordinates are unchanged and are the same in both camera space and world space.

To get the fragment color, we map the fragment’s z coordinate from [-1, 1] to an integer between 0 and 7. This gives us the proper bit to set. We then set the corresponding bit, and all bits lower than this one: this is the final color of the fragment.

Courtesy of the aforementioned paper.
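
Written out (again a CPU-side illustration of what the fragment shader outputs, for the 8-layer toy example):

#include <algorithm>
#include <cstdint>

// Set the bit of the fragment's layer plus every bit below it.
uint8_t fragmentColor(float z) {                        // z in [-1, 1]
    int layer = std::min(int((z + 1.f) * 0.5f * 8.f), 7);
    return uint8_t(0xFFu >> (7 - layer));               // bits 0..layer are set
}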

Now, we tell OpenGL how to combine our fragments. This is done through the XOR blending mode: when taking two fragments, OpenGL applies a bitwise XOR and uses the resulting value. When two fragments overlap, only the bits between them remain set. If the mesh is watertight, we get an alternation of bits after each fragment encountered. Thus we get the same result as a scanline algorithm, without costly loops.
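
On desktop OpenGL, that combining step is just two state calls (sketch):

glEnable(GL_COLOR_LOGIC_OP);
glLogicOp(GL_XOR);   // fragments landing on the same pixel get XORed together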

Now, on to the real-time rendering. This time, the voxelization is done at a 32×32×32 resolution. To get additional bits per pixel, I simply used each pixel channel: red holds the bottom 8 layers, then green, then blue, and alpha holds the top 8 layers.

Voxelized suzanne w/ pizza transform

Voxelized suzanne w/o pizza transform

You may notice that the first one is crying. This is due to the Suzanne mesh being lame: it is not watertight at the eyes, which produces some glitches that I managed to avoid in the second one.

Also, notice that the bottom right shows the direct output of the voxelization pass. For these screenshots, a second pass was needed to visualize the result, as the raw output is hard for our eyes to parse.

Back to the pizza

You may have noticed that I posted a screenshot with the pizza transform (which then gets reversed in the second visualisation pass). Here is a screenshot outlining its benefit.

Thanks to the colors, you may be able to see all the radii from the center made by the voxels. Each of them depicts exactly one refresh of our rotating panel: a refresh is a “radius slice”, which maps to a pixel column in our voxel image.

Outer voxels may seem extremely stretched, but this is because the transformed geometry was rendered on a 32×32 texture, giving a resolution of 32 voxels along the radius and 32 along the perimeter. This is equivalent to having 32 refreshes of our rotating panel per turn, which is, obviously, way too low.

However, as interesting as this transform is, it does require geometry shaders, which are only core in OpenGL ES 3.2. That drastically limits our SBC choice. Yet some SoCs do support the extension on lower versions of GL ES, since this is a very useful feature, and they may pack it without all the bells and whistles of GL ES 3.2. Note that the voxelization above does not require any modern OpenGL features; even GL ES 1.0 hardware can do it. For reference, the authors of the paper were rocking commodity 2008 GPUs.

Data streaming

The very first requirement of this project was to be able to stream a video from a computer to SpiROSE. However vague that may be, there are quite a few steps between getting a 3D video and data the FPGA can understand. Moreover, this is neither the only solution nor the most interesting one. There are many use cases:

  • We have a 2D video on a computer (Big Buck Bunny, for a change). When streaming it to the display, we somehow need to project it, be it wrapping it around a cylinder, laying it horizontally on a single layer, or vertically on an arbitrary plane (easiest). This would still require some software on the SBC, whose job would be to translate this 2D stream into something proper for the FPGA.
  • We have a 3D scene. So many things can be streamed, at so many steps of the rendering pipeline.
    • Streaming the inputs. This is essentially sending the scene/mesh/… to the SBC, which renders it in 3D and generates the images for the FPGA. The computer does nothing except gather user input to manipulate the render. A typical application would be a game, where SpiROSE is an arcade machine.
    • Streaming the cuts. On the PC, the 3D scene would be arranged: all the usual 3D transforms applied (translations, rotations, …). Then n slices would be taken along a vertical plane, each representing one refresh of the panel. This gives us a set of n 2D outlines. The resulting cut geometry would be filled and triangulated, then sent to the SBC, which would rasterize it and forward the result to the FPGA.
    • Streaming the end render. The computer would do all the heavy lifting and generate an image stream that the FPGA can understand. Compress it, stream it, run gstreamer on the SBC, and you’re done!

Each of those has its advantages and drawbacks. The 2D one is limited, but trivial to use. As for the 3D scene, the first option is the easiest on bandwidth. However, we are limited by what is programmed into the SBC, just like an arcade machine is locked to a single game; but this may also be an advantage, since SpiROSE can run on its own while still being interactive.

The second option looks really nice. However, the cutting step is CPU-only, as the resulting geometry has to be sent to the SBC. That means it will be hard to run on the SBC, and impossible to run on a GPU. On the other hand, it is really light on bandwidth and on onboard computation. But it also forbids streaming any kind of bitmap (2D video).

The last option is really nice, since we can record a video of the output and simply stream it, as with the first 2D option. However, bandwidth is a real concern, and compression might end up … messy, to say the least. The issue is the hardware decoder of an SBC, which is incapable of pushing more than 60 frames per second; this means we cannot encode each panel refresh as a video frame, so we need to multiplex them onto a single video frame. However, video codecs really don’t like discontinuities, and 256 seemingly independent streams on a single frame is too much for them: either the final size is larger than the raw video, or everything gets blurred out. Moreover, real-time H264/H265 compression is not a good idea, since those codecs rely a lot on looking ahead across frames. For proper compression, we’d have ~1 s of added delay, which is way too much for, say, a game.

So, we still have to decide which route to go (well, 2D video is kinda mandatory).

SBC / FPGA communication

Last week, I spoke about HDMI -> parallel RGB bridges. These chips have an issue: very little information is available about them. It is pretty hard to tell whether the chip will output bursts of data when an HDMI frame comes in, or whether it will buffer it and output a slower, steadier data stream. This matters, because routing 24 traces @ 168 MHz is not exactly fun. This is why we are exploring two routes:

  • An SBC with integrated RGB output (aka MIPI DPI). Since the SoC generates the signal, it is much easier to control its timing. For example, the i.MX6 SoC is very flexible in that regard (it is the only one I had time to analyse, as this kind of information is hard to find).
  • Some kind of memory interface (GPMC or similar), the same way ROSEace did it. However, those interfaces are harder and harder to find: only the Gumstix SBCs have one, and it is too slow. The only other SoCs (that I found) still offering a similar interface are the i.MX6 series, with their EIM (External Interface Module). Problem is, this kind of interface is becoming obsolete and is being replaced by PCIe, which is out of the question.

TODO

Next week, I’ll continue analysing SoCs to find one with a flexible RGB interface that can keep the signals not too fast (hello, signal integrity).

I will also continue that renderer, where I’ll interlace the resulting voxelized output to get a mosaic of panel refreshes: a single video frame being a whole SpiROSE frame, embedding 256 LED frames.

See you next week 🙂

[SpiROSE] Pizza

Last week was … eventful. After getting turned down again and again by the mechanic, we finally arrived at a design that might work. Simply put, a stack of ROSEace won’t cut it (haha), but a big ol’ plate à la HARP might work. At least the mechanic is okay with it ¯\_(ツ)_/¯

Renderer

Anyways. I searched for algorithms suitable for the renderer we intend to write. Its job would be twofold:

  • Voxelize a 3D scene
  • Apply what I will call the Pizza Transform™ from now on

Voxelization

To my surprise, voxelization is really easy to do, even on a GPU. I found a paper describing real-time voxelization (Single-Pass GPU Solid Voxelization for Real-Time Applications by Elmar Eisemann). It is based on OpenGL’s XOR blending mode. Wonderful: we’ll have a renderer that works even on complex OpenGL scenes NOT meant to be voxelized. The only constraint is that the scene must be mathematically watertight (along one axis is enough).

Pizza Transform

Now, on to the Pizza Transform™. Remember that we have a circular display, where voxels are not square but round-ish rectangles? It would be better if we could “unroll” the circular image into a nice Cartesian matrix. But first, why would we want this?

Think about ROSEace. What they did is lay down a square image on their circular display. Think of a tablecloth on a round table. Then, for each blade position, they took the pixel just under each LED. This is pretty inefficient, as well as inaccurate: you waste data by never using the corners, and you risk using the same pixel twice for two different voxels.
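
For concreteness, the naive “tablecloth” lookup boils down to something like this (my own reconstruction; the parameter names are made up):

#include <cmath>

// LED i on a blade at angle theta samples the texel straight under it in an
// N x N square image laid over the circular display.
int texelIndex(int i, float theta, int N, int ledsPerBlade) {
    float r = (i + 0.5f) / ledsPerBlade * (N / 2.f);     // LED radius, in pixels
    int   x = int(N / 2.f + r * std::cos(theta));
    int   y = int(N / 2.f + r * std::sin(theta));
    return y * N + x;   // corners are never hit, texels near the centre are hit many times
}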

Even with a smarter but harder approach, where you don’t refresh all LEDs on the blade at the same time so that each voxel keeps roughly the same length, you still end up wasting space and reusing pixels.

This is a simulation done using Python and Excel for an 88×88 image. Gray pixels are unused, green ones are used exactly once, and red ones are used more than once. On the left is the simple way (ROSEace), on the right the more complicated one. Both waste roughly the same amount of pixels (circa 32%), and neither has a perfect 1:1 mapping of voxel to texel.

Enter the pizza transform. Its name derives from a simple way of explaining what we want to achieve. Keep in mind we are making a renderer, so output images do not necessarily have to be Cartesian, and we have unlimited resolution on the input.

Take a pizza (we have a 3D model of it). You may want to display it in its glorious 3D :

However, bandwidth is scarce and you don’t want to waste anything, especially not lose details by missing some voxels and replicating others! Take a knife and cut the pizza along a radius (say, from the middle to the bottom). Now, stretch it into a rectangle with crust only on one edge:

That’s our transform. Back to the renderer: we only need to do this to our whole OpenGL scene and voxelize it. The first half is kinda done, as I wrote a Pizza Proof of Concept that cuts and transforms an OpenGL scene in real time, using a single geometry shader.
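
Conceptually, the transform applied to each vertex is just a switch to polar coordinates (this CPU version is only an illustration of the idea, not the actual geometry shader):

#include <cmath>

struct Vec2 { float x, y; };

// A point of the slice becomes (angle, radius): x spans the perimeter of the
// unrolled pizza, y spans the radius, and the cut sits at theta = +/- pi.
Vec2 pizzaTransform(Vec2 p) {
    const float pi = 3.14159265f;
    float r     = std::sqrt(p.x * p.x + p.y * p.y);
    float theta = std::atan2(p.y, p.x);
    return {(theta / pi + 1.f) * 0.5f, r};   // x in [0, 1], y in [0, r_max]
}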

On the rightmost half of the middle circle, you can see a vertical line. This is our vertical cut that allows us to unwrap the mesh into what we have on the top right. The result may look weird, but that is because the center of the circle is not at the origin, where the cut happens. Note that the render is in wireframe only to show the cut.

SBC – FPGA

Yes, we are using an FPGA after all. Who would have believed it would be easier? Anyways, Ethernet and the like are out of the question.

Wait … We’re doing GPU rendering … Taking the video output straight to the FPGA would be perfect! Oh, but any IP for HDMI, DisplayPort or the like would cost an arm, a leg and your cat 🙁 Luckily for us, the RGB protocol is trivial to use on an FPGA (think digital VGA), and even luckier, HDMI <-> RGB bridge ICs do exist, like this one from TI. One problem solved!

Now

This week, I’ll try to smooth out the renderer PoC and even add voxelization to it.

We will also make a final decision on the components we will use, especially the SBC and the FPGA. Speaking of SBCs, T.G. suggested SoMs from Variscite based on NXP i.MX6 SoCs. Any experience, pros or cons with those?

We plan on using this dev board from ST (I have one, and I know Alexis has one too) to provide an embedded display with a touch-screen interface to control some aspects of the display. I’ll set up a base graphical project for it using ChibiOS and µGFX.

See you next week! Or even before; any feedback is welcome!

[SpiROSE] Hunting for a SBC, blinkies and bandwidth considerations

Hey y’all !

So we’re getting a compressed video stream from a laptop / computer / wizard, and we somehow need to send it to our blinkies. Thus I went hunting for a single board computer (à la Raspberry Pi) which meets the following criteria:

  • 5GHz WiFi: even a compressed H264 or H265 stream takes some bandwidth. Moreover, we’d like the most reliable link possible. With all the networks there are @telecom, 2.4GHz is out of the question.
  • High-bandwidth wired link: be it LVDS, Ethernet, anything, we need to get the data out of the SBC. Whether it goes to an FPGA or straight to the blades, chances are hundreds of megabytes will be transferred. Gigabit Ethernet is the most interesting option, as it is really easy to use and to distribute to many nodes.
  • Hardware video decoder: H264 (or even better, H265) is a huge help to overcome bandwidth limitations. However, even a powerful SBC has trouble decoding a 1080p@60Hz stream in real time on the CPU alone. The test was done on an Odroid C1+ (4× Cortex-A5 @ 1.5GHz), which decoded a 1080p H264 stream at 70fps, but with frequent dips to 30fps. This also leaves very little room for anything else to happen, as all 4 cores were maxed out.
  • SoM form factor: oh my, connectors are a mess. This is not a mandatory criterion, but it helps. We’d like an SBC that offers an edge connector, to easily integrate it on a custom PCB. A widely known example is the Raspberry Pi Compute Module.

After sifting through the interwebz, this table was born. It allowed us to greatly narrow down the list of suitable candidates. A totally overkill board is the FireFly RK3399, but it is pretty hard to find and is considered a VERY bad idea by T.G. because of the Rockchip chip. The final choice still needs to be made, but this will come in handy.

I also quickly went shopping for small RGB LEDs, and a few were found. This table summarises the 3 most interesting ones, of which the second looks really good.

Since we are considering sending data to the blades over Ethernet, I wanted to know what kind of bandwidth we could expect from a microcontroller loaded with a 100Mbps link. After some Quick’n’dirty® coding, I ended up with a board able to receive 7.38 MiB/s of data (out of a theoretical maximum of roughly 11.9 MiB/s). This test was done without any kind of optimization, with lwIP+ChibiOS running on an STM32F407.

Much more was done, but I won’t repeat my colleagues :]

Next week, I’ll focus with the others on defining the exact LED count we want; right now, we are looking at 128 LEDs per blade. We will also discuss some possible optimisations of the data we send to the blades, and the layout of the voxels: do we just map a square image onto a circle (à la ROSEace), do we try to be smarter and wrap it around it, or maybe something even smarter?