[SpiROSE] Pizza

Last week was … eventful. After getting turned down again and again by the mechanic, we finally arrived at a design that might work. Simply put, a stack of ROSEace won’t cut it (haha), but a big ol’ plate à la HARP might work. At least the mechanic is okay with it ¯\_(ツ)_/¯

Renderer

Anyways. I searched for algorithms suitable for the renderer we intend to write. Its job would be twofold:

  • Voxelize a 3D scene
  • Apply what I will call the Pizza Transform™ from now on

Voxelization

To my surprise, voxelization is actually really easy to do, even on a GPU. I found a paper performing real-time voxelization (Single-Pass GPU Solid Voxelization for Real-Time Applications by Elmar Eisemann). It is based on OpenGL’s XOR blending mode. Wonderful: we’ll have a renderer that will work even on complex OpenGL scenes NOT meant to be voxelized. The only constraint is for the scene to be mathematically watertight (along one axis is enough).
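
To give an intuition of the XOR trick, here is a small CPU-side sketch of what happens in a single voxel column (the paper does this on the GPU with XOR blending into a bitmask texture; the mesh and depths below are made up for illustration):

import numpy as np

# CPU sketch of the XOR solid-voxelization idea: every surface fragment
# toggles all the voxels behind it along the viewing axis. Once all fragments
# of a watertight mesh are processed, exactly the interior voxels remain set.
def voxelize_column(fragment_depths, depth_resolution):
    column = np.zeros(depth_resolution, dtype=bool)
    for z in fragment_depths:    # integer voxel depth of each surface fragment
        column[z:] ^= True       # XOR-toggle everything behind the fragment
    return column

# A watertight object crossing this column at depths 3 (front) and 7 (back):
print(voxelize_column([3, 7], 10).astype(int))
# -> [0 0 0 1 1 1 1 0 0 0]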

Pizza Transform

Now, on to the Pizza Transform™. Remember that we have a circular display, where voxels are not square but round-ish rectangles? It would be better if we could “unroll” the circular image into a nice cartesian matrix. But first, why would we want this?

Think about ROSEace. What they did was lay a square image down on their circular display, like a tablecloth on a round table. Then, for each blade position, they took the pixel directly under each LED. This is pretty inefficient, as well as inaccurate: you waste data by not using the corners, and you risk using the same pixel twice for two different voxels.

Even with a smarter but harder approach, where not all LEDs on the blade are refreshed at the same time so that each voxel keeps roughly the same length, you still end up wasting space and reusing pixels.

This is a simulation done using Python and Excel for an 88×88 image. Gray pixels are unused, green pixels are used exactly once, and red pixels are used more than once. On the left is the simple way (ROSEace), on the right the more complicated one. Both waste roughly the same amount of pixels (circa 32%), and neither has a perfect 1:1 mapping of a voxel to a texel.
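
Out of curiosity, here is a rough sketch of how such a texel-usage count can be reproduced; the blade parameters below are illustrative assumptions, not the exact numbers behind the figure:

import numpy as np

# Naive ROSEace-style mapping: lay an 88x88 image flat on the disc and, for
# each blade position, sample the texel directly under each LED, then count
# how often every texel gets used. POSITIONS and LEDS are assumed values.
SIZE = 88          # texels per side of the square image
POSITIONS = 128    # angular blade positions per turn (assumed)
LEDS = 44          # LEDs along one radius (assumed)

usage = np.zeros((SIZE, SIZE), dtype=int)
for p in range(POSITIONS):
    theta = 2 * np.pi * p / POSITIONS
    for led in range(LEDS):
        r = (led + 0.5) / LEDS * (SIZE / 2)   # radial distance of this LED
        x = int(SIZE / 2 + r * np.cos(theta))
        y = int(SIZE / 2 + r * np.sin(theta))
        usage[y, x] += 1

print(f"unused: {np.mean(usage == 0):.0%}, "
      f"used once: {np.mean(usage == 1):.0%}, "
      f"reused: {np.mean(usage > 1):.0%}")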

Enter the pizza transform. Its name derives from a simple way of explaining what we want to achieve. Keep in mind we are making a renderer, so output images need not be cartesian, and we have unlimited resolution on the input.

Take a pizza (we have a 3D model of it). You may want to display it in its glorious 3D:

However, bandwidth is scarce and you don’t want to waste anything, especially not details lost by skipping stuff and replicating voxels! Take a knife and cut it along a radius (say, from the middle to the bottom). Now, stretch it into a rectangle with crust only on one edge:

That’s our transform. Back to the renderer: we only need to do this to our whole OpenGL scene and then voxelize it. The first half is kind of done, as I wrote a Pizza Proof of Concept that performs the cut and the transform on an OpenGL scene in real time, using a single Geometry Shader.

On the rightmost half of the middle circle, you can see a vertical line. This is the vertical cut that allows us to unwrap the mesh into what you see on the top right. The result may look weird, but that is because the center of the circle is not at the origin, where the cut happens. Note that the render is in wireframe only to show the cut.
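
To make the transform concrete, here is a minimal CPU-side sketch of the per-vertex mapping (a unit radius and a cut along the positive x axis are assumptions for illustration; the actual geometry shader also has to split the triangles that straddle the cut):

import math

# Pizza Transform sketch: a point of the circular display, given in cartesian
# coordinates, maps to a rectangle whose horizontal axis is the unrolled angle
# and whose vertical axis is the radius (so the crust ends up on one edge).
def pizza_transform(x, y, radius=1.0):
    r = math.hypot(x, y) / radius              # 0 at the center, 1 at the crust
    theta = math.atan2(y, x) % (2 * math.pi)   # angle, with the cut at theta = 0
    u = theta / (2 * math.pi)                  # unrolled angle in [0, 1)
    return u, r

# Two points just on either side of the cut land on opposite edges of the
# rectangle, which is the discontinuity visible along the cut in the render.
print(pizza_transform(1.0,  1e-3))   # ~ (0.0002, 1.0)
print(pizza_transform(1.0, -1e-3))   # ~ (0.9998, 1.0)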

SBC – FPGA

Yes, we are using an FPGA after all. Who would have believed it to be easier? Anyways, Ethernet and the like are out of the question.

Wait … We’re doing GPU rendering … Feeding the video output straight to the FPGA would be perfect! Oh, but any IP for HDMI, DisplayPort or the like would cost an arm, a leg and your cat 🙁 Luckily for us, the RGB protocol is trivial to use on an FPGA (think digital VGA), and even luckier, HDMI <-> RGB bridge ICs do exist, like this one from TI. One problem solved!

Now

This week, I’ll try to smooth out the renderer PoC and even add voxelization to it.

We will also make a final decision on the components we will use, especially the SBC and the FPGA. Speaking of SBCs, T.G. did suggest SoMs from Variscite based on NXP i.MX6 SoCs. Any experience, pros or cons with those?

We plan on using this dev board from ST (I have one and I know Alexis has one too) to provide an embedded display with a touch-screen interface to control some aspects of the display. I’ll set up a base graphical project for it using ChibiOS and µGFX.

See you next week! Or even before; any feedback is welcome!

7 comments to [SpiROSE] Pizza

  • phh

    Hi,

    First question: you say you want to reduce bandwidth compared to ROSEace. Ok, but which bandwidth?
    Is it the bandwidth between the renderer and the SBC? Or between the SBC and the FPGA?
    Depending on whether we’re discussing a compressed or uncompressed video stream, the arguments change…

    I’m a bit sad you’re going back to an FPGA; I found your previous uC+Ethernet architecture really nice

    Finally, going from rk3399 to imx6 is a huge step down. You no longer require 4k video decode?

    • Tuetuopay

      > Ok, but which bandwidth?

      Actually, everywhere. If the renderer is embedded on the SBC, there is no BW issue. However, if it is located on a PC and then streamed over WiFi, any bandwidth reduction is welcome! And since we’re using HDMI -> RGB, the reduction is not *that* critical, but it puts less strain on the FPGA and the GPU output.

      > I’m a bit sad you’re going back to an FPGA; I found your previous uC+Ethernet architecture really nice

      We needed high bandwidth, thus gigabit Ethernet was mandatory. However, routing GbE is a pain in the butt according to Alexis. Moreover, it means controlled-impedance PCBs, making the manufacturing cost skyrocket. The advantage here is that we have a single HDMI link, which we’ll be able to keep short (under a cm) by placing the HDMI -> RGB bridge near the SoM. According to Alexis, we might be able to get away without a controlled-impedance board, with only controlled-impedance traces.

      > RK3399 -> i.MX6

      This is not definitive. I was mentioning it because Tarik looked at them. One of the reasons I’m writing a PoC for the renderer is to estimate what we need in terms of GPU. As for decoding, 4k is not needed, as 1080p60 is enough if we want to stream a pre-rendered video from a PC.

  • Phh

    First, most SBCs have an RGB output, so there is no need for an HDMI -> RGB bridge, unless you eventually want a fully forward-upgradable system for future SBCs which will no longer have RGB…? (rk3399 indeed doesn’t have a parallel interface, but imx6 does)
    Also, using the SBC’s RGB output would probably make outputting at the correct FPS (and other timings) easier. I wonder if you wouldn’t be better off with MIPI/DSI or eDP instead of HDMI.

    Then for bandwidth, concerning renderer => SBC, I don’t see how the pizza rendering helps. An encoder doesn’t care much about dead zones. It does care about continuity though, and with the pizza transform you lose the continuity at the cut. For instance, when an object goes through this cut, it will be fully re-encoded, whereas with continuity preserved, only the motion compensation would be needed.
    Also, the deformation itself can be problematic for the codec. Scaling objects is not really well supported by H264. I think it’s better with H265, but still far from optimized.

    But then, I still haven’t grasped how you render 3D onto your pizza, so perhaps my comments are meaningless there.
    Also, I understand that the pizza transform is very useful to simplify the FPGA work; this alone might be worth even +100% codec bandwidth.

    Since you use an RGB output, if you have the proper transformed output and timings, you could have a bypass FPGA which just demuxes what it receives, without even a framebuffer. Is that your goal, or will you have a framebuffer near the FPGA?

    BTW, we previously discussed maximum bitrate, and you wanted very high bandwidth WiFi:
    Please note that, for instance, on the i.MX6QuadPlus, the maximum bitrate is 50Mbps anyway (source: http://cache.freescale.com/files/training/doc/ftf/2014/FTF-CON-F0165.pdf).
    (Other SoCs don’t go much higher.)

    • Tuetuopay

      Indeed, quite a lot of them do have an RGB parallel interface. However, powerful ones usually don’t include it, as they are meant for set-top boxes. As for MIPI/DSI, this interface is a mess with a closed spec, according to Alexis. As for eDP, it is not easily feasible on an FPGA, unless it is used as a replacement for HDMI while still going through an HDMI/MIPI/DSI/eDP -> RGB bridge.

      Thanks for this very informative insight about motion compensation, which I had totally forgotten. It would be interesting to see an actual comparison of the two, to check whether the pizza gains outweigh the compression loss. However, this only matters when the renderer is not running on the SBC. If it is, then the slower the HDMI -> RGB link, the better (I think). And your point on scaling is fair; we’ll investigate.

      Well the pizza is just a slice (no pun intended) of our scene. The OpenGL scene is sliced vertically (the pizza view I showed was a top view), then transformed, then voxelized, then each voxel layer gets rendered on a separate part of the frame. You’d have a matrix of images, each representing a voxel layer on the final screen. And yes, the pizza transform means that each transformed line (or column) corresponds to a specific LED angle. Thus the FPGA work is trivial on this point.
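
      If that helps, here is a hypothetical sketch of such a frame layout; every number below is a placeholder, not a final SpiROSE spec:

      # Hypothetical frame layout: each voxel layer is rendered into one tile of
      # the video frame, column = blade angle, row = LED index. Placeholder sizes.
      ANGLES = 128        # angular positions per turn (tile width)
      LEDS = 44           # LEDs along one radius (tile height)
      TILES_PER_ROW = 8   # tiles laid out left to right, then top to bottom

      def voxel_to_pixel(angle, led, layer):
          """Map voxel (angle, led, layer) to its (x, y) pixel in the video frame."""
          tile_x = (layer % TILES_PER_ROW) * ANGLES
          tile_y = (layer // TILES_PER_ROW) * LEDS
          return tile_x + angle, tile_y + led

      # The FPGA only has to read one column per tile to light the blade at a
      # given angle, which is what makes its job trivial here.
      print(voxel_to_pixel(angle=37, led=10, layer=5))   # -> (677, 10)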

      We’d still need the FPGA to store the image displayed at each refresh, since we’d be getting a single line at a time, not the whole LED matrix. It could be done with internal registers/SRAM for ROSEace, as ~128*3 internal registers were enough. Here, we have 88*40*3 ≈ 10.5 kB per refresh, at worst. Moreover, with a video output, data comes in at a steady pace. However, we plan on having a lower refresh rate for the inner LEDs than for the outer ones, so the bitrate would be variable, depending on the angular position. But yeah, this way the FPGA is essentially a demuxer, timer and remuxer (a different kind of muxing; we may plan on electrically muxing the LEDs).

      That is a concern, and it may be the critical factor for not choosing this i.MX SoM, especially since Alexis requires some sort of streaming from a PC, be it the fully rendered thing or a simple video that will be played on it. You mention other SoMs not going much higher, but the FireFly has PCIe for a fully-fledged WiFi card 😀 (half troll)

      • Phh

        > Well the pizza is just a slice (no pun intended) of our scene. The OpenGL scene is sliced vertically (the pizza view I showed was a top view), then transformed, then voxelized, then each voxel layer gets rendered on a separate part of the frame. You’d have a matrix of images, each representing a voxel layer on the final screen. And yes, the pizza transform means that each transformed line (or column) corresponds to a specific LED angle. Thus the FPGA work is trivial on this point.

        I have a suggestion for an alternative encoding, though I can’t tell if it’s better or worse:
        Encode every slice as a different frame, with a keyframe for every “3d-frame”. This would mean a ~ 1200fps video.

        Positive points:
        – You make the video features lower-frequency in space, thus the codec will be more likely to keep them
        – You take into account the fact that one slice looks a lot like the next one
        – Deblocking-pass will make more sense (it won’t deblock across different slices, and won’t filter out a whole slice)
        – You won’t have $numberOfSlices discontinuities in the frame

        Negative points:
        – You lose all time-based Motion Compensation
        – You’ll need your decoder to agree to this stupid fps/frame-format
        – There might be more data to match between frame A slice X and frame A+1 slice X, than frame A slice X and frame A slice X+1.

        > That is a concern, and it may be the critical factor for not choosing this i.MX SoM, especially since Alexis requires some sort of streaming from a PC, be it the fully rendered thing or a simple video that will be played on it. You mention other SoMs not going much higher, but the FireFly has PCIe for a fully-fledged WiFi card 😀 (half troll)

        My point here was about the hw video decoder, which won’t go any higher than 200Mbps, and it’s likely the limit is much lower. Don’t expect to have >50Mbps working. (Though a powerful H265 encoder with maximum search space at 50Mbps should be really good.)

        > We’d still need the FPGA to store the image displayed at each refresh, since we’d be getting a single line at a time, not the whole LED matrix. It could be done with internal registers/SRAM for ROSEace, as ~128*3 internal registers were enough. Here, we have 88*40*3 ≈ 10.5 kB per refresh, at worst. Moreover, with a video output, data comes in at a steady pace. However, we plan on having a lower refresh rate for the inner LEDs than for the outer ones, so the bitrate would be variable, depending on the angular position. But yeah, this way the FPGA is essentially a demuxer, timer and remuxer (a different kind of muxing; we may plan on electrically muxing the LEDs).

        Ok, it makes sense. I was hoping you could have an evolved shader such that you could just wire the RGB pins to the LED drivers (or perhaps dumb multiplexers before that), and drop the FPGA. That would have been pretty cool 🙂 (though I guess writing SystemVerilog is much easier than tricking shaders into using RGB as DMA-ed GPIOs :p)

        • Tuetuopay

          God, with the math exams I totally forgot to respond here 🙁

          > I have a suggestion for an alternative encoding, though I can’t tell if it’s better or worse:
          > Encode every slice as a different frame, with a keyframe for every “3d-frame”. This would mean a ~ 1200fps video.

          This sounds like a neat idea. However, the limiting factor here is the framerate we can get out of the SBC. Since we are looking at an HDMI/RGB output, I doubt we’ll be able to get a higher refresh rate than usual display ones. I would honestly be surprised if an SBC could do 75fps on its HDMI out.

          However, one option would be to use the “3D” features of H265. With H265, you can encode several layers in a single frame. For example, 3D movies would get two layers, one for each eye. The drawback is finding an SBC whose VPU can process multi-layer H265. Oh, and how exactly would it be output over HDMI?

          > My point here was about the hw video decoder, which won’t go any higher than 200Mbps, and it’s likely the limit is much lower. Don’t expect to have >50Mbps working. (Though a powerful H265 encoder with maximum search space at 50Mbps should be really good.)

          Oh, I get it. You mean that an encoded H265 file at more than 50Mbps is likely not to be decoded. I wouldn’t expect that to be a problem (correct me if I’m wrong), as 1080p H265 at a few Mbps is already pretty good.

          > Ok, it makes sense. I was hoping you could have an evolved shader such that you could just wire the RGB pins to the LED drivers (or perhaps dumb multiplexers before that), and drop the FPGA. That would have been pretty cool 🙂 (though I guess writing SystemVerilog is much easier than tricking shaders into using RGB as DMA-ed GPIOs :p)

          With enough hacking around I’m pretty sure it could be done, IF we had no multiplexing. Without multiplexing, we were looking at hundreds of 48-channel drivers; at 3 bucks a pop, that ended up expensive.
          One advantage of shaders is that testing a new one is just a matter of recompilation, compared to SV synthesis, placement, routing, … And don’t get me started on shaders, I love this kind of black magic 😀

          • Phh

            > This sounds like a neat idea. However, the limiting factor here is the framerate we can get out of the SBC. Since we are looking at an HDMI/RGB output, I doubt we’ll be able to get a higher refresh rate than usual display ones. I would honestly be surprised if an SBC could do 75fps on its HDMI out.
            >
            > However, one option would be to use the “3D” features of H265. With H265, you can encode several layers in a single frame. For example, 3D movies would get two layers, one for each eye. The drawback is finding an SBC whose VPU can process multi-layer H265.

            Well, my suggestion was about the encoding between the renderer and the SBC, not SBC to FPGA. For SBC to FPGA, whether it’s one frame containing 42 sub-frames or 42 frames, the only difference is the frame clock, so it won’t change anything.

            I didn’t know H265 could encode multiple (>=3) layers per frame, nice.
            MV-HEVC seems to be an extension to HEVC, not part of it. And the rk3399 datasheet doesn’t mention MV-HEVC at all, so I don’t expect the VPU to support it.
            Also, my guess would be that even if you find hardware with MV-HEVC, it might very well only support 2 frames.

            > Oh, and how exactly would it be output over HDMI?

            Well, decoding and outputting to HDMI are mostly decorrelated (those are totally different IPs in the SoC). Just output whatever you can on HDMI, even if it wouldn’t make any sense for a standard HDMI renderer. You control both the emitter and the receiver, so…

            > Oh, I get it. You mean that an encoded H265 file at more than 50Mbps is likely not to be decoded. I wouldn’t expect that to be a problem (correct me if I’m wrong), as 1080p H265 at a few Mbps is already pretty good.

            Well then you don’t need super-speed WiFi 🙂
