
Welcome to the Jungle

Before reading this post, you might want to make sure you have read this one, and this one. Just saying.

The purpose of this post is to describe the algorithm known as Jungle and the problem it solves.

Synchronization

Making WaDeD a little bit more abstract than what it actually is (which, conveniently for me, makes all the electronics disappear ;-)), you can see it as follows: a single device holds a set of messages. When two devices meet, we want them to synchronize their sets, so that when walking away from one another their internal states are the same. Since the interaction might be short, we want this to be as fast as possible. We also want to reduce the energy cost of the whole interaction as much as possible. The energy cost of using the radio, in transmitter mode as much as in receiver mode, is orders of magnitude higher than the cost of CPU cycles or FRAM access over SPI, so we must critically minimize the number of radio packets transmitted. However, once a transmission has started, the energy cost of adding an extra byte is not really significant, so as far as energy is concerned the size of the packets is of minor importance (it actually matters quite a bit, but for reasons of radio transaction efficiency and packet integrity). Furthermore, since any number of devices can be in the same area, the communication should not isolate devices into pairs. Instead they should all talk together, so that exchanged messages do not need to be repeated.

In other words, we want to compute the symmetric difference of two sets of messages (that is, the elements that they do not have in common) and then exchange those messages, while minimizing the number of times a device communicates with another.
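To make the goal concrete, here is a minimal sketch in C (with illustrative names; this is not the WaDeD code) of the computation the protocol must achieve implicitly: given two sorted lists of 64-bit message IDs, report the IDs present in exactly one list.

    #include <stddef.h>
    #include <stdint.h>

    /* Walk two ID lists (sorted ascending) in lockstep and report every
     * ID present in exactly one of them: the symmetric difference. */
    static void symmetric_difference(const uint64_t *a, size_t na,
                                     const uint64_t *b, size_t nb,
                                     void (*emit)(uint64_t id))
    {
        size_t i = 0, j = 0;
        while (i < na && j < nb) {
            if (a[i] == b[j])     { i++; j++; }   /* shared, nothing to do */
            else if (a[i] < b[j]) emit(a[i++]);   /* only device A has it  */
            else                  emit(b[j++]);   /* only device B has it  */
        }
        while (i < na) emit(a[i++]);
        while (j < nb) emit(b[j++]);
    }

Computing it this way would require exchanging the full lists, which is exactly what we cannot afford; the point of Jungle is to reach the same result while transmitting much less.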

Data storage

In order to do this, we need to be able to quickly compare the state of the memories of two devices. As mentioned in a previous post, hash trees are really convenient data structures for integrity checks and data comparison, provided they are organised in precisely the same way, regardless of the device, the order in which messages arrived, and the other data the device is holding.

This is a very strong constraint: if the messages are stored differently, the trees might have to make more comparisons (consuming energy), or, even worse, they might never completely synchronize and be stuck comparing their hashes forever. Can we find a way to organize data so that the same two messages will end up being stored in the same place, regardless of what other data is present and the order in which they were inserted? It appears we can, using a very common structure, namely open hash tables: each leaf of the tree points to a linked list. Given the 64-bit identifier of a message, we determine in which list it should be inserted. The leaf pointing to a list contains a hash of the concatenation of the IDs in the list. The lists are sorted by ID, so the hashes of two leaves differ if and only if the lists differ.

The hash tree structure

That’s it: this is our structure. Messages are stored in small lists, which form the leaves of a hash tree. Insertion is deterministic, and comparison is fast thanks to the hash tree structure.
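Here is a minimal sketch of that structure in C. The names, the bucket count and the hash64 primitive are illustrative assumptions, not the actual WaDeD code.

    #include <stdint.h>

    #define NUM_LEAVES 1024   /* number of buckets; an assumed figure */

    typedef struct message {
        uint64_t        id;   /* 64-bit message identifier */
        struct message *next; /* next message in the bucket, in ID order */
    } message_t;

    typedef struct {
        uint64_t   hash;      /* hash of the concatenated IDs of the bucket */
        message_t *head;      /* sorted linked list of messages */
    } leaf_t;

    /* Assumed 64-bit hash primitive over a byte buffer. */
    uint64_t hash64(const void *data, unsigned len);

    /* The bucket depends only on the message ID, so two devices always
     * store the same message under the same leaf. */
    static unsigned leaf_index(uint64_t id) { return (unsigned)(id % NUM_LEAVES); }

    /* Insert in ID order (duplicates are dropped), then recompute the
     * leaf hash by folding the IDs in order; the fold stands in for
     * hashing the full concatenation. Equal lists give equal hashes. */
    static void leaf_insert(leaf_t *leaf, message_t *m)
    {
        message_t **p = &leaf->head;
        while (*p && (*p)->id < m->id) p = &(*p)->next;
        if (*p && (*p)->id == m->id) return;   /* already stored */
        m->next = *p;
        *p = m;

        uint64_t h = 0;
        for (message_t *it = leaf->head; it; it = it->next)
            h = hash64((uint64_t[]){ h, it->id }, sizeof(uint64_t[2]));
        leaf->hash = h;
    }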

The algorithm

Say two devices meet. We want to design a procedure which, in the end, leaves the two devices with the same messages in memory, while making them communicate as little as possible, and we can use the augmented hash trees that we have implemented.

In the following description, we need a bit of terminology. The nodes of the tree are divided into two categories: the leaves, which have no sons, and the internal nodes, which have sons. Among those, the penultimate layer (that is, the internal nodes whose sons are leaves) deserves special treatment. When we talk about broadcasting an internal node, we mean transmitting information about its position in the tree together with its hash. When we talk about broadcasting a leaf, we mean transmitting information about its position together with the IDs of all the elements in the list that corresponds to the leaf.
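For concreteness, the four kinds of packets could be represented as below. This is only a sketch: the real radio format surely differs, and the field sizes are guesses based on the dump further down and on the figures from the memory post.

    #include <stdint.h>

    enum packet_kind { PKT_ROOT, PKT_NODE, PKT_LEAF, PKT_MESSAGE };

    #define SONS          8    /* sons per internal node, as in the dump below */
    #define MAX_LIST_IDS  8    /* assumed cap on IDs per leaf list */
    #define MAX_MSG_LEN   180  /* assumed message size */

    typedef struct {
        uint8_t  kind;      /* one of packet_kind */
        uint16_t position;  /* node index in the tree (meaningless for ROOT) */
        union {
            uint64_t root_hash;            /* ROOT: hash of the whole tree */
            uint64_t son_hashes[SONS];     /* NODE: hashes of its sons     */
            struct {                       /* LEAF: IDs of its list        */
                uint8_t  count;
                uint64_t ids[MAX_LIST_IDS];
            } list;
            uint8_t  message[MAX_MSG_LEN]; /* MESSAGE: the payload itself  */
        } body;
    } packet_t;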

Here are the rules that drive the behaviour of a WaDeD in its communication (a sketch in C follows the list):

  1. If nothing has happened over a certain period of time, broadcast your root.
  2. If getting a root, compare it with your own. If the hashes match, do nothing; otherwise broadcast all the sons of the root.
  3. If getting a node, compare it with the corresponding node in your own tree. If they match, do nothing. Otherwise:
    • If the node is not on the penultimate layer, broadcast its sons.
    • If it is on the penultimate layer, its sons are leaves. For each son, broadcast the leaf.
  4. If getting a leaf, compare it with the matching list in your own tree. For every ID that you hold that is not in the distant list, broadcast the corresponding message. If no message was broadcast, broadcast your own corresponding leaf.
  5. If getting a message, check whether a message with this ID already exists in memory. If it does, do nothing, otherwise insert it in memory and update the tree.
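Here is that sketch: a condensed dispatch of rules 2 to 5, reusing the hypothetical packet_t and message_t from the sketches above. Every helper (my_node_hash, broadcast_sons, and so on) is an assumed stand-in for the tree and radio layers, not the actual WaDeD API; rule 1 is a timer matter and lives outside this handler.

    /* Hypothetical helpers assumed from the tree and radio layers. */
    extern uint64_t         my_node_hash(uint16_t node);
    extern int              is_penultimate(uint16_t node);
    extern void             broadcast_sons(uint16_t node);  /* one NODE packet */
    extern void             broadcast_leaf(uint16_t leaf);  /* one LEAF packet */
    extern void             broadcast_message(uint64_t id);
    extern const message_t *leaf_head(uint16_t leaf);
    extern int              id_in_list(uint64_t id, const uint64_t *ids, unsigned n);
    extern int              have_message(const uint8_t *msg);
    extern void             store_and_update_tree(const uint8_t *msg);

    void handle_packet(const packet_t *p)
    {
        switch (p->kind) {
        case PKT_ROOT:                                      /* rule 2 */
            if (p->body.root_hash != my_node_hash(0))
                broadcast_sons(0);
            break;
        case PKT_NODE:                                      /* rule 3 */
            for (unsigned s = 0; s < SONS; s++) {
                uint16_t son = SONS * p->position + 1 + s;  /* heap layout */
                if (p->body.son_hashes[s] == my_node_hash(son))
                    continue;                               /* subtree matches */
                if (is_penultimate(p->position))
                    broadcast_leaf(son);                    /* sons are leaves */
                else
                    broadcast_sons(son);
            }
            break;
        case PKT_LEAF: {                                    /* rule 4 */
            int sent = 0;
            for (const message_t *m = leaf_head(p->position); m; m = m->next)
                if (!id_in_list(m->id, p->body.list.ids, p->body.list.count)) {
                    broadcast_message(m->id);
                    sent = 1;
                }
            if (!sent)
                broadcast_leaf(p->position);
            break;
        }
        case PKT_MESSAGE:                                   /* rule 5 */
            if (!have_message(p->body.message))
                store_and_update_tree(p->body.message);
            break;
        }
    }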

This algorithm has good properties: when two devices are in a conversation, they are strictly moving down their respective trees, so there can never be a loop in the conversation. A typical exchange looks like this:

WaDeD A sends a ROOT
root: 1D5DD10E

WaDeD B sends a ROOT
root: B04424B1

WaDeD A sends a NODE N0
sons: 21E5338F 21E5338F 21E5338F 21E5338F 21E5338F 21E5338F 21E5338F 21E5338F

WaDeD B sends a NODE N3
sons: EEDFA82 EEDFA82 3D7B7AC4 EEDFA82 EEDFA82 EEDFA82 EEDFA82 EEDFA82

WaDeD A sends a NODE N11
sons: 0 0 0 0 0 0 0 0

WaDeD B sends a LIST N144
list size: 1
Element 1: 1804A114 

WaDeD A sends a LIST N144
list size: 0

WaDeD B sends a MESSAGE
I'm a WaDeD!

WaDeD A sends a ROOT
root: B04424B1

WaDeD B sends a ROOT
root: B04424B1

This is an actual dump of two WaDeD synchronizing. You can see that each node of the trees has eight sons. The synchronization process is extremely fast if the trees are similar to begin with, and not much longer than a plain list comparison if the trees are hugely different.

Welcome to the Jungle

But another strong property of this algorithm is that it is stateless: everything is broadcast to everybody, and any device arriving in the middle of a synchronization can benefit from hearing a message. Potentially this leads to conversations splitting and many, many messages being exchanged (hence the name Jungle), but a WaDeD always acts upon the messages it receives in a way that increases its knowledge of the environment. After a few seconds, the jungle calms down: all devices share the same memory state.

In conclusion, we have an algorithm that is extremely efficient when synchronizing two sets that are close and still very good when the sets differ a lot. It copes well with many devices communicating at the same time, and, since a given device can enter a conversation at any point, missed packets can do no more harm than forcing the conversation to restart a few messages earlier.

I think that’s pretty neat.

The cube’s last moments…

Cube last moments

(I’m kidding of course, it’s perfectly fine ;-))

Planting trees

Now that we know that we want to store our packets in a hash tree, we still have to choose what exactly we are going to implement. This post explains the different parameters we have to consider to make that choice.

What’s our memory?

A WaDeD board has five types of memory we can access, ranging from the smallest to the largest:

  • MCU backup registers (80 bytes): these are used to save the registers when entering standby mode. Reading and writing are extremely fast, but this is not general-purpose memory, and it is not addressable.
  • MCU internal EEPROM (4 KB, ranging from 0x0808 0000 to 0x0808 0FFF): non-volatile memory, byte-addressable. Writing is slow (up to 3.94 ms), reading is fast. It should be used only for data that we want to keep over the long run and not update frequently.
  • MCU internal RAM (16 KB, ranging from 0x2000 0000 to 0x2000 3FFF): volatile memory, byte-addressable. Reading and writing are fast. This is our main RAM and, with the exception of the backup registers, the fastest memory we can access.
  • MCU internal flash (128 KB, ranging from 0x0800 0000 to 0x0801 FFFF): non-volatile memory, addressable by page. Reading and writing are slow. In normal use, this is where the firmware is stored.
  • External FRAM (256 KB): non-volatile memory, byte-addressable over the SPI2 bus. The FRAM can work at frequencies up to 40 MHz, so it is only limited by the working frequency of the MCU, which is set at 1 MHz. This means access times in microseconds, a thousand times faster than flash or EEPROM, but a thousand times slower than RAM.

The RAM is therefore the best memory for frequent R/W operations, followed by the FRAM. It is not reasonable to expect frequent writes to the flash and EEPROM, so in normal use the software should treat those zones as read-only.
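As a sketch, the ranges above translate into the following constants, as they might appear in a memory-map header (the symbol names are made up; the addresses are the ones quoted in the list).

    /* MCU memory map, as quoted above (names are illustrative). */
    #define FLASH_BASE   0x08000000u   /* 128 KB internal flash, page-addressable */
    #define FLASH_END    0x0801FFFFu
    #define EEPROM_BASE  0x08080000u   /* 4 KB internal EEPROM, byte-addressable  */
    #define EEPROM_END   0x08080FFFu
    #define SRAM_BASE    0x20000000u   /* 16 KB internal RAM                      */
    #define SRAM_END     0x20003FFFu
    #define FRAM_SIZE    (256u * 1024u) /* external FRAM, reached over SPI2       */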

What does a tree weigh?

It is obvious that we cannot keep the messages in RAM: we need to store them in non-volatile memory, and there is not enough space to store a satisfactory amount of them anyway. Therefore, it is a no-brainer that they will go to the FRAM. Storing the tree raises more questions: how much does the tree “weigh” anyway?

If well constructed, the nodes of the tree need not store pointers to their sons (think of a heap, for instance), so an internal node only needs to store its hash: 8 bytes, if we use 8-byte hashes as message fingerprints. With a message size of about 180 bytes, we can store 1024 of them on the FRAM while keeping enough space for the rest of the application, so our tree must have 1024 leaves. The leaves are special, because they do not contain the same thing as an internal node: they contain either a whole bucket (less flexible, more economical) or the address of a bucket (more memory-consuming, but allows us to choose the kind of bucket when inserting the message in the tree). In the latter case, each leaf weighs 11 bytes, since the FRAM uses three-byte addresses. The leaves take up a total memory space of 11264 bytes, around 11 kB.

With a fixed number of sons in our tree, we can use either a binary tree (with 1023 internal nodes) or a 4-sons tree (with 341 internal nodes) (we could use a 512-sons tree, but that would not be very efficient). That is, the internal part of the tree weighs about 8 kB for a binary tree and about 3 kB for a 4-sons tree (8184 and 2728 bytes, to be exact). The total tree thus weighs either 19 or 14 kB.
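The “no pointers” remark deserves a small illustration: with a heap-style layout, the sons of a node are found by index arithmetic alone, and the internal node count follows from the number of leaves. A sketch, with illustrative names:

    #include <stdint.h>

    #define D            4     /* sons per node: 2 or 4 in the discussion above */
    #define NUM_LEAVES   1024
    /* A complete D-ary tree with L leaves has (L - 1) / (D - 1) internal
     * nodes: 1023 for D = 2, 341 for D = 4. */
    #define NUM_INTERNAL ((NUM_LEAVES - 1) / (D - 1))

    static inline unsigned parent(unsigned i)          { return (i - 1) / D; } /* i > 0 */
    static inline unsigned son(unsigned i, unsigned k) { return D * i + 1 + k; }

    /* An internal node is nothing but its 8-byte hash. */
    static uint64_t internal_hash[NUM_INTERNAL];

With D = 4 this array weighs 341 × 8 = 2728 bytes, matching the figure above.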

How many sons should we have?

What are the pros and cons of a binary tree versus a 4-sons tree? Size is one parameter, as we just saw. But by storing less information, the 4-sons tree is also less precise. When updating a part of the tree, the affected subtree is considerably larger than for a binary tree. This means that when two trees do not differ much (which, as we saw, is when the hash tree structure comes in handy), we need more hash comparisons to identify the leaves that differ. Since hashes are what is exchanged over the radio during the comparison phase of the transaction, a hash comparison is a costly operation.

At the same time, we saw that FRAM accesses are a thousand times slower than RAM accesses. This means that if we can store something in RAM, we have a strong incentive to do so. It is not reasonable to store 11 kB of leaves in the 16 kB of RAM, but we can consider storing the internal tree there. 8 kB is a lot, so storing a 4-sons tree in RAM is easier than storing a binary tree. Can we afford to store a binary tree? A 4-sons tree? Neither? This will need experimentation, and might need last-minute adjustments as the firmware grows more complex and uses more RAM.

What’s a hash tree?

In this post I am going to talk about hash functions, hash tables, Merkle Trees and data structure choices for WaDeD. Feel free to skip any section.

What’s a hash table?

A hash table is a data structure that provides an associative array: it stores values in buckets, each bucket being uniquely mapped to an index, or key, of fixed size. It does so by using a hash function (more on that in a minute) to compute the key from the original data. Hash tables are usually very efficient in terms of access times, and are one of the most common data structures.

Example of a simple hash table
We usually want to be able to insert values into the table with at most one value linked to a key: the table must not be ambiguous. Since we use an algorithm to compute the key from the original value, and since there are usually more possible values than possible keys, the algorithm may produce the same result for two different values: it will try to map two different values to the same key. This is called a collision, and any self-respecting hash table must take these into account and define a policy for dealing with hash collisions. That policy can be not storing the new value, creating a linked chain from the original bucket, storing the value in a neighbouring bucket, etc., depending on the way the hash table is effectively implemented and what behaviour we expect from it.

Illustration of a hash conflict
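A minimal sketch of a hash table using the linked-chain collision policy described above (names and sizes are illustrative):

    #include <stdint.h>
    #include <stdlib.h>

    #define TABLE_SIZE 64  /* number of buckets, assumed */

    struct entry {
        uint64_t      key;
        const char   *value;
        struct entry *next;  /* chain of entries whose keys collide */
    };

    static struct entry *table[TABLE_SIZE];

    /* Toy hash function: reduce the key modulo the table size. */
    static unsigned bucket(uint64_t key) { return (unsigned)(key % TABLE_SIZE); }

    /* Insert or overwrite; colliding keys simply share a chain. */
    static void put(uint64_t key, const char *value)
    {
        unsigned b = bucket(key);
        for (struct entry *e = table[b]; e; e = e->next)
            if (e->key == key) { e->value = value; return; }
        struct entry *e = malloc(sizeof *e);
        e->key   = key;
        e->value = value;
        e->next  = table[b];
        table[b] = e;
    }

    static const char *get(uint64_t key)
    {
        for (struct entry *e = table[bucket(key)]; e; e = e->next)
            if (e->key == key) return e->value;
        return NULL;  /* not present */
    }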

What’s a hash function?

Hash tables use hash functions to compute the keys they associate with their values. A hash function is any algorithm that takes input from a data set and maps it to a smaller data set whose elements all have the same size. The operator “modulo n” for a given integer n, truncation to the last n bits of the data (which is really the modulo 2^n operation), or the constant function equal to four are simple examples of hash functions.
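The three toy examples, written out in C:

    #include <stdint.h>

    static uint32_t hash_mod(uint64_t x, uint32_t n) { return (uint32_t)(x % n); }

    /* Keeping the last n bits is the same as reducing modulo 2^n
     * (for n up to 32, so the result fits). */
    static uint32_t hash_trunc(uint64_t x, unsigned n)
    {
        return (uint32_t)(x & ((1ull << n) - 1));
    }

    static uint32_t hash_four(uint64_t x) { (void)x; return 4; }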

Cryptographic hash functions have additional properties. The goal is to prevent the hash from leaking information about the original value. We want the function to be highly discontinuous: small changes in the input should greatly change the output value. We call this property the “avalanche effect”.

Illustration of the avalanche effect in the SHA-1 hash function

In order to be secure, a cryptographic hash function should also be unpredictable (one should not be able to design an input that will have a given hash), irreversible (one should not be able to retrieve the original value from its hash) and, in practice, collision-free. This last property is, obviously, a theoretical impossibility; indeed, all outdated cryptographic hash functions have been deprecated because practical ways of designing collisions were found. It has been shown theoretically that collisions can be designed for the cryptographic hash function we use, the SHA-1 algorithm, so its days could be over as soon as somebody figures out how to go from theory to practice.

Because of their good properties, cryptographic hash functions have a lot of applications. They are commonly used to protect passwords: storing a password in the clear is unsafe and dangerous, but since a password cannot be retrieved from its hash, the hash can be safely stored on a distant server. When the user tries to connect to the service, the input password is hashed and compared to the stored hash, instead of comparing passwords directly. Because in practice cryptographic hash functions can be considered injective, we can use hashes to check the integrity of a file. This is why Linux distributions always provide hashes of their installation files: after the download, the user can run the hash algorithm on the file. Any corruption, accidental or intentional, will result in a different hash, and the file can be discarded. Finally, we can use cryptographic hashes to “fingerprint” a sequence: it is a simple, low-cost way of giving a unique identifier to something. Git uses the SHA-1 algorithm to identify commits. If we have a big hash table, using a cryptographic hash function is a good way to minimize collisions.

What’s a hash tree?

When exchanging files over a network, integrity checks are crucial: we want to protect our computers from installing bad software, and we want to be able to ask for data again if it was corrupted during transmission. Therefore there is a need for quick, efficient integrity checks. We already saw that cryptographic hash functions are good tools for this, but the way we use them can make a big difference.

A hash tree is a structure that can be used to rapidly check differences between parts of a file. It is a tree whose leaf nodes contain data (the packets we want to store independently) and whose internal nodes contain hashes of the concatenation of their son nodes.

Principle of a hash tree

Say I have two copies of the same file, stored as hash trees (the files are split into small chunks which are stored in the leaves), and I want to check whether they really are copies. If the top hashes match, then I know that the whole trees match, and therefore the leaves do: the files are the same. If they differ, not only do I know that the files differ, but by going down the tree I can identify exactly which parts of the files are identical and which are not.
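As a sketch, here is that descent on two trees stored as flat arrays of hashes (heap layout, binary for simplicity; names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LEAVES 8                      /* chunks of the file */
    #define NUM_NODES  (2 * NUM_LEAVES - 1)   /* complete binary tree */

    /* Recursively compare node i of both trees; matching hashes prune
     * whole subtrees, differing leaves identify the chunks to re-fetch. */
    static void compare(const uint64_t a[NUM_NODES],
                        const uint64_t b[NUM_NODES], unsigned i)
    {
        if (a[i] == b[i])
            return;                           /* identical subtrees: done */
        if (i >= NUM_LEAVES - 1) {            /* leaves occupy the last slots */
            printf("chunk %u differs\n", i - (NUM_LEAVES - 1));
            return;
        }
        compare(a, b, 2 * i + 1);             /* left son  */
        compare(a, b, 2 * i + 2);             /* right son */
    }

    /* Usage: compare(tree_a, tree_b, 0) starts from the top hash. */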

This is exactly how BitTorrent works: when downloading a file with BitTorrent, the user first gets the top hash of the file from a secure source. She can then get the tree from any user, since having a valid top hash ensures the whole tree is valid. She can then freely communicate with other users on the network who claim to have parts of the file. As she downloads packets, she compares the tree that builds up with the copy of the tree she got at first, and knows when to discard false or damaged packets. When she has all the packets, the hash tree of her file matches the one she got in the first place, and she knows she has successfully downloaded the file.

Enter WaDeD

So what do we do with all this?

In WaDeD, modules have to store packets to be transmitted on the network. We want two modules that meet to synchronize all their messages, so that each one can retransmit packets that were held by the other. Therefore we want to be able to quickly compute the symmetric difference between the two lists of messages. A hash tree seems like a good idea for this: if two modules already have a lot of packets in common, we can expect them to have huge subtrees in common, which minimizes the number of comparisons.

This is harder than it first appears. In order for the hash trees to work, two users who have packets in common have to store them at the same place in the tree: otherwise the similarities will not appear when comparing hashes. In the worst-case scenario, two modules could have all their messages in common and still have different top hashes, if the position of even one stored message differs. This is because the usual application of hash trees is checking the integrity of fixed files, not computing symmetric differences of two sets whose content and size vary. If we want to use hash trees, we need two users who carry the same messages to store them in the same place in the tree, independently of any other factor.

An acceptable solution to this problem is to treat the leaf layer of the tree as a hash table: the leaves are buckets in which we place packets according to their hash value (a short sketch follows the list below). This guarantees that two almost similar sets of packets will produce almost similar trees, at some cost:

  • The number of packets that we can store is limited by the number of buckets (the tree cannot adapt its size). This is not really an added cost, since we already have strong memory constraints (limiting the number of stored packets) and we want to avoid dynamic memory allocation (forcing us to use buckets anyway).
  • Hash tables, especially fixed-size ones, always induce a risk of collision. That risk is inversely proportional to the size of the table.
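Here is the promised sketch of the placement rule, with illustrative names: the bucket index is derived from the packet’s own hash, so it cannot depend on insertion order or on the rest of the set.

    #include <stdint.h>

    #define NUM_BUCKETS 1024  /* leaves of the tree, assumed */

    /* Assumed fingerprinting hash over the packet contents. */
    uint64_t packet_hash(const uint8_t *packet, unsigned len);

    static unsigned bucket_of(const uint8_t *packet, unsigned len)
    {
        return (unsigned)(packet_hash(packet, len) % NUM_BUCKETS);
    }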

Is it worth it?

Two modules that meet will always have to compare their packet lists, so that is a common operation. We intend to store at most around a thousand packets, so that means up to a thousand comparisons at every meeting if the lists are compared directly. That means a thousand hashes sent over the radio. Radio communication is slow and is the highest energy cost of the whole application, so that is something we absolutely want to avoid.

An improvement is to have a “top hash” representing the state of the whole list of messages. This removes the cost of the thousand radio communications when the lists are the same, but if they differ, we still have to exchange the whole list of hashes, which remains costly.

A hash tree allows us to minimize the amount of data exchanged over the radio, at the cost of a few more CPU cycles and a 1/1000 chance of collision, which costs a few more CPU cycles to deal with, so this seems more than reasonable. A hash tree it will be, then.

In another post I will explain which choices we have to make concerning the effective implementation of the hash tree in memory.

February 25

Happy birthday Carlo Goldoni!

Today:

  • Stood in front of the class for an hour presenting what WaDeD will be all about. I am unsure whether we made a good impression or not, but at least it was not terrible. We still have to think more thoroughly about where the batteries will be located, and we will probably have to revise our schedule.
  • I got OpenOCD working, erased the flash, and got the RGB LED blinking in red and green. Maybe I’ll pick that up again later, since getting the board to do something was quite motivating.

February 24

Happy birthday Sid Meier, Steve Jobs and Plastic Bertrand!

Today:

  • Had another lengthy meeting with Pedro, Hubert and Sacha. We’ve got our things ready for tomorrow.
  • Went to the gym club. ROSE-unrelated, but it felt nice.

That’s all, folks!

February 23

Yesterday:

  • I missed Heinrich Hertz’s birthday. Sorry about that.
  • I got my STM32 lab board, then did nothing with it. Yay.

Today:

  • Happy birthday George Frideric Handel!
  • Explored the ChibiOS architecture, trying to understand how to do what I’m supposed to do.
  • Had a (quite long) meeting with Hubert, Sacha and Pedro about Monday’s presentation.


February 21

Happy birthday Sacha Guitry!

Today:

  • Worked on my XBee presentation. I hope we won’t forget anything crucial.
  • Had a nice meal.


February 20

Happy birthday Ludwig Boltzmann!

Not much time for ROSE-related work today:

  • Had a meeting with Hubert, Sacha and briefly Pedro (I had to go to my Leadership in Cinema class, which was a pretty enjoyable time) about specifications for the project. Good news: we have a name! From now on, we will be the WaDeD (Walking Dead Drops) team! That being said, we have much more to do before Monday, but this really helped clarify what we need to do.
  • We agreed to meet tomorrow afternoon with Paulin and Victor to prepare our Friday morning presentation. I found datasheets for the XBees (they were sort of hidden, Digi are little devils) and started reading them; I also read more about the IEEE 802.15.4 and ZigBee protocols.

I have not looked at Aurélien’s RF solution yet, but 802.15.4 sounds like the sort of thing we are looking for: low-consumption, short-range, small-bandwidth transmission.

February 19

Today:

  • Gave some thought to lemon-powered low-voltage radios. When life gives you lemons, you might as well make embedded systems out of them.
  • Firmly decided to commit myself to learning bus protocols before Monday.
  • Talked about PSSCs with Hubert and Pedro. I think writing down the transmission protocol between our radios would be something I would like. I am thinking of a simplified version of IP.
  • Read about the ZigBee protocol for my presentation on Friday. It looks like the sort of thing we could use: not only because it is what we learn in class, but also because it was designed for simplicity and low energy consumption.

We also began converging towards a name: Zombie Drops (Pedro’s idea), Walking Dead Drops (my interpretation of it, a bit too long I’m afraid), WaDeD (Hubert’s shorter version of the preceding). It looks like the name will carry that sort of meaning. I like WaDeD!