Categories

[Little Brosers] From Python to C

Soooooo… No joke this week, I’m almost gonna write an essai this time.

 

First step: Allocating memory for our Merkle tree

So, the first issue about implementing our Merkle tree structure using C is memory allocation.

A simple way would be to use recursive build function for each node and malloc() each of them. But, let’s remember that we will use this code on an embedded system. Dynamic allocation is not allowed.

So the first struggle was to find out how to staticaly allocate the space needed for our Merkle tree.

 

First I need to refresh your memory. Our Merkle tree is 4-ary balanced-ish. That is to say, each child of a node represents the time frame of the node divided by four. But this balanced time distribution rule is broken for the 4 first children of the tree, a.k.a. the root’s children. Those nodes have a an arbitrary part of the root’s time frame. This choice has been made because we think a lot of messages will be recent. Thus, the very last 24 hours will be more likely to contain messages than the last week.

To know when do we stop to split a node’s time frame, we have a constant in the code called BUCKET_SIZE. It gives the minimum time frame a node can represent. Thus, in this case, if the time frame width of a node divided by the number of children (4 here) is greater than BUCKET_SIZE, we split it. Otherwise, this node becomes a leaf.

Here is a basic representation:

So how do we statically allocate such a structure?

Several solutions were examined and to be short, the chosen one is the magic “python generated C header”.

In other terms, a python script included in our Makefile is called whenever the file containing our Merkle tree’s constants is modified and re-write a second header which only contains the size of our sub trees. This way we only have to include this generated header in our merkle_tree.c file and we know at compilation time the size needed for each subtree.

Oh and the sub trees are stored in C arrays.

 

Second step: Building the Merkle tree

Ok, now we know we have all the memory pre-allocated. The thing is…. we only have four dumb arrays. We need to turn this into a tree structure.

See below, we have 4 arrays of ‘node’ C structs and one node C stuct:

The sub trees are stored in their array in a Breadth First Search manner. The sizes of the arrays have been “#define” declared by the python script in a header.

The image above is a simplified representation of the sub trees and arrows are missing. Actually, There is also pointers to parent node.The job in this part has been to make sure each node C struct of the arrays is filled correctly and that each of its pointers is set.

For the curious ones, here is the content of a node C struct (for now):

struct node_t {
    uint64_t start;
    uint64_t end;
    uint32_t child_width;
    node_t *p_parent;
    node_t *p_children[LB_MT_NB_CHILDREN];
    uint16_t depth;
    uint16_t child_id;
    uint16_t array_id;

    slot_t *p_slots;
    uint8_t p_hash[SHA256_BLOCK_SIZE];
};

 

 

Third step: Attaching messages to leafs

We have now a Merkle tree structure we can explore without knowing it’s an array and just using the provided pointers in each node.

The next step is to find out which message has to be attached to which leaf. Also, the actual messages will be stored in external flash so we can’t “put” the message in the leaf structure. We will instead use structures called ‘slot’. Each slot can only correspond to one message and will have an integer which will correspond to the address of its message in the external flash. The only thing we will copy from the message is its time stamp, for convenience. This is what it looks like :

struct slot_t {
    slot_t *p_previous;
    slot_t *p_next;
    uint32_t msg_addr;
    uint32_t timestamp;
};

So each leaf will be able to have what we call a bucket, a.k.a. a linked list of slot_t C structs.

By the way, the slots are stored in a global array of slot_t. The array’s size is the max number of messages we can store in the external flash. We also have three global variables in the program of type slot_t* named begin_busyend_busy and begin_free. We will see that our array of slots can be seen as a pool of slots.

At startup, none of slots are used. Thus, we will attach the slots all together. Then will assign begin_free to the first slot of the array. Making begin_free the head of a linked list of ready-to-use slots.

During the tree building process, a slot will be detached from this linked list each time a message is being attached to a leaf’s bucket. The begin_busy and end_busy pointers will be set latter.

At the end of this process, each message will be attached to its slot. This slot will be placed in a leaf’s bucket following those rules:

  • When there is only one message belonging to a leaf’s time frame, the bucket of this leaf will be of size 1 and will contain this messages’s slot.
  • If several messages with different time stamps all belong to the same leaf, they will be attached to this leaf’s bucket. The bucket’s order will be based on the slots time stamps.
  • Finally, if several messages with the same time stamp all belong to the same leaf, the will be attached to this leaf’s bucket. We will not be able to order them by time stamp since their are equal. We will us their SHA256 instead.

Fourth step: Building our linked list of time-ordered messages

 

So now, we have a Merkle tree with (small) linked lists attached to its leafs. The only thing to do next is to attache all those linked lists together to have a long one. The begining and ending of this long list will be stored in the begin_busy and end_busy pointers. This long linked list will be used to travel across the messages list in an time ordered manner. Being able to doing so will be very useful because it allows us to not order the messages in the external flash.

 

What to do next ?

 

So yeah, it has been quit challenging to find an efficiente way to store and build this Merkle tree in C. I now understand why python is such an easy-to-prototype language.

 

Even if i have a some messages pools for which the tree building process succeed, I know for sure that it is not over yet. I found particular message pool which make crash my building process. So i might spend the following week fixing it.

 

By the way, there is still no automatic test using criterion in my code. I plan on adding some as soon as possible. For now I only use my eyes to read the text-representation of my tree and printf() error reporting in order to detect errors.

 

See you next week !

 

Antony Lopez

Antony Lopez

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

  

  

  

This site uses Akismet to reduce spam. Learn how your comment data is processed.