Merkle Trees: Growing in Use

Merkle _What_

The overall structure of a Merkle Tree is quite simple and very familiar to computer scientists: it is a tree data structure. A tree has one root node (here, standing in for the main file or piece of data), which branches out into child nodes, in this case exactly one or two per parent. By branching out again, these children become parent nodes to subsequent children, and so on. The final child node of any branch is called a leaf node. In reverse, each leaf node in combination with its sibling node (should it have one) yields its parent node, and so on up the tree, until the root node is reconstructed.

  1. Moving up the tree towards the root, each leaf or child hash is concatenated with the hash of its sibling node, and the result is hashed again to produce the hash of the parent node
  2. Finally, the hashes of the top parent nodes are combined in the same way into the root node, whose hash serves as a fingerprint of the entire original piece of data
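The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular protocol's implementation: the choice of SHA-256, the function names `sha256` and `merkle_root`, and the rule that a node without a sibling is simply hashed again on its own are all assumptions made for the example. Real systems differ on these details.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption; any cryptographic hash works)."""
    return hashlib.sha256(data).digest()


def merkle_root(segments) -> bytes:
    """Compute a Merkle Root from a non-empty list of byte segments."""
    if not segments:
        raise ValueError("need at least one segment")

    # Leaf level: hash every data segment.
    level = [sha256(segment) for segment in segments]

    # Move up the tree: concatenate each pair of sibling hashes and hash again.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # two hashes, or one if there is no sibling
            next_level.append(sha256(b"".join(pair)))
        level = next_level

    return level[0]


root = merkle_root([b"segment-0", b"segment-1", b"segment-2"])
print(root.hex())
```

Changing even one byte of any segment changes its leaf hash, and that change propagates through every parent up to the root.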

Merkle _Why_

For many years Merkle Trees were little more than a cryptographic magic trick. But as is common with mathematical and computational breakthroughs, years later they began to play a critical role in various protocols and software projects. Merkle Trees are mainly used for two reasons:

  1. They allow a client or server to validate any segment or sub-segment of the file/data without possessing any other segments
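To make this concrete, here is a rough sketch of how a single segment might be checked against a known root using only the sibling hashes along its path (often called a Merkle proof). The helper names `build_levels`, `make_proof` and `verify`, the use of SHA-256, and the handling of a node with no sibling (it is hashed again on its own) are assumptions for illustration only.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(segments):
    """Return every level of the tree: leaf hashes first, root level last."""
    levels = [[sha256(s) for s in segments]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(
            [sha256(b"".join(prev[i:i + 2])) for i in range(0, len(prev), 2)]
        )
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) for each level below the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index + 1 if index % 2 == 0 else index - 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        else:
            proof.append((None, False))  # lone node: no sibling at this level
        index //= 2
    return proof


def verify(segment, proof, root):
    """Recompute the path to the root from one segment plus its proof."""
    current = sha256(segment)
    for sibling, sibling_is_left in proof:
        if sibling is None:
            current = sha256(current)
        elif sibling_is_left:
            current = sha256(sibling + current)
        else:
            current = sha256(current + sibling)
    return current == root


segments = [b"s0", b"s1", b"s2", b"s3", b"s4"]
levels = build_levels(segments)
root = levels[-1][0]
proof = make_proof(levels, 2)
print(verify(b"s2", proof, root))        # True
print(verify(b"tampered", proof, root))  # False
```

The verifier never sees segments 0, 1, 3 or 4, only one hash per level of the tree, so the amount of data needed grows with the height of the tree rather than with the size of the file.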

Merkle _How_

One important use of Merkle Trees is in downloading files. If a user/client attempts to download a large file all at once and something goes wrong, the entire file can be corrupted and the full download has to be restarted. Using Merkle Trees, the user can instead download a smaller segment of the data and hash it. By combining this hash with the hashes of the other segments (trivially small to download and check compared to the data itself), they can recompute the Merkle Root and see whether any data was corrupted in transit. These steps repeat for each segment, so if the Merkle Root fails to match at any step, they know which segment was corrupted and only that segment needs to be fetched again. It is important to note that this process takes place automatically in the background and does not require any action from the actual user.
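A simplified sketch of that check might look like the following. It assumes the client already holds a trusted Merkle Root and has downloaded the list of per-segment hashes. The names `root_from_leaf_hashes` and `find_corrupted`, the use of SHA-256, and the sample data are all hypothetical. Real download clients are considerably more involved.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_from_leaf_hashes(leaf_hashes):
    """Rebuild the Merkle Root from segment hashes alone (no file data needed)."""
    if not leaf_hashes:
        raise ValueError("need at least one segment hash")
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [sha256(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
    return level[0]


def find_corrupted(segments, leaf_hashes, trusted_root):
    """Return the indices of downloaded segments that fail verification."""
    # First make sure the (small) hash list itself matches the trusted root.
    if root_from_leaf_hashes(leaf_hashes) != trusted_root:
        raise ValueError("hash list does not match the trusted Merkle Root")
    # Then each segment can be checked independently against its own hash.
    return [
        i for i, segment in enumerate(segments)
        if sha256(segment) != leaf_hashes[i]
    ]


# Publisher side (simulated here): hash the segments and publish the root.
original = [b"part-0", b"part-1", b"part-2", b"part-3"]
leaf_hashes = [sha256(s) for s in original]
trusted_root = root_from_leaf_hashes(leaf_hashes)

# Client side: segment 2 is damaged in transit.
downloaded = list(original)
downloaded[2] = b"part-2 (damaged)"
print(find_corrupted(downloaded, leaf_hashes, trusted_root))  # [2]
```

Only the segment at index 2 has to be downloaded again, and the rest of the file is kept.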

Merkle _Root_

In conclusion, Merkle Trees are a very clever way to maintain and verify databases and large files across networks of users/devices. When checked against the original Merkle Root, every piece of data must remain completely unchanged, otherwise the roots will not match. As we live in an increasingly digital age and distributed systems gain popularity, it is likely we will start to see more Merkle Trees sprouting up in new and exciting places.

