Merkle Trees: Growing in Use

Image for post
Image for post

Ralph C. Merkle (not pictured above), born 1952, is one of the founding fathers of Public Key Cryptography. Throughout his career he has developed and contributed to a list of monumental cryptographic systems, some of which are embedded in the backbone of the online protocols and applications we rely on in daily life. In this post we will look at his work on Merkle Trees, how they function, and why and how they are growing in use.

Image for post
Image for post

Merkle _What_

Merkle Trees differ from traditional data trees in one simple way: they use cryptographic hashes of each piece of data rather than the data itself. From leaf nodes up to the root, the process works as such:

  1. Each leaf node is hashed
  2. Moving up the tree towards the root, each leaf or child hash is XOR’d (cryptographically combined) with the hash of its sibling node
  3. Finally, the hash of the the top parent nodes are XOR’d into the root node which precisely equals the aggregate hash of the original piece of data
Image for post
Image for post

Merkle _Why_

  1. They allow a client or server to efficiently validate the contents of large files/data
  2. They allow a client or server to validate any segment or sub-segment of the file/data without possessing any other segments

To understand why Merkle Trees have these properties, it is important to know a little bit about Cryptographic Hash Functions. Hash functions are easy to compute in one direction but extraordinarily hard to compute in the opposite direction. For example, the industry standard SHA-256 can be hashed in milliseconds but would theoretically take about 3.85 × 10²⁹ years to reverse, according to computer scientist Luke Dash Jr.

In the case of Merkle Trees, the hash function is used to calculate the hash of each segment of data which becomes the leaf nodes, then XOR each leaf node with sibling leaf/child nodes into parent nodes and so on, all the way up to the root node or Merkle Root. If each segment of the data is in tact and unchanged, the Merkle Root is exactly equal to the aggregate hash of the entire file or piece of data. If even one bit of information is changed in any segment, the hash is completely different which propagates up the tree, resulting in an entirely different Merkle Root. This root can then be compared with the actual/desired root and thus any inconsistency is detected.

“These numbers have nothing to do with the technology of the devices, they are the maximums that thermodynamics will allow. They strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space.”

— Bruce Schneier, Cryptographer

Image for post
Image for post

Merkle _How_

Another place we see Merkle Trees being used is in the Distributed Version Control System Git. Git maintains and reconciles many different versions of a piece of software as it is built, changed and updated simultaneously by many contributors. Git is both fast and secure because instead of storing and comparing every instance of the actual software, it instead stores and compares the hashes of each segment and version in the form of a Merkle Tree. Git’s implementation is slightly more complicated as it XOR’s some additional nonces along the way, but the overall structure is that of a Merkle Tree.

Finally, Merkle Trees play a key roll in the construction of the Bitcoin blockchain. When a Bitcoin node broadcasts a transaction to the rest of the peer-to-peer network and is included in a block by a mining node, the miner hashes that Transaction ID along with every other Transaction ID in that block. Those hashes are then put into pairs and XOR’d into a new hash and so forth in the form of a tree, all the way up to the Merkle Root. The root is then XOR’d once more with some additional data pertaining to the block itself and the hash of the preceding block (forming the chain), which finally results in a new block hash. While it is impossible to infer the details any particular transaction from it, the details of all transactions in the block are needed to compute the final block hash.

Image for post
Image for post

Merkle _Root_

Written by

Data Science | Data Engineering | Python Development

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store