

BitTorrent Protocol in 25 minutes

29/1/2023

Summary

In this article you will learn how:

  1. The network discovers nodes
  2. BitTorrent's wire protocol works
  3. Nodes exchange torrent metadata
  4. Data is transferred in pieces

 

The BitTorrent Network

BitTorrent is a decentralized network of nodes. Each node discovers relevant nodes through a peer-to-peer distributed hash table based on Kademlia.

Nodes share data in small pieces that are easy to transfer, and a torrent's data is reconstructed from these pieces.

Node Discovery

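BitTorrent uses a distributed hash table (DHT) based on Kademlia to discover relevant nodes. Every node and every torrent is identified by a 160-bit ID (for a torrent, its info hash), and the distance between two IDs is their bitwise XOR. A node asked about a torrent either knows peers that have the desired data or knows other nodes whose IDs are closer to the torrent's info hash. A hardcoded bootstrap node is used to start the process.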
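Kademlia measures closeness between a node ID and a torrent's info hash (both 20-byte values) by XOR-ing them and comparing the result as an unsigned integer. The Node.js sketch below shows how a client might rank the nodes it knows by that distance; `xorDistance` and `closestNodes` are hypothetical helper names, and the default k = 8 follows the bucket size given in BEP 5.

```javascript
// XOR distance between two equal-length IDs (node IDs or info hashes).
function xorDistance(a, b) {
  if (a.length !== b.length) throw new Error("IDs must have the same length");
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// The k known nodes closest to `target`. Buffer.compare on equal-length buffers
// orders them like big-endian unsigned integers, which is what Kademlia needs.
function closestNodes(nodes, target, k = 8) {
  return [...nodes]
    .sort((x, y) => Buffer.compare(xorDistance(x.id, target), xorDistance(y.id, target)))
    .slice(0, k);
}

const target = Buffer.alloc(20, 0x00); // placeholder info hash
const nodes = [
  { id: Buffer.alloc(20, 0xff), addr: "a" },
  { id: Buffer.alloc(20, 0x01), addr: "b" },
  { id: Buffer.alloc(20, 0x10), addr: "c" },
];
console.log(closestNodes(nodes, target, 2).map((n) => n.addr)); // [ 'b', 'c' ]
```

Each lookup step asks the closest nodes from this ranking, which is why the discovered nodes get closer to the target at every hop.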

[Figure: Discovering new nodes and peers in the BitTorrent network]

Takeaway 1: BitTorrent nodes are discovered recursively, starting from a hardcoded bootstrap node.

The following script starts from a hardcoded bootstrap node and recursively discovers new nodes with get_peers messages. The discovered nodes get closer to the torrent's info hash at each step, until a node that knows peers for the torrent is found.

Connect to the bootstrap node

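DHT queries use the KRPC format from BEP 5: a bencoded dictionary sent in a single UDP datagram. Below is a minimal Node.js sketch of building a get_peers query and sending it, assuming a random node ID, a placeholder info hash and hypothetical helper names (`bencode`, `getPeersQuery`, `sendQuery`); the encoder only handles the bencode types needed here.

```javascript
const dgram = require("dgram");
const crypto = require("crypto");

// Minimal bencode encoder: integers, strings/Buffers, lists and dictionaries (keys sorted).
function bencode(value) {
  if (typeof value === "number") return Buffer.from(`i${value}e`);
  if (typeof value === "string") value = Buffer.from(value);
  if (Buffer.isBuffer(value)) return Buffer.concat([Buffer.from(`${value.length}:`), value]);
  if (Array.isArray(value)) {
    return Buffer.concat([Buffer.from("l"), ...value.map((v) => bencode(v)), Buffer.from("e")]);
  }
  const parts = Object.keys(value).sort().flatMap((k) => [bencode(k), bencode(value[k])]);
  return Buffer.concat([Buffer.from("d"), ...parts, Buffer.from("e")]);
}

// KRPC get_peers query: { t: transaction ID, y: "q", q: "get_peers", a: { id, info_hash } }
function getPeersQuery(transactionId, nodeId, infoHash) {
  return bencode({ t: transactionId, y: "q", q: "get_peers", a: { id: nodeId, info_hash: infoHash } });
}

// Send one datagram and resolve with the first reply (still bencoded), or reject on error/timeout.
function sendQuery(host, port, message, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket("udp4");
    let settled = false;
    const finish = (err, reply) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.close();
      if (err) reject(err);
      else resolve(reply);
    };
    const timer = setTimeout(() => finish(new Error("timeout")), timeoutMs);
    socket.once("error", (err) => finish(err));
    socket.once("message", (reply) => finish(null, reply));
    socket.send(message, port, host, (err) => {
      if (err) finish(err);
    });
  });
}

const nodeId = crypto.randomBytes(20); // our random 160-bit node ID
const infoHash = Buffer.alloc(20, 0xab); // placeholder info hash, not a real torrent
const query = getPeersQuery("aa", nodeId, infoHash);
console.log(query.length); // 95
```

Calling `sendQuery("router.bittorrent.com", 6881, query)` would contact a widely used public bootstrap node (an assumption; any reachable DHT node works). The reply is itself bencoded, so a bencode decoder is needed to read the `nodes` or `values` entry of its `r` dictionary.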

DHT Wire Protocol

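DHT nodes communicate over UDP using the KRPC protocol from BEP 5. Each UDP datagram carries one bencoded dictionary containing a transaction ID (t), the message type (y: a query, a response or an error) and a payload (the query arguments a or the response values r).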

[Figure: The BitTorrent wire protocol network]

Takeaway 2: BitTorrent DHT messages are sent over UDP and contain a payload and some metadata (a transaction ID and the message type).

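There are four query types. get_peers asks for peers of a torrent's info hash: a node that knows such peers returns them, otherwise it returns the nodes closest to that info hash that it knows about. find_node returns the nodes closest to a given node ID, and announce_peer tells a node that the sender is downloading the torrent with a given info hash. Finally, ping is sent periodically to check that a node is still online.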
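Per BEP 5, a get_peers response carries either `values` (a list of 6-byte compact peer entries) or `nodes` (closer DHT nodes packed 26 bytes each: a 20-byte ID, a 4-byte IPv4 address and a 2-byte big-endian port), and find_node responses use the same `nodes` format. A small Node.js sketch of decoding both, assuming the bencoded response has already been decoded into Buffers; the function names are hypothetical.

```javascript
// "Compact node info": 26 bytes per node = 20-byte node ID + 4-byte IPv4 address + 2-byte port.
function parseCompactNodes(buf) {
  const nodes = [];
  for (let i = 0; i + 26 <= buf.length; i += 26) {
    nodes.push({
      id: buf.subarray(i, i + 20),
      host: Array.from(buf.subarray(i + 20, i + 24)).join("."),
      port: buf.readUInt16BE(i + 24),
    });
  }
  return nodes;
}

// "Compact peer info" (one entry of `values`): 4-byte IPv4 address + 2-byte port.
function parseCompactPeer(buf) {
  return { host: Array.from(buf.subarray(0, 4)).join("."), port: buf.readUInt16BE(4) };
}

// Example: one node with ID 0x0101...01 at 127.0.0.1:6881 (6881 = 0x1ae1).
const sample = Buffer.concat([Buffer.alloc(20, 0x01), Buffer.from([127, 0, 0, 1, 0x1a, 0xe1])]);
const [node] = parseCompactNodes(sample);
console.log(node.host, node.port); // 127.0.0.1 6881
```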

Takeaway 3: The get_peers message is used to find nodes close to torrent hashes. The ping message is sent periodically to check that a node is alive.

 

The following script finds nodes that know peers for a torrent hash by sending get_peers queries recursively.

Discover DHT nodes

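The recursive search can be sketched as a loop that always queries the closest nodes not asked yet. This Node.js sketch keeps the network out of the picture: `query` is a hypothetical synchronous stand-in for a get_peers round trip (a real client would await asynchronous UDP replies), and α = 3 queries per round is borrowed from the Kademlia paper as an assumption.

```javascript
// XOR distance between two equal-length IDs, comparable with Buffer.compare.
function distance(a, b) {
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// Iterative get_peers lookup. `query(node)` returns { values: [...] } when the node knows
// peers for the info hash, or { nodes: [...] } with closer nodes otherwise (hypothetical API).
function lookupPeers(bootstrap, infoHash, query, alpha = 3, maxRounds = 20) {
  const key = (n) => n.id.toString("hex");
  const known = new Map([[key(bootstrap), bootstrap]]);
  const asked = new Set();
  for (let round = 0; round < maxRounds; round++) {
    // Pick the alpha closest nodes that have not been asked yet.
    const batch = [...known.values()]
      .filter((n) => !asked.has(key(n)))
      .sort((x, y) => Buffer.compare(distance(x.id, infoHash), distance(y.id, infoHash)))
      .slice(0, alpha);
    if (batch.length === 0) break; // nobody left to ask
    for (const node of batch) { // asked one after another to keep the sketch simple
      asked.add(key(node));
      const res = query(node) || {};
      if (res.values && res.values.length > 0) return res.values; // found peers
      for (const n of res.nodes || []) if (!known.has(key(n))) known.set(key(n), n);
    }
  }
  return []; // no peers found
}

// Example with a fake three-hop network; each ID is 20 bytes filled with one value.
const id = (byte) => Buffer.alloc(20, byte);
const target = id(0x00);
const network = {
  ff: { nodes: [{ id: id(0x0f) }] },
  "0f": { nodes: [{ id: id(0x01) }] },
  "01": { values: ["10.0.0.1:6881"] },
};
const fakeQuery = (node) => network[node.id.toString("hex").slice(0, 2)];
console.log(lookupPeers({ id: id(0xff) }, target, fakeQuery)); // [ '10.0.0.1:6881' ]
```

A production lookup would also stop once a round yields no closer nodes and would keep the token returned by get_peers for a later announce_peer; both are omitted here.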

BitTorrent Wire Protocol

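BitTorrent nodes exchange torrent data over TCP. After the initial handshake, every message starts with a 4-byte big-endian length prefix, followed by a 1-byte message ID and the message payload. The length counts the ID byte plus the payload, and a length of zero is a keep-alive.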
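Because TCP may split or merge these frames arbitrarily, a client has to buffer incoming bytes until a whole message has arrived. A minimal Node.js sketch of encoding a frame and reassembling frames from a stream of chunks; `encodeMessage` and `MessageParser` are hypothetical names, and the parser assumes the 68-byte handshake has already been consumed.

```javascript
// Frame a peer-wire message: 4-byte big-endian length (ID + payload), 1-byte ID, payload.
function encodeMessage(id, payload = Buffer.alloc(0)) {
  const header = Buffer.alloc(5);
  header.writeUInt32BE(1 + payload.length, 0);
  header.writeUInt8(id, 4);
  return Buffer.concat([header, payload]);
}

// Reassemble complete messages from arbitrary TCP chunks (use after the handshake).
// A zero length is a keep-alive and is reported with id = null.
class MessageParser {
  constructor() {
    this.buffer = Buffer.alloc(0);
  }

  push(chunk) {
    this.buffer = Buffer.concat([this.buffer, chunk]);
    const messages = [];
    while (this.buffer.length >= 4) {
      const length = this.buffer.readUInt32BE(0);
      if (this.buffer.length < 4 + length) break; // incomplete, wait for more data
      const body = this.buffer.subarray(4, 4 + length);
      messages.push(length === 0 ? { id: null, payload: body } : { id: body[0], payload: body.subarray(1) });
      this.buffer = this.buffer.subarray(4 + length);
    }
    return messages;
  }
}

// interested (ID 2) followed by a bitfield (ID 5), delivered in two uneven chunks.
const wire = Buffer.concat([encodeMessage(2), encodeMessage(5, Buffer.from([0xe0]))]);
const parser = new MessageParser();
console.log(parser.push(wire.subarray(0, 7)).map((m) => m.id)); // [ 2 ]
console.log(parser.push(wire.subarray(7)).map((m) => m.id)); // [ 5 ]
```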

[Figure: The BitTorrent wire protocol network]

Takeaway 4: BitTorrent messages are sent over TCP and consist of a length prefix, a message ID and a payload.

First, a handshake message must be sent to establish the connection. The remote node then typically sends a bitfield message describing which pieces it owns. The client sends interested to signal that it wants data from the torrent, and the remote node answers with unchoke to signal that it is willing to send data. The client then asks for data with request messages, and each piece message returns the data at a specific offset.
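The handshake is the only message without a length prefix: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info hash and a 20-byte peer ID, 68 bytes in total. The following Node.js sketch builds and checks a handshake, encodes payload-free messages such as interested, and reads a bitfield (the high bit of the first byte is piece 0). The peer ID prefix `-XX0001-` and all helper names are hypothetical.

```javascript
const crypto = require("crypto");

// Handshake: <19><"BitTorrent protocol"><8 reserved bytes><20-byte info hash><20-byte peer ID>
function buildHandshake(infoHash, peerId) {
  const pstr = Buffer.from("BitTorrent protocol");
  return Buffer.concat([Buffer.from([pstr.length]), pstr, Buffer.alloc(8), infoHash, peerId]);
}

// Accept a reply only if it speaks the same protocol and is for the same torrent.
function isValidHandshake(reply, infoHash) {
  return (
    reply.length >= 68 &&
    reply[0] === 19 &&
    reply.subarray(1, 20).toString() === "BitTorrent protocol" &&
    reply.subarray(28, 48).equals(infoHash)
  );
}

// Message IDs from BEP 3 used in this section.
const MSG = { choke: 0, unchoke: 1, interested: 2, bitfield: 5, request: 6, piece: 7 };

// Messages without payload (choke, unchoke, interested): length 1 followed by the ID.
function simpleMessage(id) {
  return Buffer.from([0, 0, 0, 1, id]);
}

// Does the bitfield payload mark piece `index` as available?
function hasPiece(bitfield, index) {
  return (bitfield[index >> 3] & (0x80 >> (index % 8))) !== 0;
}

const infoHash = Buffer.alloc(20, 0xab); // placeholder info hash
const peerId = Buffer.concat([Buffer.from("-XX0001-"), crypto.randomBytes(12)]); // hypothetical client prefix
const handshake = buildHandshake(infoHash, peerId);
console.log(handshake.length, isValidHandshake(handshake, infoHash)); // 68 true
console.log(simpleMessage(MSG.interested)); // <Buffer 00 00 00 01 02>
console.log(hasPiece(Buffer.from([0b10100000]), 2)); // true
```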

Takeaway 5: The handshake message establishes the connection. Then bitfield, interested and unchoke messages establish which pieces are available and that data may be transferred, and piece messages transfer the torrent's data.

 

We can establish a connection to a BitTorrent node by sending handshake and interested messages and waiting for the node's unchoke message. Then request messages can be used to ask the node for data, which arrives in piece messages.

Connect to a BitTorrent node


Files and Pieces

Torrents are transferred in smaller pieces for efficiency, and these pieces are in turn split into even smaller units called blocks. When all blocks of a piece have been received, the data are checked for correctness against the piece's hash. To know the piece size and the piece hashes, a BitTorrent node needs the torrent's metadata.
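The metadata gives the total `length` and the `piece length`; every piece has that size except possibly the last, and BEP 3 notes that current clients request blocks of 16 KiB. A Node.js sketch of that arithmetic, with hypothetical function names and the 16 KiB block size as an assumption:

```javascript
const BLOCK_SIZE = 16 * 1024; // 16 KiB, the block size most clients request

// Number of pieces for `totalLength` bytes split into `pieceLength`-byte pieces.
function pieceCount(totalLength, pieceLength) {
  return Math.ceil(totalLength / pieceLength);
}

// Every piece is `pieceLength` bytes except possibly the last one.
function pieceSize(index, totalLength, pieceLength) {
  return Math.min(pieceLength, totalLength - index * pieceLength);
}

// The blocks (offset and length inside the piece) that make up piece `index`.
function blocksForPiece(index, totalLength, pieceLength) {
  const size = pieceSize(index, totalLength, pieceLength);
  const blocks = [];
  for (let begin = 0; begin < size; begin += BLOCK_SIZE) {
    blocks.push({ index, begin, length: Math.min(BLOCK_SIZE, size - begin) });
  }
  return blocks;
}

// Example: a 100,000-byte torrent with 32 KiB pieces.
console.log(pieceCount(100000, 32768)); // 4
console.log(blocksForPiece(3, 100000, 32768)); // [ { index: 3, begin: 0, length: 1696 } ]
```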

When BitTorrent was first introduced in 2001, torrent metadata was distributed as .torrent files, and specialized servers called trackers coordinated the peers. The metadata contained the torrent's piece hashes, information about the included files, the torrent size and more, while trackers provided information about the peers; neither held the torrent's actual data.

The BitTorrent DHT was introduced in 2008 with BEP_0005 to eliminate centralized parties and allow the network to transfer files in a completely decentralized manner. A client connects to a BitTorrent node with a handshake, then sends an extended handshake (as in BEP_0010) and then a ut_metadata request (as in BEP_0009) for the torrent metadata.
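To announce BEP 10 support, a client sets bit 0x10 in the sixth reserved byte of its handshake; extended messages then use message ID 20, a one-byte extended ID (0 for the extended handshake) and a bencoded dictionary. A Node.js sketch of the two messages named above, assuming our client registers ut_metadata under local ID 1 and that the peer announced ID 3 (a placeholder; the real value comes from the peer's own extended handshake). Helper names are hypothetical.

```javascript
// Minimal bencode encoder for integers, strings/Buffers and dictionaries (keys sorted).
function bencode(v) {
  if (typeof v === "number") return Buffer.from(`i${v}e`);
  if (typeof v === "string") v = Buffer.from(v);
  if (Buffer.isBuffer(v)) return Buffer.concat([Buffer.from(`${v.length}:`), v]);
  const parts = Object.keys(v).sort().flatMap((k) => [bencode(k), bencode(v[k])]);
  return Buffer.concat([Buffer.from("d"), ...parts, Buffer.from("e")]);
}

// Extended message (BEP 10): <length><20><extended ID><bencoded dictionary>
function extendedMessage(extId, dict) {
  const body = Buffer.concat([Buffer.from([20, extId]), bencode(dict)]);
  const length = Buffer.alloc(4);
  length.writeUInt32BE(body.length, 0);
  return Buffer.concat([length, body]);
}

// Handshake reserved bytes with the extension-protocol bit set (byte 5, value 0x10).
function reservedWithExtensions() {
  const reserved = Buffer.alloc(8);
  reserved[5] |= 0x10;
  return reserved;
}

// Extended handshake (extended ID 0): we accept ut_metadata messages under local ID 1.
const extHandshake = extendedMessage(0, { m: { ut_metadata: 1 } });

// ut_metadata request (BEP 9) for metadata piece 0, sent with the ID the *peer* chose.
const peerUtMetadataId = 3; // placeholder: read it from the peer's "m" dictionary
const metadataRequest = extendedMessage(peerUtMetadataId, { msg_type: 0, piece: 0 });

console.log(extHandshake.subarray(6).toString()); // d1:md11:ut_metadatai1eee
console.log(metadataRequest.subarray(6).toString()); // d8:msg_typei0e5:piecei0ee
```

The peer's extended handshake also reports `metadata_size`; the metadata is fetched in 16 KiB pieces, and once reassembled its SHA-1 hash should equal the torrent's info hash.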

Takeaway 6: The torrent's metadata can also be downloaded from nodes found through the DHT, without the need for a central party.

 

Clients first establish a connection with a handshake and an interested message. Once they receive an unchoke message, they ask the node for data with request messages, and the data arrive in piece messages. For better download performance, clients can connect to multiple nodes in the network and request piece data in parallel. Each downloaded piece is hashed with SHA-1, and the result is checked against that piece's hash in the torrent's metadata.
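The metadata's `pieces` field is the concatenation of one 20-byte SHA-1 hash per piece, so verifying piece i means hashing the downloaded bytes and comparing them with bytes 20 * i to 20 * i + 20 of that field. The Node.js sketch below does that and adds a hypothetical round-robin scheduler that spreads missing pieces over the peers advertising them; real clients use smarter strategies such as rarest-first, so treat the scheduler as illustration only.

```javascript
const crypto = require("crypto");

// Expected SHA-1 of piece `index`: bytes [20 * index, 20 * index + 20) of the "pieces" field.
function expectedHash(piecesField, index) {
  return piecesField.subarray(index * 20, index * 20 + 20);
}

// Accept a downloaded piece only if its SHA-1 digest matches the metadata.
function verifyPiece(data, piecesField, index) {
  const digest = crypto.createHash("sha1").update(data).digest();
  return digest.equals(expectedHash(piecesField, index));
}

// Hypothetical round-robin scheduler: give each missing piece to one of the peers that have it,
// so several connections download different pieces at the same time.
function assignPieces(missing, peers) {
  const plan = new Map(peers.map((p) => [p.name, []]));
  let turn = 0;
  for (const index of missing) {
    const candidates = peers.filter((p) => p.has.has(index));
    if (candidates.length === 0) continue; // no connected peer has this piece yet
    plan.get(candidates[turn++ % candidates.length].name).push(index);
  }
  return plan;
}

const piece = Buffer.from("hello piece");
const pieces = Buffer.concat([crypto.createHash("sha1").update(piece).digest(), Buffer.alloc(20)]);
console.log(verifyPiece(piece, pieces, 0), verifyPiece(piece, pieces, 1)); // true false

const peers = [{ name: "A", has: new Set([0, 1, 2]) }, { name: "B", has: new Set([1, 2]) }];
console.log(Object.fromEntries(assignPieces([0, 1, 2], peers))); // { A: [ 0, 2 ], B: [ 1 ] }
```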

Takeaway 7: Files in BitTorrent are transferred in pieces from multiple nodes in parallel.

 

The script below connects to a BitTorrent node and requests the first piece of a torrent.

Download from a BitTorrent node

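A Node.js sketch of the request/piece exchange for the first piece, assuming the piece size is already known from the metadata and using 16 KiB blocks. A request message carries `<index><begin><length>` and the matching piece message carries `<index><begin><block>`, all integers 4-byte big-endian. Socket handling is left out: a loop over an in-memory buffer stands in for the remote node, and all names are hypothetical.

```javascript
// request message (ID 6): length 13, ID, then <index><begin><length> as 4-byte big-endian integers.
function requestMessage(index, begin, length) {
  const msg = Buffer.alloc(17);
  msg.writeUInt32BE(13, 0);
  msg.writeUInt8(6, 4);
  msg.writeUInt32BE(index, 5);
  msg.writeUInt32BE(begin, 9);
  msg.writeUInt32BE(length, 13);
  return msg;
}

// One request per block of the first piece (index 0); the last block may be shorter.
function requestsForFirstPiece(pieceSize, blockSize = 16384) {
  const out = [];
  for (let begin = 0; begin < pieceSize; begin += blockSize) {
    out.push(requestMessage(0, begin, Math.min(blockSize, pieceSize - begin)));
  }
  return out;
}

// Copy the block from a piece message payload (<index><begin><block>) into place.
function storeBlock(pieceBuffer, payload) {
  const begin = payload.readUInt32BE(4);
  payload.subarray(8).copy(pieceBuffer, begin);
  return payload.length - 8; // number of bytes written
}

// Simulated exchange: `source` stands in for the remote node's copy of piece 0.
const pieceSize = 40000; // assumed piece size, normally read from the metadata
const source = Buffer.from(Array.from({ length: pieceSize }, (_, i) => i % 251));
const piece = Buffer.alloc(pieceSize);
const requests = requestsForFirstPiece(pieceSize);
let received = 0;
for (const req of requests) {
  const begin = req.readUInt32BE(9);
  const length = req.readUInt32BE(13);
  // The node's answer: index and begin copied from the request, followed by the block data.
  const payload = Buffer.concat([req.subarray(5, 13), source.subarray(begin, begin + length)]);
  received += storeBlock(piece, payload);
}
console.log(requests.length, received === pieceSize && piece.equals(source)); // 3 true
```

Once every block is stored, the assembled piece would still be checked against its SHA-1 hash from the metadata before being accepted.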


Andreas Tzionis
