Tor: the 2nd Generation Onion Router Paper Notes

These notes summarize the design of the Tor onion router, based on the paper "Tor: The Second-Generation Onion Router". They cover what Tor is and how it is designed to work.

Abstract

Tor, short for "The Onion Router", is a circuit-based, low-latency anonymous communication service.

Tor requires no special privileges or kernel modifications. Its core goal is to keep users from being tracked through their network activity: it hides the user's IP address from the network services they access.

Overview: Introduction of Core Concepts

Onion Routing is a distributed overlay network designed to anonymize TCP-based applications.

Two concepts, the circuit and the cell, are central to Tor because the encryption scheme is built around them.

  • circuit: a path through multiple relays (onion routers, ORs) along which traffic is forwarded hop by hop.
  • cell: the fixed-size unit of traffic. Relay cells are encrypted in layers with the symmetric key of each OR on the circuit.

As the following sections show, both concepts are closely tied to key exchange and encryption. The Tor network provides several features; the most important ones are listed here:

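  • Perfect forward secrecy: the circuit is built incrementally (a "telescoping" design), negotiating an ephemeral session key with each successive relay.
  • Separation of “protocol cleaning” from anonymity: Tor exposes a standard SOCKS interface and leaves application-level filtering to a separate proxy such as Privoxy.
  • Leaky-pipe circuit topology: traffic can exit at any relay on the circuit, not only at the last one.
  • Directory servers: a small set of trusted directory servers describes the network, instead of flooding state information through it.
  • End-to-end integrity checking.
  • Rendezvous points and hidden services: hidden services are served under the .onion domain.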

Overall Design

The Tor network is an overlay network.

  • OR: onion router, which forwards traffic from an OP or another OR to the next OR or to the destination server.
  • OP: onion proxy, the local software users run to interact with the onion network.

tor-or-op.png

Each onion router maintains a long-term identity key and short-term onion keys.

  • The identity key is used to sign the TLS certificates that identify the OR.
    It is also used to sign the OR’s router descriptor (a summary of its keys, address, bandwidth, exit policy, and so on), and (by directory servers) to sign directories.

  • The onion key is used to decrypt requests from users to set up a circuit and negotiate ephemeral keys.

Cells

Onion routers communicate with one another, and with users’ OPs, over TLS connections that use ephemeral keys. Because the link keys are ephemeral, TLS gives these connections perfect forward secrecy.

Traffic passes along these connections in fixed-size cells. Each cell is 512 bytes, and consists of a header and a payload.

The header contains a circuit identifier that specifies which circuit the cell refers to, and a command to describe what to do with the cell's payload.

  • control cell: always interpreted by the node that receives it. Control commands include padding, create/created, and destroy.

  • relay cell: carries end-to-end stream data. It has an additional header (the relay header) containing a streamID, an end-to-end checksum for integrity checking, the length of the relay payload, and a relay command. The entire relay header and payload are encrypted or decrypted together as the cell moves along the circuit, using 128-bit AES in counter mode.

tor_cell_layout.png
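The two cell layouts can be sketched with Python's `struct` module. The field widths follow the cell layout figure in the paper (a 3-byte control header, a 14-byte relay header); the numeric command codes and the function names are placeholders chosen for illustration, not values taken from the paper.

```python
import struct

CELL_LEN = 512

# Big-endian, no padding between fields.
CONTROL_HEADER = struct.Struct(">HB")      # circID (2), command (1)
RELAY_HEADER = struct.Struct(">HBH6sHB")   # circID (2), relay marker (1), streamID (2),
                                           # digest (6), length (2), relay command (1)

# Hypothetical command codes, for illustration only.
CMD_PADDING, CMD_CREATE, CMD_RELAY, CMD_DESTROY = 0, 1, 3, 4
RELAY_DATA = 2


def pack_control_cell(circ_id, command, data=b""):
    """Build a fixed-size control cell: header plus zero-padded payload."""
    room = CELL_LEN - CONTROL_HEADER.size
    if len(data) > room:
        raise ValueError("control payload too long")
    return CONTROL_HEADER.pack(circ_id, command) + data.ljust(room, b"\x00")


def pack_relay_cell(circ_id, stream_id, digest, relay_cmd, data=b""):
    """Build a fixed-size relay cell (unencrypted; layering happens elsewhere)."""
    room = CELL_LEN - RELAY_HEADER.size
    if len(data) > room:
        raise ValueError("relay payload too long")
    header = RELAY_HEADER.pack(circ_id, CMD_RELAY, stream_id, digest, len(data), relay_cmd)
    return header + data.ljust(room, b"\x00")
```

With these widths a control cell carries up to 509 payload bytes and a relay cell up to 498, and both always serialize to exactly 512 bytes.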

Circuit

Tor multiplexes many TCP streams over a single circuit, and the OP considers rotating to a new circuit about once a minute. This section discusses how a circuit is created, extended, used, and torn down.

Create Circuit

A user’s OP constructs circuits incrementally, negotiating a symmetric key with each OR on the circuit, one hop at a time. Establishing each hop is a Diffie-Hellman (DH) key exchange.

tor_circuit_create.png
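The first hop of the handshake can be sketched as a plain Diffie-Hellman exchange. This is a toy under loud assumptions: the modulus is a small Mersenne prime and the generator is arbitrary, both far too weak for real use; the step where the OP encrypts its half of the handshake under the OR's onion key is omitted; and the key derivation with SHA-256 is a stand-in, not the paper's construction.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus, NOT a secure group
G = 3            # toy generator, chosen for illustration


def dh_keypair():
    """Return (private exponent, public value g^private mod P)."""
    private = secrets.randbelow(P - 3) + 2   # in [2, P - 2]
    return private, pow(G, private, P)


def derive_key(peer_public, private):
    """Derive a 128-bit session key from the shared DH secret."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:16]


def create_handshake():
    # OP: first half of the handshake (sent in a create cell).
    x, gx = dh_keypair()
    # OR: second half plus a hash of the key (sent back in a created cell).
    y, gy = dh_keypair()
    k_or = derive_key(gx, y)
    key_hash = hashlib.sha256(k_or).digest()
    # OP: derive the same key and check the OR's hash.
    k_op = derive_key(gy, x)
    if hashlib.sha256(k_op).digest() != key_hash:
        raise ValueError("key confirmation failed")
    return k_op, k_or
```

Both sides end up with the same 16-byte key without it ever crossing the wire, which is the property the circuit construction relies on at every hop.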

Note that the OP learns each OR's public key from the directory servers, and the Tor client ships with the directory servers' public keys so that it can verify what they sign. That is also why using an authentic copy of the Tor client is crucial.

Extend Circuit

Once the OP has created a circuit with the first OR, it can extend the circuit to further ORs. The OP and the newly added ORs do not communicate directly; they talk through the intermediate ORs. Even so, the OP knows the address of each added OR, because it learns it from the directory servers.

The OP sends a relay extend cell that specifies the address of the next router \(OR_2\) (retrieved from the directory servers) and carries \(g^{x_2}\) encrypted with the public key of \(OR_2\). The first OR copies this half-handshake into a create cell and passes it to \(OR_2\), choosing a new circuit ID that is not currently used on its connection with \(OR_2\). The OP never needs to know this circuit ID. When \(OR_2\) answers with a created cell, the first OR wraps the payload into a relay extended cell and passes it back to the OP. In this way the OP and \(OR_2\) agree on a symmetric key, with the first OR only forwarding messages.

Note that the first OR learns nothing about the content of the handshake, because it is encrypted with the public key of \(OR_2\). That key comes from the directory servers, whose signatures are verified with the public keys embedded in the Tor client.

In this way, the OP extends its circuit across multiple onion routers.
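The bookkeeping an intermediate OR does when extending a circuit can be sketched as a small routing table that maps a (connection, circuit ID) pair on one side to the pair on the other side. The class and method names are hypothetical, and connections are represented by plain strings for brevity.

```python
class OnionRouter:
    """Toy model of the per-OR circuit table; not Tor's implementation."""

    def __init__(self, name):
        self.name = name
        self.table = {}   # (connection, circ_id) -> (connection, circ_id)
        self.used = {}    # connection -> circuit IDs in use on that connection

    def _fresh_circ_id(self, conn):
        """Pick the lowest circuit ID not yet used on this connection."""
        used = self.used.setdefault(conn, set())
        circ_id = next(i for i in range(1, 2**16) if i not in used)
        used.add(circ_id)
        return circ_id

    def extend(self, in_conn, in_circ_id, next_conn):
        """Handle a relay extend: allocate an outgoing ID and link both directions."""
        self.used.setdefault(in_conn, set()).add(in_circ_id)
        out_circ_id = self._fresh_circ_id(next_conn)
        self.table[(in_conn, in_circ_id)] = (next_conn, out_circ_id)
        self.table[(next_conn, out_circ_id)] = (in_conn, in_circ_id)
        return out_circ_id

    def route(self, conn, circ_id):
        """Look up where a cell arriving on (conn, circ_id) should go next."""
        return self.table[(conn, circ_id)]
```

For example, after `or1.extend("OP", 5, "OR2")`, a cell arriving from the OP on circuit 5 is forwarded to OR2 under the newly allocated ID, and replies from OR2 are mapped back to circuit 5. The OP only ever sees its own circuit ID.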

Relay Cells

Once the circuit is established, the OP can send relay cells. Upon receiving a relay cell, an OR looks up the corresponding circuit, and decrypts the relay header and payload with the session key for that circuit.

The OR then checks the digest of the decrypted cell. If the digest is valid, the cell is addressed to this OR, which processes it. Otherwise, the OR looks up the next hop for the circuit, replaces the circID as appropriate, and sends the cell on. If the OR at the end of the circuit receives an unrecognized relay cell, an error has occurred and the circuit is torn down.

To construct a relay cell addressed to a given OR, the end user's OP:

  • assigns the digest (the end-to-end integrity check).
  • iteratively encrypts the cell payload (the relay header and payload) with the symmetric key of each hop up to that OR.

This is where the onion metaphor comes from: the real data is encrypted multiple times, like the layers of an onion, and each OR peels off one layer.
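The layering can be sketched in a few lines. Several simplifications are assumed here: a SHA-256-based XOR keystream stands in for AES-128 in counter mode, the keystream restarts for every cell (a real stream cipher keeps state across cells), and the 6-byte digest is a plain truncated SHA-1 of the payload. All function names are hypothetical.

```python
import hashlib

DIGEST_LEN = 6


def keystream(key, length):
    """Toy keystream: SHA-256 over key || counter, standing in for AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key, data):
    """Add or remove one encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def digest_for(payload):
    return hashlib.sha1(payload).digest()[:DIGEST_LEN]


def op_build(hop_keys, target_hop, payload):
    """OP side: attach the digest, then encrypt for every hop up to the target."""
    body = digest_for(payload) + payload
    for key in reversed(hop_keys[:target_hop + 1]):
        body = xor_layer(key, body)
    return body


def or_process(key, body):
    """OR side: peel one layer; the cell is 'recognized' if the digest checks out."""
    body = xor_layer(key, body)
    digest, payload = body[:DIGEST_LEN], body[DIGEST_LEN:]
    if digest == digest_for(payload):
        return True, payload
    return False, body   # not for this OR: forward the partially decrypted body
```

Because the OP chooses how many layers to apply, it can address a cell to any hop on the circuit, which is what makes the leaky-pipe topology possible.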

Tear Down Circuit

There are two ways to tear down a circuit. The first is for the OP to send a destroy control cell. Each OR that receives it closes all streams on that circuit and passes a new destroy cell forward.

The second is to tear the circuit down incrementally, mirroring the way it was extended. The OP sends a relay truncate cell to a single OR on the circuit. That OR sends a destroy cell forward and acknowledges with a relay truncated cell.

The OP can then extend the circuit to different nodes, without signaling to the intermediate nodes (or a limited observer) that it has changed its circuit.

Streams

A stream is what the OP sets up when an application wants a TCP connection to a given address and port. The application asks the OP, via SOCKS, to make the connection. The OP chooses the newest open circuit (or creates one if needed), and chooses a suitable OR on that circuit to be the exit node (usually the last one, but possibly an earlier one, depending on exit policies).

After choosing the exit node, the OP opens the stream by sending a relay begin cell to the exit node through the circuit, using a new random streamID. Once the exit node connects to the remote host, it responds with a relay connected cell.

Upon receipt, the OP sends a SOCKS reply to notify the application of its success. The OP now accepts data from the application’s TCP stream, packaging it into relay data cells and sending those cells along the circuit to the chosen OR.

img.png
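Packaging application data into relay data cells amounts to chunking the byte stream. The sketch below assumes the 498-byte relay payload size from the paper's cell layout; the function name and the tuple representation of a cell are made up for illustration, and encryption is left out.

```python
RELAY_PAYLOAD_MAX = 498  # payload bytes per relay cell, per the cell layout


def relay_data_cells(stream_id, data, max_len=RELAY_PAYLOAD_MAX):
    """Split application bytes into (stream_id, 'data', chunk) tuples, in order."""
    return [
        (stream_id, "data", data[i:i + max_len])
        for i in range(0, len(data), max_len)
    ]
```

A 1000-byte write therefore becomes three cells carrying 498, 498, and 4 bytes, all tagged with the same streamID so the exit node can reassemble them.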

Hostname Resolving

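When using SOCKS, some applications pass the alphanumeric hostname to the Tor client directly, while others resolve it into an IP address first and pass the IP address. The latter is a leak: if the application performs DNS resolution itself, the user reveals the destination to the remote DNS server instead of sending the hostname through the Tor network to be resolved at the far end.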
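The difference is visible in the SOCKS request itself. The sketch below builds a SOCKS5 CONNECT request as laid out in RFC 1928; using SOCKS5 here is a choice made for illustration, and the function name is hypothetical. A hostname is sent with address type 0x03, so no local DNS lookup is needed, while an IPv4 literal is sent with address type 0x01.

```python
import ipaddress
import struct


def socks5_connect_request(host, port):
    """Build a SOCKS5 CONNECT request for an IPv4 literal or a hostname."""
    try:
        # Already an IP address: the application resolved it (or it was typed in).
        atyp, dst = 0x01, ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # A hostname: pass it through unresolved so the exit node does the lookup.
        name = host.encode("idna")
        atyp, dst = 0x03, bytes([len(name)]) + name
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP, DST.ADDR, DST.PORT (big-endian)
    return bytes([0x05, 0x01, 0x00, atyp]) + dst + struct.pack(">H", port)
```

An application that calls this with `"example.com"` keeps the name inside the SOCKS request; one that resolves first and calls it with the resulting address has already exposed the name to its local resolver.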

Rendezvous Points and Hidden Services

Rendezvous points are a building block for location-hidden services (also known as responder anonymity) in the Tor network. Services under the virtual .onion domain are built on the rendezvous point mechanism.

Tor provides location hiding by allowing a service operator (Bob, below) to advertise several ORs, his introduction points, as contact points. He may do this on any robust, efficient key-value lookup system with authenticated updates, such as a distributed hash table (DHT).

The client (Alice, below) chooses an OR as her rendezvous point. She connects to one of Bob's introduction points, tells him about her rendezvous point, and then waits for him to connect to it.

Rendezvous Points in Tor

The hidden service operator, Bob, and the user, Alice, perform the following steps to communicate. First, Bob advertises his service on the lookup service and sets up several ORs as his introduction points.

  • Bob generates a long-term public key pair to identify his service.
  • Bob chooses some ORs as introduction points and advertises them on the lookup service, signing the advertisement with the private half of his key pair.
  • Bob builds circuits to his introduction points and tells them to wait for requests.

Then Alice reaches Bob's service through the Tor network via its onion domain and a rendezvous point (an OR). Her OP resolves the onion domain by querying the lookup service.

  • Alice learns about the hidden service from somewhere, and retrieves the details that Bob advertised from the lookup service.
  • Alice chooses an OR as the rendezvous point (RP) for her connection to Bob's service. She builds a circuit to the RP and gives it a randomly chosen "rendezvous cookie" to recognize Bob.
  • Alice opens an anonymous stream to one of Bob's introduction points and gives it a message, encrypted with Bob's public key, that tells Bob about her RP and rendezvous cookie and contains the first half of a DH handshake. The introduction point passes the message to Bob.
  • If Bob wants to communicate with Alice, he builds a circuit to Alice's RP and sends the rendezvous cookie, the second half of the DH handshake, and a hash of the session key they now share.
  • The RP connects Alice's circuit to Bob's.
  • Alice sends a relay begin cell along the circuit. It arrives at Bob's OP, which connects to Bob's server.

tor_rendezvous_points.png
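The rendezvous point's role in the steps above is essentially to match a cookie and splice two circuits. A minimal sketch, in which the class and method names are hypothetical, circuits are represented by plain strings, and the 20-byte cookie length is an assumption:

```python
import secrets


class RendezvousPoint:
    """Toy model: match Bob's cookie to Alice's waiting circuit and join them."""

    def __init__(self):
        self.waiting = {}   # cookie -> Alice's circuit
        self.joined = {}    # circuit -> peer circuit

    def establish(self, alice_circuit, cookie):
        """Alice registers her circuit under a random cookie."""
        self.waiting[cookie] = alice_circuit

    def rendezvous(self, bob_circuit, cookie):
        """Bob presents the cookie; on a match, splice the two circuits."""
        alice_circuit = self.waiting.pop(cookie, None)
        if alice_circuit is None:
            return False
        self.joined[alice_circuit] = bob_circuit
        self.joined[bob_circuit] = alice_circuit
        return True

    def relay(self, from_circuit):
        """Return the circuit that cells from from_circuit are forwarded to."""
        return self.joined.get(from_circuit)


def new_cookie():
    return secrets.token_bytes(20)   # cookie length is an assumption
```

The cookie is single-use in this sketch: once Bob has presented it, a second party presenting the same value is rejected. The RP only forwards cells between the joined circuits and never learns who Alice or Bob are.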

Integration

Because a hidden service has to advertise its introduction points, Bob runs an onion proxy next to his server and configures it with the local IP address and port of the service and with his public key. The onion proxy anonymously publishes a signed statement of Bob’s public key, an expiration time, and the current introduction points for his service onto the lookup service, indexed by the hash of his public key.

This strategy ensures the web server is unmodified, and it doesn't even know that it’s hidden behind the Tor network.

End users' applications also work unchanged: the client interface remains a SOCKS proxy. All necessary information is encoded into the fully qualified domain name (FQDN), in the familiar form of a hostname under the .onion top-level domain.

The encoded FQDN has the form x.y.onion, where x is the optional authorization cookie and y encodes the hash of the public key. Alice’s onion proxy examines addresses; if they’re destined for a hidden server, it decodes the key and starts the rendezvous as described above.
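The address check the onion proxy performs can be sketched as a small parser for the x.y.onion form. The function name and the choice to return `None` for non-hidden-service addresses are assumptions made for illustration; decoding y into a key hash is left out.

```python
def parse_onion_address(hostname):
    """Return (auth_cookie, key_hash_label) for an .onion name, else None.

    auth_cookie is None when the optional x label is absent.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] != "onion":
        return None                      # ordinary hostname: no rendezvous needed
    if len(labels) == 2 and labels[0]:
        return None, labels[0]           # y.onion
    if len(labels) == 3 and labels[0] and labels[1]:
        return labels[0], labels[1]      # x.y.onion
    raise ValueError(f"malformed onion address: {hostname!r}")
```

A proxy built this way would start the rendezvous procedure whenever the parser returns a tuple, and fall through to normal stream handling when it returns `None`.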