Time, Clocks and the Ordering of Events in Distributed Systems

Have you ever wondered what it takes to tell which event happened before which in Distributed Systems? This concept was examined in Time, Clocks and the Ordering of Events in Distributed Systems paper. It was published in 1978 by Turing Award Winner Leslie Lamport. This paper created a paradigm shift in how we think about Distributed Systems and also became one of the most cited papers in computer science.

Before we dive into the details of this paper, let us first understand the definition of a Distributed System. There are a lot of definitions out there on the web, but let us use the below definition for this blog post.

What is a Distributed System?

A Distributed System is a collection of independent computers which talks to on another using unreliable network and also performs its own tasks. Any system that doesn’t communicate with others and functions on its own is not a Distributed System.

DistributedSystem

Why does the ordering of the events important in Distributed Systems?

Let us say multiple processes are trying to access one resource which can be used by only one process at a time. A process requesting for the resource must be granted only when there is no other process waiting beforehand to acquire the resource. In such cases, ordering of events globally among multiple processes is necessary to avoid any inconsistencies.

System Clocks

In our daily life, we use the physical time to order events. For example, we say that something happened at 8:15 if it happened after the clock read 8:15 and before it read 8:16. The same can’t be done in Distributed Systems because System Clocks are not perfect and they don’t keep precise time.

In real life, we can say that an event \(a\) happened before event \(b\) if timestamp of \(a\) is before \(b\). In Distributed Systems, we can’t use the physical time to order events because of the above problems. So, Lamport defined a Happened before relation without using physical clocks.

DistributedSystem

Partial Order

Lamport defined the distributed system more precisely as follows:

Let us use right arrow(\(\rightarrow\)) to denote Happened Before relation. By using the above definition, he defined the Happened Before as follows.

The Happened Before(\(\rightarrow\)) relation among the events satisfy the below conditions

Two different events \(a\) and \(b\) said to be concurrent, if \(a \nrightarrow b\) and \(b \nrightarrow a\). For any event \(a\), \(a \nrightarrow a\). This means that Happened Before(\(\rightarrow\)) is irriflexive partial order (Irriflexive, Anti Symmetric and Transitive) on set of all events in system. Another way of viewing the definition of \(a \rightarrow b\) is to say that it is possible for event \(a\) to casually affect event \(b\).

DistributedSystem

In the above image, vertical lines are individual processes, dots are events, wavy arrows are messages, and time increases from bottom to top. \(p1 \rightarrow p2\) because they occurred in the same process. \(q1 \rightarrow p2\) because q1 is message sent by process \(Q\) and \(p2\) is receipt of same message by process \(P\). \(p1 \rightarrow r3\) because \(p1 \rightarrow q2\) , \(q2 \rightarrow q4\) and \(q4 \rightarrow r3\). Because time is increasing from bottom to top, \(q3\) seems to have happened before \(p3\) if we consider physical time. But we treat them as concurrent events in this system because there is no causal relationship between them.

Logical Clocks

Now let us order the events in the system according to Happened Before(\(\rightarrow\)) relation using logical clocks. A logical clock is a way of assigning a number to an event which can be thought of as the time at which that event happened. Let us define a clock \(Ci\) for each process \(Pi\) as a function which assigns a number \(Ci(e)\) to each event \(e\) in that process \(Pi\). System of Clocks can be represented by \(C\) which assigns a number \(C(b)\) for any event \(b\), where \(C(b)=Cj(b)\) if \(b\) is an event in process \(Pj\).

We can evaluate the correctness of logic clocks using the Clock Condition. For any two events \(a, b:\) if \(a \rightarrow b\) then \(C(a) < C(b)\). For example, logical clock must satisfy the condition \(C(p1) < C(r3)\) for \(p1 \rightarrow r3\) in the above image. Also, for every \(a, b\) if \(C(a) < C(b)\) then it doesn’t mean \(a \rightarrow b\). In the above image \(C(q3) < C(p3)\) but \(q3\) and \(p3\) are concurrent and not casually related.

Clock Condition is satisfied if the following two conditions hold:

Logical Clock Algorithm

Our Logical Clock algorithm satisfies the Clock Condition which means it is consistent with Happened Before(\(\rightarrow\)) relation.

Ordering events totally

Lamport’s Algorithm for Mutual Exclusion in Distributed System

Let us say there are a fixed number of processes trying to acquire a resource and only one process can use the resource at a time. So the processes must synchronize themselves to avoid any conflict. This is nothing but a variant of the mutual exclusion problem.

We have to come up with an algorithm that grants the resource to a process that must satisfy the below conditions

Why can’t we have a centralized scheduler to synchronize among processes? The centralized scheduler won’t satisfy the second condition.

Let us look at the following scenario:

DistributedSystem

DistributedSystem

DistributedSystem

We can solve this problem by using Lamport’s Logical clock algorithm which gives the total order of all request and release operations. Before moving forward with the solution, let us make some assumptions

The algorithm is defined by the following 5 rules:

DistributedSystem

Conclusion

References