A distributed system is a system that is spread across multiple nodes connected over a network. In a distributed system designed for the cloud, data is typically stored on more than one node. Such a system is easy to scale horizontally: adding more nodes increases its storage and computing capacity. Another benefit of designing a distributed system for the cloud is that it greatly reduces the risk of a single point of failure, since the data and compute are spread across multiple nodes, which provides high availability and fault tolerance.
CAP Theorem
Designing a distributed system for the cloud is challenging since the network is always assumed to be unreliable and can cause a part of the system to stop communicating with the rest of the system. Network partitioning in a distributed system can have an impact on the availability of the system or it could cause the data to become inconsistent.
The CAP theorem, also known as Brewer’s theorem, states that any system with distributed data has to choose between consistency and availability when a network partition occurs within the system. In other words, a system with distributed data can’t guarantee both consistency and availability for as long as a partition lasts. If both consistency and availability are important for the system and it needs to implement ACID transactions, then the data should not be spread across multiple nodes. The system should instead leverage a relational database that can provide strong data consistency.
Consistency means that all clients see the same data at the same time, irrespective of the data being partitioned across multiple nodes. Every read should return the most recent write.
Availability means that clients will always receive a valid response. It is not guaranteed that the response has the most recent data.
Partition Tolerance means that the system continues to operate even in the presence of a network failure, when some of the nodes in the distributed system can’t communicate with each other.
Essentially, the theorem is about how the system should react when a network partition occurs, which is always assumed for a cloud platform with distributed nodes.
As per the CAP theorem, any distributed system in the cloud needs to be designed for either AP (Availability and Partition Tolerance) or CP (Consistency and Partition Tolerance).
- AP – Systems that don’t guarantee strong consistency for the data. Preference is given to availability over data consistency.
- CP – Systems that don’t guarantee high availability. Preference is given to strong data consistency over the availability of the system.
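The difference between the two designs can be sketched with a toy two-node store. This is only an illustration under simplifying assumptions: the `Node` class, the `Unavailable` exception and the `make_pair` and `set_partition` helpers are hypothetical names made up for this example, replication is modelled as a direct assignment to the peer, and a partition is just a flag.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses a request during a partition."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "AP" or "CP"
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let the replicas diverge.
                raise Unavailable(f"{self.name}: cannot reach peer")
            # AP: accept the write locally; the peer now holds stale data.
            self.value = value
            return
        # No partition: update both replicas before returning.
        self.value = value
        self.peer.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: this node cannot confirm that it holds the latest write.
            raise Unavailable(f"{self.name}: cannot confirm latest value")
        return self.value


def make_pair(mode):
    """Create two nodes that replicate to each other."""
    a, b = Node("a", mode), Node("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, partitioned):
    """Cut or restore the link between the two nodes."""
    a.partitioned = b.partitioned = partitioned
```

With `make_pair("AP")`, a write to node `a` during a partition succeeds, but a read from node `b` still returns the older value. With `make_pair("CP")`, the same write raises `Unavailable`, so the two replicas never disagree.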
CAP Theorem Limitation for Cloud Systems
The CAP theorem is based on network partitioning, which is always assumed for a system hosted in the cloud. However, distributed systems hosted in the cloud also need to be designed for eventual consistency that is not always caused by network partitions. Systems that are spread across multiple availability zones in a given region, or across multiple regions, have network latency that needs to be taken into account. Any system designed for high availability can have eventual data consistency, which can be tuned based on the number of replicas created for the data and on whether a single primary data partition is responsible for serving the data.
There are use cases where data consistency is critical, so the request from the client is declined (which impacts availability) until the data is consistent across all the nodes in the network, within the same region and across multiple regions in the cloud. Other use cases can work with eventual consistency of the data. Even if the data hasn’t been replicated to all the nodes in the same region or across multiple regions, the client request is processed immediately, giving preference to availability over data consistency.
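One common way to make this tuning concrete is the quorum rule used by some replicated data stores. It is not described in this post and is introduced here as an assumption: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and a read is guaranteed to see the latest acknowledged write only when R + W > N. The function names below are hypothetical, and the sketch is minimal.

```python
def quorums_overlap(n, w, r):
    """Return True when every read quorum shares a replica with every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    # If r + w > n, any r replicas and any w replicas have at least one in common.
    return r + w > n


def tolerated_failures(n, w, r):
    """Number of replicas that can be down while reads and writes both still succeed."""
    return n - max(w, r)


# Three replicas, majority reads and writes: consistent reads, one failure tolerated.
strict = (quorums_overlap(3, 2, 2), tolerated_failures(3, 2, 2))

# Three replicas, single-replica reads and writes: faster and more available,
# but a read may return stale data (eventual consistency).
relaxed = (quorums_overlap(3, 1, 1), tolerated_failures(3, 1, 1))
```

Raising W or R moves the system toward consistency at the cost of availability and latency. Lowering them does the opposite.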
PACELC Theorem
The PACELC theorem was introduced to address a limitation of the CAP theorem, which only describes the behavior of a distributed system during a network partition. The PACELC theorem extends the CAP theorem to also include latency within a distributed system hosted in the cloud across multiple AZs and regions.
In the case of a network partition (P), the choice is between availability (A) and data consistency (C); else (E), when there is no network partition, the choice is between latency (L) and data consistency (C).
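The "else" half of the trade-off can be illustrated with a small latency model. This is a rough sketch: the function name `write_latency_ms` and all the millisecond figures are made-up assumptions, and replicas are assumed to be contacted in parallel.

```python
def write_latency_ms(local_ms, replica_rtts_ms, wait_for_replicas):
    """Estimate the latency a client sees for a single write.

    local_ms:          time to apply the write on the node that receives it
    replica_rtts_ms:   round-trip times to the other replicas
    wait_for_replicas: True  -> acknowledge after all replicas confirm (favours C)
                       False -> acknowledge immediately, replicate later (favours L)
    """
    if wait_for_replicas:
        # Replicas are contacted in parallel, so the slowest one dominates.
        return local_ms + max(replica_rtts_ms, default=0)
    return local_ms


# Hypothetical numbers: one replica in a nearby AZ (1 ms), one in another region (70 ms).
consistent = write_latency_ms(2, [1, 70], wait_for_replicas=True)   # 72
low_latency = write_latency_ms(2, [1, 70], wait_for_replicas=False)  # 2
```

Even with a perfectly healthy network, waiting for the cross-region replica makes every write slower, which is the latency versus consistency choice that PACELC adds to CAP.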
Cloud applications built for high availability should be able to deal with the eventual consistency of data. Modern cloud systems built for scale still expect data to be consistent; however, the majority of use cases may not require strong data consistency. Distributed systems can be built for both eventual consistency and high availability while still addressing specialized use cases that need strong data consistency in certain business domains, such as payments.