Cassandra DB replication
05.11.2024
Introduction
Cassandra is a distributed NoSQL database system known for its high availability and fault tolerance. Replication in Cassandra is crucial for ensuring data durability and availability in case of node failures. In this article, we will explore how Cassandra handles replication and the strategies it offers.
Replication Factor
- Replication Factor: The replication factor (RF) is the number of copies of each row that the cluster stores, each on a different node. It is set per keyspace (and, with NetworkTopologyStrategy, per data center) and can be changed later with ALTER KEYSPACE; after raising it, run a repair so the new replicas receive the existing data.
- Consistency Level: The consistency level is the number of replicas that must acknowledge a read or write before the coordinator reports success to the client. It is set per operation (for example ONE, QUORUM, LOCAL_QUORUM, or ALL), which lets each query trade consistency against availability and latency.
- Network Topology Strategy: NetworkTopologyStrategy is the replication strategy recommended for production keyspaces. It takes a replication factor for each data center and tries to place the replicas within a data center on different racks, so that losing one rack or one data center does not take out every copy.
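The relationship between replication factor and consistency level comes down to simple arithmetic. The following is a minimal sketch, assuming a single data center and only the levels ONE, TWO, THREE, QUORUM, and ALL; the function names are illustrative and not part of any Cassandra driver.

```python
def quorum(rf):
    """Majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_responses(level, rf):
    """Number of replicas that must respond for a given consistency level.

    Simplified model: one data center, and only the levels listed below.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = quorum(rf)
    elif level == "ALL":
        needed = rf
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def reads_see_latest_write(write_level, read_level, rf):
    """True when the read set and the write set must share a replica (R + W > RF)."""
    w = required_responses(write_level, rf)
    r = required_responses(read_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for w, r in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE")]:
        print(w, r, reads_see_latest_write(w, r, rf=3))
```

With a replication factor of 3, QUORUM needs 2 responses, so a keyspace can lose one replica of a partition and still serve QUORUM reads and writes.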
Write Path
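- Write Path: A client sends a write to any node, which acts as the coordinator for that request. The coordinator forwards the write to all replicas that own the partition at the same time, not one after another. Each replica appends the mutation to its commit log and applies it to its memtable, then acknowledges. The coordinator reports success once the number of acknowledgements required by the consistency level has arrived; the remaining replicas still receive the write, but the client does not wait for them.
- Hinted Handoff: If a replica is down or does not respond during a write, the coordinator stores a hint for it and replays the hint when the replica comes back. Hints are kept only for a limited window (three hours by default), so they cover short outages; a replica that was down longer must be brought up to date with a repair. A stored hint does not count toward the consistency level, except at consistency level ANY.
- Read Repair: Replicas that still missed a write are corrected later. Read repair is triggered by reads rather than writes: when the replicas contacted for a read return different versions, the coordinator sends the newest version to the out-of-date ones. Data that is rarely read is reconciled by running an anti-entropy repair (nodetool repair).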
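The write path and hinted handoff can be illustrated with a small in-memory simulation. This is a toy model under stated assumptions, not Cassandra's implementation: the `Replica` and `Coordinator` classes are hypothetical, there is no network, no hint expiry window, and no flushing of memtables to disk.

```python
class Replica:
    """One replica node with a commit log and a memtable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.commit_log = []  # append-only record of every mutation received
        self.memtable = {}    # key -> (value, timestamp)

    def apply(self, key, value, timestamp):
        # Record the mutation first, then update the in-memory table.
        self.commit_log.append((key, value, timestamp))
        current = self.memtable.get(key)
        if current is None or timestamp > current[1]:
            self.memtable[key] = (value, timestamp)


class Coordinator:
    """Sends each write to every replica and keeps hints for the ones that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value, timestamp)

    def write(self, key, value, timestamp, required_acks):
        live = [r for r in self.replicas if r.up]
        if len(live) < required_acks:
            # Too few live replicas: reject the write without applying it.
            raise RuntimeError("not enough live replicas for this consistency level")
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value, timestamp)
            else:
                self.hints.append((replica, key, value, timestamp))
        return len(live)  # number of acknowledgements

    def replay_hints(self):
        # Deliver stored hints to replicas that are back; keep the rest.
        remaining = []
        for replica, key, value, timestamp in self.hints:
            if replica.up:
                replica.apply(key, value, timestamp)
            else:
                remaining.append((replica, key, value, timestamp))
        self.hints = remaining


if __name__ == "__main__":
    nodes = [Replica("n1"), Replica("n2"), Replica("n3")]
    nodes[2].up = False
    coordinator = Coordinator(nodes)
    print(coordinator.write("user:1", "alice", timestamp=1, required_acks=2))
    nodes[2].up = True
    coordinator.replay_hints()
    print(nodes[2].memtable)
```

In this run the write succeeds with two acknowledgements while n3 is down, and n3 receives the value only when the hint is replayed.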
Read Path
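- Read Path: The coordinator contacts as many replicas as the consistency level requires. Typically it requests the full data from one replica and only a digest (a hash of the data) from the others, then compares them. If they match, the data is returned to the client.
- Read Repair: If the responses do not match, the coordinator fetches the full data from the contacted replicas, merges it by taking the value with the newest write timestamp for each column, writes that version back to the out-of-date replicas, and then returns it to the client. This is different from a consistency level that cannot be met: if too few replicas are alive or respond in time, the read fails with an unavailable or timeout error instead.
- Quorum Reads: A QUORUM read waits for a majority of replicas, floor(RF / 2) + 1, which is 2 of 3 with a replication factor of 3. On its own this does not guarantee the latest data; reads are strongly consistent when the read and write sets overlap, that is when R + W > RF, as with QUORUM writes plus QUORUM reads. The cluster keeps serving quorum requests as long as a majority of the replicas for a partition is available.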
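How a coordinator reconciles replica responses on a quorum read can be sketched as a last-write-wins comparison. This is a simplified illustration: the `quorum_read` function and its input shape are assumptions made for the example, it treats a row as a single value, and it does not model digests or how Cassandra breaks ties between equal timestamps.

```python
def quorum_read(replies, rf):
    """Reconcile replica replies for one key.

    replies: dict mapping replica name -> (value, timestamp), or None when
             the replica has no data for the key.
    Returns (newest, stale): the winning (value, timestamp) pair and the
    sorted names of replicas that returned something else and need repair.
    """
    needed = rf // 2 + 1
    if len(replies) < needed:
        # Fewer responses than a majority: the read fails, nothing is repaired.
        raise RuntimeError(f"got {len(replies)} replies, quorum needs {needed}")

    present = [reply for reply in replies.values() if reply is not None]
    # Last write wins: the reply with the highest timestamp is returned.
    newest = max(present, key=lambda reply: reply[1], default=None)
    stale = sorted(name for name, reply in replies.items() if reply != newest)
    return newest, stale


if __name__ == "__main__":
    replies = {"n1": ("alice@new.example", 20), "n2": ("alice@old.example", 10)}
    print(quorum_read(replies, rf=3))
```

Here n2 holds an older version, so it is the replica a read repair would update before the newer value is returned to the client.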
Replication Strategies
- Simple Strategy: SimpleStrategy places the first replica on the node that owns the partition's token and the remaining replicas on the next nodes clockwise around the ring, ignoring data center and rack information. It is suitable only for single data center deployments and for testing.
- Network Topology Strategy: The Network Topology Strategy considers data center and rack information to place replicas for fault tolerance and high availability. It is recommended for multi-data center deployments.
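- Local Strategy: LocalStrategy is reserved for Cassandra's internal use, for example the system keyspace. Data in such a keyspace is stored only on the local node and is not replicated, so it is not an option for application keyspaces. To keep individual requests within one data center, use NetworkTopologyStrategy together with a local consistency level such as LOCAL_QUORUM.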
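The strategy is chosen when a keyspace is created, and the placement rule of SimpleStrategy is easy to model. The sketch below is an illustration under assumptions: each node has a single integer token (no virtual nodes), the node, keyspace, and data center names are hypothetical, and `keyspace_cql` only formats a CQL string rather than talking to a cluster.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, rf):
    """Pick rf replicas by walking the ring clockwise from the key's token.

    ring: dict mapping node name -> token. A node owns the tokens between
    the previous node's token (exclusive) and its own token (inclusive).
    """
    if rf > len(ring):
        raise ValueError("replication factor exceeds the number of nodes")
    ordered = sorted(ring.items(), key=lambda item: item[1])
    tokens = [token for _, token in ordered]
    # First node whose token is >= key_token; wrap around past the last node.
    start = bisect_left(tokens, key_token) % len(ordered)
    return [ordered[(start + i) % len(ordered)][0] for i in range(rf)]


def keyspace_cql(name, strategy, options):
    """Format a CREATE KEYSPACE statement for the given replication settings."""
    pairs = [f"'class': '{strategy}'"]
    pairs += [f"'{key}': {value}" for key, value in options.items()]
    return f"CREATE KEYSPACE {name} WITH replication = " + "{" + ", ".join(pairs) + "};"


if __name__ == "__main__":
    ring = {"n1": 0, "n2": 100, "n3": 200, "n4": 300}
    print(simple_strategy_replicas(ring, key_token=150, rf=3))
    print(keyspace_cql("shop", "SimpleStrategy", {"replication_factor": 3}))
    print(keyspace_cql("shop", "NetworkTopologyStrategy", {"dc1": 3, "dc2": 2}))
```

NetworkTopologyStrategy follows the same clockwise walk within each data center but additionally prefers nodes on racks that do not yet hold a replica, which this sketch does not model.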
Conclusion
Cassandra's replication mechanisms are central to its data durability, fault tolerance, and high availability. By understanding the replication factor, the consistency levels, and the available replication strategies, developers can design robust and scalable systems on Cassandra.