How Cassandra DB Works

03.02.2025

Apache Cassandra (Cassandra DB) is a distributed NoSQL database designed to handle large amounts of data across many commodity servers. It provides high availability and scalability with no single point of failure. Here is how it works:


Data Distribution:

  • Partitioning: Data is partitioned across the nodes of a cluster using consistent hashing. The partitioner (Murmur3Partitioner by default) hashes each row's partition key to a token, and each node owns one or more ranges of tokens on a logical ring.
  • Replication: Each partition is stored on multiple nodes for high availability and fault tolerance. A keyspace's replication strategy (SimpleStrategy or NetworkTopologyStrategy) and its replication factor determine how many replicas of each partition exist and which nodes hold them, for example spread across racks and data centers.
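The partitioning and replication ideas above can be sketched as a token ring. This is a minimal illustration, not Cassandra's code: it hashes keys with MD5 truncated to a signed 64-bit token instead of Murmur3, and it places replicas the way SimpleStrategy does, by walking clockwise to the next distinct nodes. The names `token_for` and `TokenRing` are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    # Stand-in hash: MD5 truncated to a signed 64-bit value. Cassandra's default
    # partitioner uses Murmur3 instead; only the "key -> token" idea matters here.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, node_tokens: dict[str, list[int]]):
        # Flatten node -> tokens into a ring sorted by token
        # (several tokens per node would model virtual nodes).
        self.ring = sorted((t, node) for node, tokens in node_tokens.items() for t in tokens)
        self.tokens = [t for t, _ in self.ring]
        self.node_count = len({node for _, node in self.ring})

    def replicas_for_token(self, token: int, rf: int) -> list[str]:
        # The owner is the first node whose token is >= the key's token, wrapping
        # around the ring; further replicas are the next distinct nodes clockwise.
        i = bisect.bisect_left(self.tokens, token) % len(self.ring)
        replicas: list[str] = []
        while len(replicas) < min(rf, self.node_count):
            node = self.ring[i][1]
            if node not in replicas:
                replicas.append(node)
            i = (i + 1) % len(self.ring)
        return replicas

    def replicas(self, partition_key: str, rf: int) -> list[str]:
        return self.replicas_for_token(token_for(partition_key), rf)


ring = TokenRing({"A": [0], "B": [100], "C": [200]})
print(ring.replicas_for_token(50, rf=2))   # owner B, next replica C
print(ring.replicas_for_token(250, rf=2))  # past the last token, wraps to A, then B
```

Because only the ranges adjacent to a node's tokens change when that node joins or leaves, most keys keep their replicas, which is the main reason to prefer consistent hashing over `hash(key) % node_count`.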

Read and Write Path:

  • Write Path: A write is first appended to the commit log for durability and then applied to an in-memory structure called the Memtable; once both steps succeed, the replica acknowledges the write. When a Memtable reaches its size threshold, it is flushed to disk as an immutable SSTable (Sorted String Table), and the commit log segments covering the flushed data can then be recycled.
  • Read Path: Because a partition's data may be spread across the Memtable and several SSTables, Cassandra reads from the Memtable and from each relevant SSTable, then merges the results, keeping the value with the newest write timestamp. Bloom filters let it skip SSTables that definitely do not contain the requested partition, avoiding unnecessary disk reads.
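The write and read paths above can be modelled with a toy single-node storage engine. This sketch keeps everything in memory: the "commit log" is a Python list, an SSTable is a sorted dict with its own Bloom filter, and flushing happens after a fixed number of keys rather than a byte size. `BloomFilter`, `SSTable`, and `ToyNode` are hypothetical names, and the SHA-256-based hashing is an assumption chosen for simplicity.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows: dict):
        self.rows = dict(sorted(rows.items()))  # sorted by key, never modified after creation
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class ToyNode:
    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []   # append-only (key, timestamp, value) records
        self.memtable = {}     # key -> (timestamp, value)
        self.sstables = []
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str, ts: int) -> None:
        self.commit_log.append((key, ts, value))        # 1. commit log first, for durability
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:         # 2. newer timestamp replaces older
            self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.flush_threshold:  # 3. flush when the memtable is "full"
            self.flush()

    def flush(self) -> None:
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay after a crash

    def read(self, key: str):
        candidates = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for sstable in self.sstables:
            # The Bloom filter lets us skip SSTables that cannot hold the key.
            if sstable.bloom.might_contain(key) and key in sstable.rows:
                candidates.append(sstable.rows[key])
        # Merge: the (timestamp, value) pair with the newest timestamp wins.
        return max(candidates)[1] if candidates else None


node = ToyNode(flush_threshold=2)
node.write("user:1", "alice", ts=1)
node.write("user:2", "bob", ts=2)     # memtable holds 2 keys, so it is flushed
node.write("user:1", "alicia", ts=3)  # newer value lives only in the memtable
print(node.read("user:1"))            # merges memtable and SSTable, returns "alicia"
```

The read merges every candidate version instead of stopping at the first hit, which mirrors why real reads touch the Memtable and several SSTables.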

Gossip Protocol:

  • Node Discovery: Cassandra uses a peer-to-peer gossip protocol to discover and communicate with other nodes. About once per second, each node exchanges state with up to three other nodes, sharing what it knows about itself and about every node it has heard of, so knowledge of the cluster topology spreads quickly.
  • Failure Detection: Gossip heartbeats also drive failure detection. Each node runs a Phi Accrual failure detector that turns the arrival times of a peer's heartbeats into a suspicion level (phi) and independently marks that peer as down when phi exceeds a configurable threshold (phi_convict_threshold in cassandra.yaml).
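Both gossip ideas can be sketched in a few lines. The state-merge function assumes each endpoint's state is a `(generation, heartbeat_version)` tuple, where the larger tuple is newer. The detector uses the simplified exponential model, in which phi is the elapsed time since the last heartbeat divided by `mean_interval * ln 10`. `merge_gossip` and `PhiAccrualDetector` are hypothetical names, and the threshold of 8 is only an illustrative value.

```python
import math


def merge_gossip(local: dict, remote: dict) -> dict:
    # Each endpoint's state is (generation, heartbeat_version); comparing tuples
    # means a restart (higher generation) beats any version from the old process.
    merged = dict(local)
    for endpoint, state in remote.items():
        if endpoint not in merged or state > merged[endpoint]:
            merged[endpoint] = state
    return merged


class PhiAccrualDetector:
    def __init__(self, window: int = 1000):
        self.intervals = []          # recent heartbeat inter-arrival times (seconds)
        self.last_heartbeat = None   # arrival time of the latest heartbeat
        self.window = window

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
            del self.intervals[:-self.window]  # keep only the most recent samples
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Exponential model: P(heartbeat still pending) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_down(self, now: float, threshold: float = 8.0) -> bool:
        return self.phi(now) > threshold


detector = PhiAccrualDetector()
for t in [0.0, 1.0, 2.0, 3.0]:  # heartbeats arriving once per second
    detector.heartbeat(t)
print(detector.is_down(4.0))   # one second of silence, phi is about 0.43
print(detector.is_down(25.0))  # 22 seconds of silence, phi is about 9.6
```

Because phi grows continuously with silence and scales with the observed heartbeat rhythm, a single threshold works for both fast and slow networks, which is the appeal of an accrual detector over a fixed timeout.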

Compaction:

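  • Compaction Process: Because SSTables are immutable, updates and deletes are written as new data (a delete is written as a marker called a tombstone), so obsolete versions accumulate across SSTables over time. Compaction merges several SSTables into a new one, keeping only the newest version of each value and purging tombstones once they are older than gc_grace_seconds. This frees disk space and reduces the number of SSTables a read must consult.
  • Compaction Strategies: Cassandra supports several compaction strategies, including SizeTieredCompactionStrategy (suited to write-heavy workloads), LeveledCompactionStrategy (suited to read-heavy workloads), and TimeWindowCompactionStrategy (suited to time-series data). Each makes different trade-offs between read performance, write amplification, and disk space usage.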
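The two compaction bullets can be illustrated with a merge function and a size-tiered bucketing function. This is a simplified sketch under stated assumptions: an SSTable is a dict of `key -> (timestamp, value)`, a tombstone is modelled as the value `None`, and a tombstone is purged purely by age. Real Cassandra also checks that no SSTable outside the compaction still holds data the tombstone shadows. `compact` and `size_tiered_buckets` are hypothetical names; the bucketing parameters mirror SizeTieredCompactionStrategy's `bucket_low`, `bucket_high`, and `min_threshold` options, but the grouping logic is only an approximation of its behaviour.

```python
TOMBSTONE = None  # hypothetical marker: a deletion is stored as value None


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (dicts of key -> (timestamp, value)) into one sorted table."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Keep only the newest version of each key across all inputs.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Drop tombstones whose grace period has expired; younger ones are kept so
    # replicas that missed the delete can still learn about it.
    return {
        key: (ts, value)
        for key, (ts, value) in sorted(merged.items())
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }


def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size; return compactable buckets."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar-sized bucket yet, start a new one
    return [b for b in buckets if len(b) >= min_threshold]


old = {"a": (1, "x"), "b": (1, "y")}
new = {"a": (5, TOMBSTONE), "c": (2, "z")}
print(compact([old, new], now=100, gc_grace_seconds=10))  # "a" is gone entirely
print(size_tiered_buckets([10, 11, 12, 9, 100, 105, 1000]))
```

Size-tiered grouping only compacts SSTables of similar size together, which keeps write amplification low at the cost of a read possibly touching several tiers.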

In conclusion, Cassandra DB’s architecture is optimized for high availability, fault tolerance, and linear scalability. By distributing data across multiple nodes, replicating data for resilience, and using efficient read and write paths, Cassandra can handle massive amounts of data with low latency and high throughput.


Yan Hadzhyisky

fullstack PHP+JS+REACT developer