Cassandra is a decentralized No-Sql database. It works on multi node cluster where every node is identical to every other node (server symmetry – all node features same). There is no master node concept, as in Hadoop, hence there is no single point of failure.
A few features/terms
- Elastic scalability: able to scale, up or down, dynamically without restart or disruption of services.
- Consistency level: decide when to consider transaction successful.
- Tunable consistency (strict, casual, and weak), which is inversely proportional to availability.
- Stores data in multidimensional hash table.
- Schema free: model requires queries and then work on data.
- Designed to take advantage of multiprocessor/core machines.
- optimized for excellent throughput for write.
- Column, contains name value pair, is a most basic unit of data structure. Columns are stored in following way – name:value:timestamp. There is no need to define columns and they can be added anytime.
- Column Family is a container for rows with similar column sets (analogous to sql tables).
- Ordered collection of rows, each of which is ordered collection of column columns.
- Different from RDBMS table in a way that cassandra is schema free – where columns can be added anytime.
- Column families are each stored in separate files.
- Each row, which is a collection of values, has unique row key.
- Keyspace is a outer most container for data, and has a name and a set of attributes. Think of it like sql DB – container for one or more column families. Basic attributes that can be set per keyspace
- Replication factor: No of nodes that will act as copies of each data
- Replica placement strategy – how to place replicas
It is a internal keyspace that cannot be modified. It stores metadata for local node as well as for handoff information. There data is maintained in two DBs: Master and TempDB. Master contains information about diskspace, usage, sys settings, and internal nodes. While TempDB is used for intermediary results and to perform general tasks.
Peer to Peer (P2P)
P2P is optimized for write (note that master slave – one used in Hadoop – is optimized for read). The aim of Cassandra’s design is overall system availability and ease of scaling. As mentioned earlier each node is identical and new nodes can be added dynamically.
Gossip and Failure Detection
Gossip and failure detection is used to support decentralization and fault tolerance for intra-ring communication, so that each node can have information about other nodes. Each node, when started, registers itself with gossiper. In short cassandra gossip is used for failure detection.
Anti Entropy and Read Repair
Anti entropy is a replica synchronization mechanism in cassandra. It’s based on merkle tree (binary tree that summarizes short form of data in larger data set), where each column family has its own merkle tree. Anti entropy algo works after each update.
To read data cassandra connects to any node in cluster and based on the consistency level no of nodes are read. Read operation remains blocked until consistency level is met. During read if any/some node(s) return outdated data, cassandra returns the most recent one.
How write takes place – Memtable, SSTable, and Commit logs
On write operation data is immediately written to the commit log, which is crash recoverable. Any write is not counted as successful until it’s written to commit log. After written to commit log, data is written to memtable, a memory resident DS. After content in memtable reach threshold, the content are flushed to disk in a file called SSTable. Here multiple memtables can exist for single column family, one current and rest waiting to be flushed, but only one commit log is written across entire server. Once memtable is flushed to disk as an SSTable, it’s immutable and cannot be changed by application.