Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service for mission-critical applications. It supports turn-key global distribution, elastic scaling of throughput and storage worldwide, single-digit millisecond latencies at the 99th percentile, five well-defined consistency levels, and guaranteed high availability, all backed by industry-leading SLAs. Azure Cosmos DB automatically indexes data without requiring you to deal with schema and index management.

Cosmos DB supports five different data models, one of which is the Cassandra API.  Let’s take a quick look at this API and see what it is all about.

What is Cassandra

First of all: What is Cassandra?

Cassandra, or more specifically Apache Cassandra, is an open-source distributed NoSQL database.  Cassandra delivers outstanding performance due to its distributed node design.  This design lets read and write throughput scale roughly linearly as more nodes are added.  Another benefit of the design is that every node plays the same role: there is no master node, and each piece of data is replicated to several nodes, so no single node represents a point of failure.  In terms of the CAP theorem, Cassandra focuses on the A and P guarantees (A for Availability and P for Partition tolerance) while relaxing Consistency.

CAP theorem diagram
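
To make the "no single point of failure" idea concrete, here is a minimal sketch of a token ring with replication. It is a toy, not Cassandra's implementation: the `ToyRing` class, the node names, and the use of MD5 are assumptions made for illustration, and real clusters add virtual nodes, racks, and datacenters on top of this idea.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """A toy token ring: each node owns one token; no vnodes, racks, or datacenters."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked in order.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Return the nodes that hold a copy of the given partition key."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def readable(self, partition_key: str, down=frozenset()):
        """True if at least one replica of the key lives on a node that is still up."""
        return any(node not in down for node in self.replicas(partition_key))


ring = ToyRing(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)                                       # three distinct nodes
print(ring.readable("user:42", down={owners[0]}))   # True: two replicas remain
print(ring.readable("user:42", down=set(owners)))   # False: every replica is down
```

Losing one node leaves the key readable because two other nodes still hold a copy; only losing every replica at once makes it unavailable.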

Cassandra shares some design qualities with popular NoSQL databases such as Amazon’s DynamoDB and Google’s Bigtable.  Like Bigtable, Cassandra is a wide-column store.  However, Cassandra is also similar to DynamoDB in that it uses a key-value system in which the keys point to column families that represent the structure of the stored data.  Unlike Bigtable and DynamoDB, Cassandra takes a different approach to querying its data: it employs the Cassandra Query Language (CQL), which shares many similarities with SQL.  CQL eases onboarding for developers who are already familiar with SQL.  Let’s take a look at the CQL syntax.

Quick CRUD in CQL

One of the strengths of Cassandra is the CQL query language.  For developers with experience in SQL, the similarities found in CQL provide a gentler learning curve when querying a Cassandra database.  Here is how common CRUD operations are performed in CQL.

(C)reate

INSERT INTO KeyspaceName.TableName (ColumnName1, ColumnName2, ColumnName3 . . . )
VALUES (Column1Value, Column2Value, Column3Value . . . )

KeyspaceName is roughly equivalent to a database or schema name in SQL.  Keyspaces are the containers that house column families (tables).

Cassandra Keyspace diagram

(R)ead

SELECT ColumnName1, ColumnName2, ColumnName3 . . .
FROM KeyspaceName.TableName
WHERE ColumnName1 = Column1Value AND
    ColumnName2 = Column2Value AND
    ColumnName3 = Column3Value AND
  . . .

(U)pdate

UPDATE KeyspaceName.TableName
SET ColumnName1 = new Column1Value,
    ColumnName2 = new Column2Value,
    ColumnName3 = new Column3Value,
    . . .
WHERE ColumnName1 = ColumnValue

(D)elete

DELETE FROM KeyspaceName.TableName
WHERE ColumnName1 = ColumnValue
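
To see the templates above filled in, here is a small sketch that runs one create/read/update/delete cycle from Python. The `shop.users` table, its columns, and the `RecordingSession` stand-in are hypothetical; the `%s` placeholders and the `execute(query, parameters)` call shape follow what the DataStax Python driver accepts for simple statements, so a real session could be passed in place of the stand-in.

```python
# Hypothetical keyspace/table: shop.users, with user_id as the primary key.
INSERT_USER = "INSERT INTO shop.users (user_id, name, email) VALUES (%s, %s, %s)"
SELECT_USER = "SELECT user_id, name, email FROM shop.users WHERE user_id = %s"
UPDATE_EMAIL = "UPDATE shop.users SET email = %s WHERE user_id = %s"
DELETE_USER = "DELETE FROM shop.users WHERE user_id = %s"


def run_crud(session, user_id, name, email, new_email):
    """Run one create/read/update/delete cycle against any object with execute()."""
    session.execute(INSERT_USER, (user_id, name, email))
    session.execute(SELECT_USER, (user_id,))
    session.execute(UPDATE_EMAIL, (new_email, user_id))
    session.execute(DELETE_USER, (user_id,))


class RecordingSession:
    """Stand-in for a real driver session; it only records what it was asked to run."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))


session = RecordingSession()
run_crud(session, 42, "Ada", "ada@example.com", "ada@example.org")
for query, params in session.calls:
    print(query, params)
```

Note that every read, update, and delete here identifies the row by its primary key, which is the access pattern Cassandra is built around.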

As a reminder CQL != SQL

As demonstrated above, the CQL syntax is similar to standard SQL.  However, Cassandra is not a traditional RDBMS and CQL is not SQL, so CQL is not a 1-for-1 substitute for SQL.  For instance, Cassandra rejects certain SELECT queries, such as those that filter on a non-indexed column outside the primary key, unless ALLOW FILTERING is added, because the filtering can produce unpredictable performance when large amounts of data need to be scanned.  There are many other differences between CQL and SQL (CQL has no joins, for example), and they generally result from fundamental differences in the design of the databases or from a database not being used to its core strengths.
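
The cost difference behind that restriction can be sketched with a toy in-memory model. This is an illustration only: the sensor data and the function names are made up, and a plain dictionary stands in for partitions spread across many nodes.

```python
from collections import defaultdict

# Toy storage: rows grouped by partition key, as a dict of lists.
# Hypothetical data: sensor readings partitioned by sensor_id.
partitions = defaultdict(list)
for sensor_id, temperature in [("s1", 20), ("s1", 35), ("s2", 18), ("s3", 40), ("s3", 22)]:
    partitions[sensor_id].append({"sensor_id": sensor_id, "temperature": temperature})


def by_partition_key(sensor_id):
    """Key lookup: touches exactly one partition, whatever the table size."""
    return partitions.get(sensor_id, []), 1


def by_filter(min_temperature):
    """Filter on a non-key column: every row in every partition must be examined."""
    examined = 0
    matches = []
    for rows in partitions.values():
        for row in rows:
            examined += 1
            if row["temperature"] >= min_temperature:
                matches.append(row)
    return matches, examined


rows, partitions_touched = by_partition_key("s1")
print(len(rows), partitions_touched)   # 2 rows from 1 partition

hot, rows_examined = by_filter(30)
print(len(hot), rows_examined)         # 2 matches after examining all 5 rows
```

With five rows the difference is trivial, but the key lookup stays at one partition as the table grows, while the filter's work grows with the total number of rows.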

Cassandra is not limited to using CQL to extract value from its data.  Other tools such as Apache Spark can be used to process Cassandra data in a variety of ways.

Cassandra on Cosmos DB

Now that we have a better idea of what Cassandra is, let’s consider how Cosmos DB comes into play.  Cosmos DB offers Cassandra-as-a-service, and this service is compatible with existing Cassandra SDKs, drivers, and tools.  This means that, with only a few modifications, an application using Cassandra on-premises can start using Cassandra in Cosmos DB.  It’s great to know that it is easy to start using Cassandra on Azure, but that raises the question:  Why use Cassandra-as-a-Service?
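
As a rough idea of what those "few modifications" look like, the sketch below collects the connection values that typically change when an application points at Cosmos DB instead of a self-hosted cluster. The host suffix, port 10350, and the use of the account name and key as credentials are assumptions based on the Azure public cloud and should be checked against the connection details shown in the Azure portal; the `CosmosCassandraSettings` type and helper are hypothetical. The service also expects a TLS connection, which is configured on the driver itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CosmosCassandraSettings:
    host: str
    port: int
    username: str
    password: str
    use_tls: bool = True  # assumption: connections must be encrypted


def cosmos_cassandra_settings(account_name: str, account_key: str) -> CosmosCassandraSettings:
    """Build the connection values that differ from a self-hosted cluster."""
    return CosmosCassandraSettings(
        host=f"{account_name}.cassandra.cosmos.azure.com",  # assumed public-cloud suffix
        port=10350,             # assumed Cosmos DB port; self-hosted Cassandra defaults to 9042
        username=account_name,  # assumed: the account name doubles as the username
        password=account_key,   # assumed: an account key copied from the Azure portal
    )


# "my-account" and the key below are placeholders, not real credentials.
settings = cosmos_cassandra_settings("my-account", "<account-key>")
print(settings.host, settings.port)
```

The rest of the application, including its CQL statements, is intended to stay as it was.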

Why use it?

What is so compelling about using Cassandra on Azure?  The answer lies in the combined strengths of both Cosmos DB and Cassandra.  Cassandra can handle high read/write throughput; this makes it an excellent option for workloads such as messaging apps, event logging, IoT and metrics data collection, and the tracking of user activity, which can be used for recommendation/personalization functionality as well as fraud detection.  Cosmos DB complements Cassandra’s ability to handle high throughput by guaranteeing that the throughput is delivered consistently.

Another use case for Cassandra is when there is a need for high availability and tolerance of outages.  Once again, Cassandra’s distributed design ensures there is no single point of failure, and when it is backed by Cosmos DB’s guarantee of service availability, mission-critical workloads can be reliably hosted on Azure.

There are more reasons than availability that make Azure a good home for Cassandra databases.  Azure provides numerous services and technologies as well as migration strategies for moving on-premises resources to the cloud.  Moving to Azure also helps reduce one of the most challenging and resource-intensive parts of software development, which is managing IT resources.  The same can be said for moving a Cassandra DB to Cosmos DB.  By using Cassandra-as-a-Service on Cosmos DB, the need to manage Cassandra clusters is essentially removed from the development process.  Cosmos DB can automatically scale a Cassandra cluster to add more nodes as needed without requiring manual configuration of the nodes.  This removes tedious and time-consuming tasks from the development process and allows IT personnel to focus on delivering value instead of managing resources.

Cassandra in Cosmos DB is employed effectively by numerous Fortune 500 companies, including Symantec, which uses it as the data store for a sophisticated reputation management system. The ease of migrating to and integrating with Cassandra-as-a-Service, in conjunction with Azure’s rapid resource deployment, gave Symantec compelling reasons to choose Cosmos DB. A large worldwide retailer uses Cassandra and Cosmos DB to house an extensive catalog of products, and another Fortune 500 company uses them to collect and analyze vast amounts of data from IoT devices.

Outie

Cosmos DB, the multi-model, multi-API data store on Azure, is a Swiss army knife of versatility.  Since the Cassandra API became Generally Available at the end of September 2018, the doors have been open for existing and new Cassandra workloads to run on Azure.

Next Steps

Find below a few links to dive deeper into the details on Cassandra and Cosmos DB.  Happy Coding.