Running Apache Cassandra with Docker

by Gautham Pai

Apache Cassandra is a powerful, distributed NoSQL database that can handle large amounts of data across many commodity servers. This guide will walk you through setting up Cassandra using Docker, interacting with it, and performing basic operations.

1. Installing Cassandra via Docker

Docker provides a convenient way to run Cassandra without installing it directly on your machine. Make sure Docker is installed before continuing with the steps below.

1.1 Pulling the Cassandra Image

First, you need to download the latest Cassandra image from Docker Hub:

docker pull cassandra

Once downloaded, verify that the image is present on your system:

docker images

1.2 Running a Cassandra Container

With the image ready, you can now run Cassandra in a Docker container:

docker run --name cassandra -d cassandra

Verify that the container is running:

docker ps
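A container that appears in docker ps may still be starting up, and Cassandra only accepts queries once startup has finished. The following is a minimal sketch of a readiness check, assuming the container is named cassandra as above. The function name wait_for_cassandra and its retry and delay defaults are illustrative choices, not part of Docker or Cassandra:

```shell
# Poll until cqlsh inside the container can run a trivial statement.
# Arguments (all optional, with hypothetical defaults):
#   $1 container name, $2 maximum attempts, $3 seconds between attempts
wait_for_cassandra() (
  name=${1:-cassandra}
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # cqlsh -e runs one statement and exits; a non-zero status means "not ready yet".
    if docker exec "$name" cqlsh -e 'DESCRIBE KEYSPACES' >/dev/null 2>&1; then
      echo "Cassandra is ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Cassandra did not become ready in time" >&2
  return 1
)
```

Calling wait_for_cassandra cassandra before the first cqlsh session avoids connection errors during startup.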

2. Connecting to the Cassandra Container

To interact with the Cassandra instance running inside the container, you'll need to connect to it. This can be done by executing a shell inside the container:

docker exec -it cassandra /bin/bash

Once inside, you can see the running processes, including the Java process that represents the Cassandra server:

ps aux

3. Troubleshooting Cassandra

Here are some common troubleshooting steps for Cassandra:

3.1 Checking Logs

You can view the Cassandra logs to diagnose issues:

ls /var/log/cassandra/
tail -f /var/log/cassandra/*.log

To stop viewing the logs, use Ctrl + C.
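Because the container runs Cassandra in the foreground, its output is usually also available from the host through docker logs, without opening a shell in the container. The sketch below filters that output for warnings and errors. It assumes the container name cassandra and the default log layout in which each line starts with its level (INFO, WARN, ERROR); the function name cassandra_problems is hypothetical:

```shell
# Print only WARN and ERROR lines from the container's log output.
# Exits with a non-zero status when no such lines are found (grep's behaviour).
cassandra_problems() {
  # Merge the container's stderr into stdout so both streams are filtered.
  docker logs "${1:-cassandra}" 2>&1 | grep -E '^(WARN|ERROR)'
}
```

Run it on the host as cassandra_problems, or pass another container name as the first argument.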

3.2 Checking Listening Ports

You can check where Cassandra is listening using:

ss -nltp

Look for an entry like 0.0.0.0:9042. It indicates that Cassandra is listening for client connections on port 9042, the default port for CQL clients such as cqlsh.
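The same check can be scripted instead of read by eye. This is a small sketch meant to run inside the container; it assumes the column layout printed by ss -nlt, where the fourth column holds the local address and port. The function name is_listening is hypothetical:

```shell
# Succeed if some socket is listening on the given TCP port (default 9042).
is_listening() {
  ss -nlt | awk -v port="${1:-9042}" '
    # Column 4 is the local address, e.g. 0.0.0.0:9042 or [::]:9042.
    $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

For example, is_listening 9042 && echo "CQL port is open" prints the message only when the port is bound.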

4. Useful Cassandra Commands

Once inside the container, Cassandra provides several tools and commands that you can use:

4.1 Available Tools

To see the available tools, type a command prefix and press the Tab key (shown as <TAB> below) to list the matching commands, or list the tools directory:

nodetool<TAB>
sstable<TAB>
cqlsh<TAB>
ls /opt/cassandra/tools/bin

4.2 Describing the Cassandra Cluster

You can learn more about your Cassandra cluster with the following command:

nodetool describecluster

In a Docker setup, a "node" refers to one container running a Cassandra server. Use the following commands to get more details:

nodetool status
nodetool ring
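In the nodetool status output, each node row begins with a two-letter code: U or D for up or down, followed by N, L, J, or M for normal, leaving, joining, or moving. A healthy node shows UN. The sketch below condenses that into a single line and can be run from the host. The function name nodes_up and the default container name are assumptions for illustration:

```shell
# Print "<nodes that are Up and Normal>/<total nodes>" based on nodetool status.
nodes_up() {
  docker exec "${1:-cassandra}" nodetool status |
    awk '/^[UD][NLJM] / { total++; if ($1 == "UN") up++ }
         END { printf "%d/%d\n", up, total }'
}
```

With the single container from this guide, a result where both numbers are equal means the node is up and in the normal state.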

5. Overview of cqlsh

Cassandra Query Language Shell (cqlsh) is a command-line interface for interacting with your Cassandra database.

5.1 Connecting to Cassandra

By default, cqlsh connects to 127.0.0.1:9042:

cqlsh

You'll see information about the version of Cassandra you are connected to, like so:

[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]
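cqlsh can also run a single statement and exit, which is handy for scripts on the host. The following sketch wraps that in a helper. The function name cql and the CASSANDRA_CONTAINER variable are hypothetical conveniences; the -e option of cqlsh executes the statement passed to it:

```shell
# Run one CQL statement inside the container and print the result.
# CASSANDRA_CONTAINER (hypothetical variable) overrides the container name.
cql() {
  docker exec "${CASSANDRA_CONTAINER:-cassandra}" cqlsh -e "$1"
}

# Example: cql 'SELECT release_version FROM system.local;'
```

The example in the comment reads the server version from the system.local table, which is the same version shown in the cqlsh banner.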

5.2 Basic CQL Commands

CQL (Cassandra Query Language) is similar in syntax to SQL. Here are some basic commands to get you started.

5.2.1 Creating a Keyspace

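A keyspace in Cassandra is a namespace for tables that also defines how their data is replicated across nodes. SimpleStrategy with a replication factor of 1 is sufficient for a single-node test setup: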

CREATE KEYSPACE peoplekeyspace WITH REPLICATION = {
  'class' : 'SimpleStrategy', 'replication_factor' : 1
};

Verify that the keyspace has been created:

DESC KEYSPACES;
DESC KEYSPACE peoplekeyspace;

5.2.2 Creating a Table

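Tables in Cassandra must have a primary key. It uniquely identifies each row, and its partition key portion determines which node stores the row. A table belongs to a keyspace, so switch to the keyspace first; otherwise cqlsh rejects the statement because no keyspace has been specified:

USE peoplekeyspace;

CREATE TABLE people (
  id int PRIMARY KEY,
  name text,
  email text
);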

You can verify the creation of the table with:

DESC TABLES;
DESC TABLE people;
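For repeatable setups, the keyspace and table can be created from the host in one step. The sketch below feeds the statements to cqlsh through standard input and uses IF NOT EXISTS so that it can be run more than once without errors. The function name create_people_schema is hypothetical, and the table name is qualified with the keyspace so that no USE statement is required:

```shell
# Create the keyspace and table from this guide; safe to run repeatedly.
create_people_schema() {
  # -i keeps stdin open so cqlsh reads the statements from the here-document.
  docker exec -i "${1:-cassandra}" cqlsh <<'CQL'
CREATE KEYSPACE IF NOT EXISTS peoplekeyspace WITH REPLICATION = {
  'class' : 'SimpleStrategy', 'replication_factor' : 1
};
CREATE TABLE IF NOT EXISTS peoplekeyspace.people (
  id int PRIMARY KEY,
  name text,
  email text
);
CQL
}
```

Afterwards, DESC TABLE peoplekeyspace.people; in cqlsh should show the same definition as above.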

5.3 Basic CRUD Operations

Cassandra supports basic DML operations: Create, Read, Update, and Delete (CRUD).

5.3.1 Inserting Data

To insert data into the people table:

INSERT INTO people (id, name, email) VALUES (1, 'John', 'john@example.com');
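String values in CQL are written in single quotes, and a single quote inside a string is escaped by doubling it. When INSERT statements are assembled in a shell script, a small helper can apply that rule. This is a sketch only; cql_quote and people_insert are hypothetical helper names, and the sample values are made up:

```shell
# Wrap a value in single quotes for CQL, doubling any embedded single quote.
cql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Build an INSERT statement for the people table from an id, a name, and an email.
people_insert() {
  printf 'INSERT INTO people (id, name, email) VALUES (%d, %s, %s);\n' \
    "$1" "$(cql_quote "$2")" "$(cql_quote "$3")"
}

# Example: people_insert 2 "Ann O'Brien" 'ann@example.com'
```

The generated statement can be pasted into cqlsh or passed to it with the -e option.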

5.3.2 Reading Data

You can retrieve data using the following queries:

SELECT * FROM people;
SELECT * FROM people WHERE id = 1;
SELECT name FROM people WHERE id = 1;
SELECT name, email FROM people WHERE id = 1;

5.3.3 Updating Data

Updates must target specific rows identified by the primary key:

UPDATE people SET name = 'John Doe' WHERE id = 1;

An INSERT in Cassandra is an upsert: inserting a row whose primary key already exists overwrites the listed columns instead of failing, so it behaves like an update:

INSERT INTO people (id, name, email) VALUES (1, 'John Doe', 'john@example.com');

5.3.4 Deleting Data

To delete a row:

DELETE FROM people WHERE id = 1;

5.4 Advanced Query Concepts

Cassandra has certain constraints that can make querying different from SQL databases:

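  • Limited range queries: Rows are distributed by a hash of the partition key, so a range predicate on it (WHERE id > 1) is rejected unless you add ALLOW FILTERING. Range predicates are supported on clustering columns within a partition.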

  • ALLOW FILTERING: A query that filters on a column outside the primary key cannot be routed to a single partition. Cassandra rejects it unless you append ALLOW FILTERING, which may scan the whole table:

    SELECT email FROM people WHERE name = 'John' ALLOW FILTERING;
    
  • IN queries: You can match a list of specific key values:

    SELECT name, email FROM people WHERE id IN (1, 2);
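The restriction on range predicates applies to the partition key. As a hedged illustration, the sketch below creates a table whose primary key has a partition key (city) and a clustering column (id), then runs a range query on the clustering column. The table people_by_city, its sample rows, and the function name are hypothetical and not part of the guide's schema; the keyspace peoplekeyspace is assumed to exist already:

```shell
# Create a hypothetical table with a clustering column and run a range query on it.
range_query_demo() {
  # -k selects the keyspace, so table names need no keyspace prefix.
  docker exec -i "${1:-cassandra}" cqlsh -k peoplekeyspace <<'CQL'
-- city is the partition key; id is a clustering column sorted within each city.
CREATE TABLE IF NOT EXISTS people_by_city (
  city text,
  id int,
  name text,
  PRIMARY KEY (city, id)
);
INSERT INTO people_by_city (city, id, name) VALUES ('Pune', 1, 'John');
INSERT INTO people_by_city (city, id, name) VALUES ('Pune', 2, 'Jane');
-- Equality on the partition key plus a range on the clustering column.
SELECT id, name FROM people_by_city WHERE city = 'Pune' AND id > 1;
CQL
}
```

The query needs no ALLOW FILTERING because the equality on city pins it to a single partition, and rows inside a partition are stored in id order.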
    

5.5 Dropping Tables and Keyspaces

Finally, to remove a table or keyspace:

DROP TABLE people;
DROP KEYSPACE peoplekeyspace;
