Apache Cassandra is a powerful, distributed NoSQL database that can handle large amounts of data across many commodity servers. This guide will walk you through setting up Cassandra using Docker, interacting with it, and performing basic operations.
1. Installing Cassandra via Docker
Docker provides a convenient way to run Cassandra without installing it directly on your machine. Make sure Docker is installed before continuing with the steps below.
1.1 Pulling the Cassandra Image
First, you need to download the latest Cassandra image from Docker Hub:
docker pull cassandra
Once downloaded, verify that the image is present on your system:
docker images
1.2 Running a Cassandra Container
With the image ready, you can now run Cassandra in a Docker container:
docker run --name cassandra -d cassandra
Verify that the container is running:
docker ps
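A freshly started container usually needs some time before Cassandra accepts CQL connections, so a cqlsh attempt made right after docker run may be refused. The following is a minimal sketch of a wait loop to run from the host. It assumes the container name cassandra used above; the function name wait_for_cql and its default retry count and delay are illustrative choices, not part of Docker or Cassandra.

```shell
# Poll until cqlsh inside the container can run a trivial statement.
# Usage: wait_for_cql [container] [max_attempts] [delay_seconds]
# The body runs in a subshell, so its variables do not leak into the caller.
wait_for_cql() (
  container="${1:-cassandra}"
  max_attempts="${2:-30}"   # assumed default, adjust to taste
  delay="${3:-5}"           # assumed default, in seconds
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # cqlsh -e runs one statement and exits non-zero if it cannot connect
    if docker exec "$container" cqlsh -e "DESCRIBE KEYSPACES" >/dev/null 2>&1; then
      echo "Cassandra is accepting CQL connections (attempt $attempt)"
      exit 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Gave up after $max_attempts attempts" >&2
  exit 1
)

# Example: wait_for_cql cassandra 30 5
```

Polling with a real CQL statement, rather than only checking that the container is listed by docker ps, confirms that the server itself has finished starting.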
2. Connecting to the Cassandra Container
To interact with the Cassandra instance running inside the container, you'll need to connect to it. This can be done by executing a shell inside the container:
docker exec -it cassandra /bin/bash
Once inside, you can see the running processes, including the Java process that represents the Cassandra server:
ps aux
3. Troubleshooting Cassandra
Here are some common troubleshooting steps for Cassandra:
3.1 Checking Logs
You can view the Cassandra logs to diagnose issues:
ls /var/log/cassandra/
tail -f /var/log/cassandra/*.log
To stop viewing the logs, press Ctrl + C.
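When the logs are long, it can help to show only the problem entries. The sketch below assumes the main log file is /var/log/cassandra/system.log and that each entry starts with its level (INFO, WARN, ERROR), which is how the default configuration usually writes them; the function name show_log_problems is made up for this example.

```shell
# Print only WARN and ERROR entries from a Cassandra log file.
# Assumes each entry starts with its level; continuation lines such as
# stack traces do not start with a level and are therefore not shown.
# Usage: show_log_problems [logfile]
show_log_problems() {
  grep -E '^(WARN|ERROR)' "${1:-/var/log/cassandra/system.log}"
}

# Example (inside the container): show_log_problems
```

From the host, docker logs cassandra prints what the container wrote to its standard output, which is another place to look when the container exits early.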
3.2 Checking Listening Ports
You can check where Cassandra is listening using:
ss -nltp
Look for an entry like 0.0.0.0:9042, which indicates that Cassandra is listening for client connections on port 9042.
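For scripts, the same check can be turned into a yes/no answer. This is a small sketch built on ss; the function name port_is_listening is hypothetical, and the code assumes the usual ss column layout in which the fourth column is the local address and port.

```shell
# Return 0 if the given TCP port shows up as a listening socket, 1 otherwise.
# Usage: port_is_listening <port>
port_is_listening() {
  # -n numeric output, -l listening sockets only, -t TCP only.
  # Column 4 is "Local Address:Port"; match ":<port>" at its very end.
  ss -nlt | awk -v p=":$1" '$4 ~ (p "$") { found = 1 } END { exit !found }'
}

# Example (inside the container):
# port_is_listening 9042 && echo "CQL port is open"
```

Anchoring the match at the end of the column avoids false positives such as port 904 matching 9042.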
4. Useful Cassandra Commands
Inside the container, Cassandra provides several tools and commands that you can use:
4.1 Available Tools
To see the available tools, type the start of a command name and press the Tab key (shown as <TAB>) to list the matching executables, or list the tools directory:
nodetool<TAB>
sstable<TAB>
cqlsh<TAB>
ls /opt/cassandra/tools/bin
4.2 Describing the Cassandra Cluster
You can learn more about your Cassandra cluster with the following command:
nodetool describecluster
In a Docker setup, a "node" refers to one container running a Cassandra server. Use the following commands to get more details:
nodetool status
nodetool ring
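In the nodetool status output, each node line begins with a two-letter code: the first letter is the status (U for up, D for down) and the second is the state (N normal, L leaving, J joining, M moving). A healthy node is reported as UN. The sketch below counts such nodes; the function name count_up_normal is an illustrative choice.

```shell
# Count nodes reported as Up/Normal ("UN") in `nodetool status` output.
# Reads the output from standard input and prints the count.
count_up_normal() {
  # Node lines start with the two-letter code as their first field;
  # n + 0 prints 0 instead of an empty string when nothing matched.
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example (inside the container): nodetool status | count_up_normal
```

With the single container from this guide, the expected count is 1 once the node has finished starting.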
5. Overview of cqlsh
Cassandra Query Language Shell (cqlsh) is a command-line interface for interacting with your Cassandra database.
5.1 Connecting to Cassandra
By default, cqlsh connects to 127.0.0.1:9042:
cqlsh
You'll see information about the version of Cassandra you are connected to, like so:
[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]
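cqlsh can also run a single statement and exit, which is handy from the host or in scripts. A minimal sketch, assuming the container name cassandra from earlier; the wrapper name run_cql is hypothetical, while the -e option is cqlsh's own flag for executing one statement.

```shell
# Run one CQL statement in the container without opening an interactive shell.
# Usage: run_cql "<statement>" [container]
run_cql() {
  docker exec "${2:-cassandra}" cqlsh -e "$1"
}

# Example (from the host):
# run_cql "SELECT release_version FROM system.local;"
```

Quoting the statement as one argument keeps the shell from splitting it or expanding characters such as * before cqlsh sees it.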
5.2 Basic CQL Commands
CQL (Cassandra Query Language) is similar in syntax to SQL. Here are some basic commands to get you started.
5.2.1 Creating a Keyspace
A keyspace in Cassandra is a namespace that defines data replication on nodes:
CREATE KEYSPACE peoplekeyspace WITH REPLICATION = {
'class' : 'SimpleStrategy', 'replication_factor' : 1
};
Verify that the keyspace has been created:
DESC KEYSPACES;
DESC KEYSPACE peoplekeyspace;
5.2.2 Creating a Table
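Tables in Cassandra must have a primary key, which uniquely identifies each row; its partition key part also determines which node stores the row. Select the keyspace first so that the table is created inside it:
USE peoplekeyspace;
CREATE TABLE people (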
id int PRIMARY KEY,
name text,
email text
);
You can verify the creation of the table with:
DESC TABLES;
DESC TABLE people;
5.3 Basic CRUD Operations
Cassandra supports basic DML operations: Create, Read, Update, and Delete (CRUD).
5.3.1 Inserting Data
To insert data into the people table:
INSERT INTO people (id, name, email) VALUES (1, 'John', '[email protected]');
5.3.2 Reading Data
You can retrieve data using the following queries:
SELECT * FROM people;
SELECT * FROM people WHERE id = 1;
SELECT name FROM people WHERE id = 1;
SELECT name, email FROM people WHERE id = 1;
5.3.3 Updating Data
Updates must target specific rows identified by the primary key:
UPDATE people SET name = 'John Doe' WHERE id = 1;
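Unlike in most SQL databases, INSERT in Cassandra is an upsert: inserting a row whose primary key already exists overwrites the listed columns instead of raising an error: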
INSERT INTO people (id, name, email) VALUES (1, 'John Doe', '[email protected]');
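Because a plain INSERT silently overwrites an existing row, a script that must not clobber data can add IF NOT EXISTS, which applies the write only when no row with that primary key exists. The sketch below prints a short CQL script that contrasts the two forms. It assumes the peoplekeyspace keyspace and people table from this guide; the function name upsert_demo_cql and the row values (id 2, Jane, the example.com address) are placeholders made up for illustration.

```shell
# Emit CQL that contrasts a plain INSERT (upsert) with INSERT ... IF NOT EXISTS.
# The row values are placeholders for illustration only.
upsert_demo_cql() {
  cat <<'CQL'
USE peoplekeyspace;
-- Plain INSERT: creates the row, or overwrites name and email if id 2 already exists
INSERT INTO people (id, name, email) VALUES (2, 'Jane', 'jane@example.com');
-- Conditional INSERT: applied only when no row with id 2 exists yet
INSERT INTO people (id, name, email) VALUES (2, 'Jane Roe', 'jane@example.com') IF NOT EXISTS;
SELECT * FROM people WHERE id = 2;
CQL
}

# Example (from the host): upsert_demo_cql | docker exec -i cassandra cqlsh
```

For the conditional statement, cqlsh reports the outcome in an [applied] column. Conditional writes cost more than plain ones, so they are best kept for cases where overwriting would be a real problem.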
5.3.4 Deleting Data
To delete a row:
DELETE FROM people WHERE id = 1;
5.4 Advanced Query Concepts
Cassandra has certain constraints that can make querying different from SQL databases:
- No range queries on the partition key: Cassandra hashes the partition key to locate a row, so it needs exact key values. A range condition such as WHERE id > 1 is rejected unless you explicitly opt into a full scan with ALLOW FILTERING.
- ALLOW FILTERING: Inefficient queries, such as those that filter on a column outside the primary key, require ALLOW FILTERING:
SELECT email FROM people WHERE name = 'John' ALLOW FILTERING;
- IN queries: Supported for specific values:
SELECT name, email FROM people WHERE id IN (1, 2);
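Range conditions do work on clustering columns, which are the primary key columns that follow the partition key and keep rows sorted inside a partition. The sketch below prints a CQL script showing this. The table people_by_city, its columns, the sample rows, and the function name range_query_demo_cql are all hypothetical and exist only for this example.

```shell
# Emit CQL for a hypothetical table whose clustering column supports range queries.
range_query_demo_cql() {
  cat <<'CQL'
USE peoplekeyspace;
-- city is the partition key (hashed to pick a node) and id is a clustering column (sorted within the partition)
CREATE TABLE IF NOT EXISTS people_by_city (
  city text,
  id int,
  name text,
  PRIMARY KEY (city, id)
);
INSERT INTO people_by_city (city, id, name) VALUES ('Oslo', 1, 'John');
INSERT INTO people_by_city (city, id, name) VALUES ('Oslo', 2, 'Jane');
-- Allowed without ALLOW FILTERING: exact partition key plus a range on the clustering column
SELECT id, name FROM people_by_city WHERE city = 'Oslo' AND id > 1;
CQL
}

# Example (from the host): range_query_demo_cql | docker exec -i cassandra cqlsh
```

The general design approach is to model each table around the queries it must serve, so that every query names a partition exactly and ranges only over the sorted columns within it.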
5.5 Dropping Tables and Keyspaces
Finally, to remove a table or keyspace:
DROP TABLE people;
DROP KEYSPACE peoplekeyspace;