PolarSPARC

Cassandra Quick Notes :: Part - 2


Bhaskar S *UPDATED*11/27/2023


Overview


In Part-1 of this article series, we performed the necessary setup, started a single-node Apache Cassandra cluster and got our hands dirty with CQL operations.

In this part, we will perform additional setup to activate a multi-node (3 nodes) Apache Cassandra cluster. In addition, we will elaborate more on the concepts from Part-1 and introduce additional concepts on Apache Cassandra.


Additional Setup


To run a multi node Apache Cassandra cluster, we need to download additional configuration files to the directory $CASSANDRA_HOME/etc/cassandra by executing the following commands:


$ cd $CASSANDRA_HOME/etc/cassandra

$ wget https://github.com/apache/cassandra/raw/trunk/conf/jvm-clients.options

$ wget https://github.com/apache/cassandra/raw/trunk/conf/jvm17-clients.options

$ wget https://github.com/apache/cassandra/raw/trunk/conf/logback-tools.xml

$ cd $CASSANDRA_HOME


We will setup additional directory structures by executing the following commands from the users home directory:


$ mkdir -p cassandra/data2

$ mkdir -p cassandra/data3

$ mkdir -p cassandra/logs2

$ mkdir -p cassandra/logs3


To start the second node in the Apache Cassandra cluster, execute the following command:


$ docker run --rm --name cas-node-2 --hostname cas-node-2 --network cassandra-db-net -e CASSANDRA_SEEDS=cas-node-1 -u $(id -u $USER):$(id -g $USER) -v $CASSANDRA_HOME/data2:/var/lib/cassandra/data -v $CASSANDRA_HOME/etc/cassandra:/etc/cassandra -v $CASSANDRA_HOME/logs2:/opt/cassandra/logs cassandra:5.0


The following would be the typical trimmed output that appears in the first node:


Output.1

... [ SNIP ] ...
INFO  [GossipStage:1] 2023-11-27 14:26:30,274 Gossiper.java:1460 - Node /172.18.0.3:7000 is now part of the cluster
INFO  [GossipStage:1] 2023-11-27 14:26:30,277 Gossiper.java:1405 - InetAddress /172.18.0.3:7000 is now UP
... [ SNIP ] ...

Finally, to start the third node in the Apache Cassandra cluster, execute the following command:


$ docker run --rm --name cas-node-3 --hostname cas-node-3 --network cassandra-db-net -e CASSANDRA_SEEDS=cas-node-1 -u $(id -u $USER):$(id -g $USER) -v $CASSANDRA_HOME/data3:/var/lib/cassandra/data -v $CASSANDRA_HOME/etc/cassandra:/etc/cassandra -v $CASSANDRA_HOME/logs3:/opt/cassandra/logs cassandra:5.0


The following would be the typical trimmed output that appears in the first node:


Output.2

... [ SNIP ] ...
INFO  [GossipStage:1] 2023-11-27 14:27:39,614 Gossiper.java:1460 - Node /172.18.0.4:7000 is now part of the cluster
INFO  [GossipStage:1] 2023-11-27 14:27:39,615 Gossiper.java:1405 - InetAddress /172.18.0.4:7000 is now UP
... [ SNIP ] ...

To check the status of the Apache Cassandra cluster, execute the following command:


$ docker exec -it cas-node-1 nodetool status


The following would be the typical output:


Output.3

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.3  86.52 KiB   16      59.3%             992f1632-c660-498a-8a02-3a0a1646d28a  rack1
UN  172.18.0.2  121.88 KiB  16      64.7%             0073cae6-b42b-4133-b0a5-0a5f7ddb18c1  rack1
UN  172.18.0.4  81.45 KiB   16      76.0%             b42f26eb-ec36-41dc-a8e2-504190f59c4e  rack1

BINGO - At this point the 3-node Apache Cassandra cluster is ready !!!


Concepts - Level II


The following section elaborates and expands on the concepts on Apache Cassandra:


Hands-on with Cassandra - II


To launch the CQL command-line interface, execute the following command:


$ docker run -it --rm --name cas-client --network cassandra-db-net cassandra:5.0 cqlsh cas-node-1


The following will be the output:


Output.4

WARNING: cqlsh was built against 5.0-alpha2, but this server is 5.0.  All features may not work!
Connected to Cassandra Cluster at cas-node-1:9042
[cqlsh 6.2.0 | Cassandra 5.0-alpha2 | CQL spec 3.4.7 | Native protocol v5]
Use HELP for help.
cqlsh>

On success, CQL will change the command prompt to "cqlsh>".

To create a Keyspace called mytestks2, input the following command at the "cqlsh>" prompt:


cqlsh> CREATE KEYSPACE IF NOT EXISTS mytestks2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};


There will be no output.

Notice that we have now specified a Replication Factor of 2 given we have a 3-node cluster.

To use the Keyspace called mytestks2, input the following command at the "cqlsh>" prompt:


cqlsh> USE mytestks2;


There will be no output and the input prompt would change to "cqlsh:mytestks2>".

To display information about any node in the cluster (for example cas-node-2), execute the following command:


$ docker exec -it cas-node-2 nodetool info -T


The following would be the typical output:


Output.5

ID                     : cf8c8360-eeea-49ea-9fa0-6b417d30d98f
Gossip active          : true
Native Transport active: true
Load                   : 169.54 KiB
Uncompressed load      : 315.92 KiB
Generation No          : 1701112276
Uptime (seconds)       : 8947
Heap Memory (MB)       : 504.97 / 8192.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 11, size 984 bytes, capacity 100 MiB, 101 hits, 119 requests, 0.849 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache          : size 8 MiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired       : 100.0%
Token                  : -8225213037676917188
Token                  : -7289299864885401768
Token                  : -6873662205313761888
Token                  : -5661981858641780742
Token                  : -5264659006293067951
Token                  : -4006238290347624863
Token                  : -2971937643672220497
Token                  : -2545651142391091107
Token                  : -1195449315816413223
Token                  : -138072798887455006
Token                  : 1983622101114780101
Token                  : 3242042692258433134
Token                  : 4306383682031496531
Token                  : 6126387983011442930
Token                  : 8003452312524428959
Token                  : 9172407235703443338
Bootstrap state        : COMPLETED
Bootstrap failed       : false
Decommissioning        : false
Decommission failed    : false

To create a Table called address_book that consists of composite Primary Key (with a Partiton Key and Clustering Keys ), input the following command at the "cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> CREATE TABLE IF NOT EXISTS address_book (email_id text, first_name text, last_name text, state text, zip text, PRIMARY KEY ((email_id), state, zip));


There will be no output.

To check the various settings on the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> DESCRIBE TABLE address_book;


The following will be the output:


Output.6

CREATE TABLE mytestks2.address_book (
    email_id text,
    state text,
    zip text,
    first_name text,
    last_name text,
    PRIMARY KEY (email_id, state, zip)
) WITH CLUSTERING ORDER BY (state ASC, zip ASC)
    AND additional_write_policy = '99p'
    AND allow_auto_snapshot = true
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND incremental_backups = true
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

For the table address_book, the column email_id will be the Partiton Key, while the columns state and zip will be the Clustering Keys.

To display the Gossip information for the cluster, execute the following command:


$ docker exec -it cas-node-1 nodetool gossipinfo


The following would be the typical output:


Output.7

/172.18.0.2
  generation:1701112194
  heartbeat:19309
  STATUS:72:NORMAL,-1847922227964662486
  LOAD:19252:231902.0
  SCHEMA:10968:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  RPC_ADDRESS:4:172.18.0.2
  NET_VERSION:1:12
  HOST_ID:2:f142d327-4c51-43ea-8fa1-a5887c5359f4
  RPC_READY:82:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.2:9042
  STATUS_WITH_PORT:71:NORMAL,-1847922227964662486
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:70:<hidden>
/172.18.0.3
  generation:1701112276
  heartbeat:19226
  LOAD:19192:230385.0
  SCHEMA:10886:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  NET_VERSION:1:12
  HOST_ID:2:cf8c8360-eeea-49ea-9fa0-6b417d30d98f
  RPC_READY:97:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.3:9042
  STATUS_WITH_PORT:84:NORMAL,-1195449315816413223
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:83:<hidden>
/172.18.0.4
  generation:1701112367
  heartbeat:19134
  LOAD:19129:236722.0
  SCHEMA:10791:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  NET_VERSION:1:12
  HOST_ID:2:1072dc0b-9089-421c-8bf6-dc35f4a1a495
  RPC_READY:96:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.4:9042
  STATUS_WITH_PORT:83:NORMAL,-1447708912539482187
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:82:<hidden>

To insert three rows into the table address_book, input the following commands at the "cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, last_name, state, zip) VALUES ('alice@builder.com', 'alice', 'builder', 'NY', '10001');

cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, state, zip) VALUES ('bob@coder.com', 'bob', 'NJ', '08550');

cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, last_name, state, zip) VALUES ('charlie@dancer.com', 'charlie', 'dancer', 'NY', '10012');


There will be no output.

To select the values of the email_id, the first_name, and the latest update timestamp (column metadata) of first_name for all the rows from the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> SELECT email_id, first_name, writetime(first_name) FROM address_book;


The following will be the output:


Output.8

 email_id           | first_name | writetime(first_name)
--------------------+------------+-----------------------
  alice@builder.com |      alice |      1701131587563197
      bob@coder.com |        bob |      1701131597661015
 charlie@dancer.com |    charlie |      1701131606944917

(3 rows)

To select the values of the email_id and its corresponding token generated by the default Murmur3Partitioner for all the rows from the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> SELECT email_id, token(email_id) FROM address_book;


The following will be the output:


Output.9

email_id           | system.token(email_id)
--------------------+------------------------
  alice@builder.com |   -2094522160347831912
      bob@coder.com |   -1716231266002284945
 charlie@dancer.com |    1913932259381705508

(3 rows)

To check which nodes in the cluster stored the row for alice@builder.com, execute the following command:


$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "alice@builder.com"


The following will be the output:


Output.10

172.18.0.3
172.18.0.4

Next, to check which nodes in the cluster stored the row for bob@coder.com, execute the following command:


$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "bob@coder.com"


The following will be the output:


Output.11

172.18.0.4
172.18.0.3

Finally, to check which nodes in the cluster stored the row for charlie@dancer.com, execute the following command:


$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "charlie@dancer.com"


The following will be the output:


Output.12

172.18.0.3
172.18.0.4

To drop the entire table address_book, input the following command at the " cqlsh:mytestks2>" prompt:

cqlsh:mytestks2> DROP TABLE address_book;


There will be no output.

To drop the entire keyspace mytestks2, input the following command at the " cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> DROP KEYSPACE mytestks2;


There will be no output.

To exit the CQL command-line interface, input the following command at the " cqlsh:mytestks2>" prompt:


cqlsh:mytestks2> exit;


There will be no output.

This concludes the hands-on demonstration for this part in setting and using a 3-node Apache Cassandra cluster !!!


References

Cassandra Quick Notes :: Part - 1


© PolarSPARC