Cassandra 1.2.x – Wide rows are freaky on AWS m1.xlarge

CQL3 is a very nice abstraction to Cassandra but its important to pay attention to what it is doing.

In SQL land, 1 record == 1 row. In Cassandra 1 record == 1 row, but 2+ records can ALSO be on the same row. This has to do with CQL’s partition and primary keys. Your partition key is what decides which row a record belongs to while the primary key is where the record is in a row. If you only have a primary key and no partition key, 1 record == 1 row, but if you have a composite ( partition key, primary key) every record where partition key is the same is going on the same row.

I had a few rows that were ~30GB in size which put stress on nodes using m1.xlarge ( 8GB heap, 300MB new heap size ) with epic Compaction cycles of doom.