Cassandra + Eucalyptus

We all should know that Cassandra really likes dedicated hardware. Take a bunch of RAM and fast SSDs, divide them up among your cluster, and SMILE! However, I’ve been very interested in the on-demand scalability that AWS/EC2 offers. It seems like the answer to the wide swings in demand that popular web services can see.

I’ve installed and configured Eucalyptus, and I’m going to document building an image for Cassandra and configuring Cassandra for use in Eucalyptus. I will try to scale Cassandra instances into the Amazon cloud with a scaling group, if possible. Eucalyptus is capable of “hybrid cloud” operation with AWS/EC2, but I’m not sure how broad the feature set is. This experiment is meant to fill that knowledge gap!

EDIT: One question answered! Cloudbursting is the term for running an application on your private cloud and expanding into the public cloud (like AWS) to handle load spikes! 🙂

Eucalyptus supports just this! I’m so very excited!

Preface: install CentOS 6.4 on your machine(s), install Eucalyptus, and install the CentOS 6.4 image from eustore.

Creating custom images can be a little tricky, especially when you do a yum update. I have found that the udev rules keep reinstalling themselves against my will. Here is a description of everything you should check before running euca-bundle-instance:

Then you install all the software you want. I installed Oracle JRE 1.7, JNA, and the DataStax Community RPM repo. I am using a 3GB root volume, which is currently 50% full. Once you have confirmed that the image is ready, you can use euca-bundle-instance as described here:

I have found that when loading an image (with the initial root image resized to 3-5GB as you see fit), it is best to specify the kernel and ramdisk explicitly, even though it isn’t required. It might be in my head, but leaving them out seems to cause problems for me.

I’ve run into hiccups with euca (no metadata) so I’ve got some work yet to do! 🙂

EDIT: Fixed the networking issues… stray iptables rules had somehow made it onto my NCs (wtf?). PROBLEM SOLVED!

The easiest way to make your instance, if you are good with CentOS 6.4, is to use eustore to download a 6.4 image. Then you can launch it, update it, and modify the running instance as desired! Once you have it set up exactly how you want, you can prepare the instance:

Pay close attention to the udev rules which are changed on update!!!
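As a rough pre-bundle check, here is a minimal Python sketch that looks for leftover udev network rules. It assumes the rules in question live in the usual CentOS 6 file, /etc/udev/rules.d/70-persistent-net.rules (my assumption; this post does not name the file), and the function name and `root` parameter are made up for illustration.

```python
from pathlib import Path

# Assumed location of the persistent network rules on CentOS 6.
RULES_FILE = "etc/udev/rules.d/70-persistent-net.rules"


def stale_udev_net_rules(root="/"):
    """Return the active (non-comment) rule lines found under `root`.

    An empty list means there is nothing left to clean up before bundling.
    """
    rules_path = Path(root) / RULES_FILE
    if not rules_path.exists():
        return []
    active = []
    for line in rules_path.read_text().splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; anything else is a live rule.
        if stripped and not stripped.startswith("#"):
            active.append(stripped)
    return active


if __name__ == "__main__":
    leftovers = stale_udev_net_rules()
    if leftovers:
        print("udev net rules still present:")
        for rule in leftovers:
            print("  " + rule)
    else:
        print("no persistent net rules found")
```

Running it right before bundling (and again after any yum update) should catch the rules sneaking back in.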

Then you can bundle the running instance as a new image bundle:

Then register the image and make sure the user has permission to launch it (if you registered it as admin, modify the image attributes!)

I used a simple bash script that detects the instance’s Name tag and mounts (or doesn’t mount) volumes depending on that. Make sure permissions are set correctly after mounting. Then it uses a regex replace to fill in the seeds and the node’s IP address in cassandra.yaml. It was a pretty simple process, and it is really easy to migrate your seeds to larger stores when needed. I can start up all my seeds with a single command line, and start an auto-scaling group of nodes as needed as well!! I’ve not load tested my SSD-powered 4 “cpu” 4GB RAM instances yet, but I had nice performance from the dual core version!
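My actual script is bash, but the regex replace step looks roughly like this Python sketch. It assumes the stock cassandra.yaml layout, with a `- seeds: "..."` line under the seed provider and a top-level `listen_address:` line; the function name and the sample addresses are made up for illustration.

```python
import re
from typing import Sequence


def render_cassandra_yaml(text: str, seeds: Sequence[str], node_ip: str) -> str:
    """Return cassandra.yaml text with the seed list and node IP filled in."""
    seed_list = ",".join(seeds)

    # Rewrite the value on the '- seeds: "..."' line, keeping its indentation.
    # Commented-out lines start with '#', so they are left alone.
    text = re.sub(
        r'^([ \t]*-[ \t]*seeds:[ \t]*).*$',
        lambda m: '{}"{}"'.format(m.group(1), seed_list),
        text,
        flags=re.MULTILINE,
    )

    # Point listen_address at this node's own IP.
    text = re.sub(
        r'^listen_address:.*$',
        lambda m: "listen_address: " + node_ip,
        text,
        flags=re.MULTILINE,
    )
    return text


if __name__ == "__main__":
    sample = (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        '          - seeds: "127.0.0.1"\n'
        "listen_address: localhost\n"
    )
    print(render_cassandra_yaml(sample, ["10.0.0.1", "10.0.0.2"], "10.0.0.5"))
```

Working on the text rather than parsing the YAML keeps the comments and layout of the file untouched, which is why a plain regex replace was enough for me.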

The data is completely persistent, although I still need to configure an instance for DataStax’s fantastic OpsCenter (which now supports 2.0!!) so I can do easy backups. Hopefully saving backups to separate volumes won’t be hard!