Netflix Re-Invent recordings!


There is some real gold in here. I’m only two recordings in at the moment, but there is some serious potential.

I think I’m going to try to run Netflix Priam on my C* cluster. I will still use OpsCenter, but Priam’s backup solution looks great.


Cassandra + Eucalyptus

We all should know that Cassandra really likes dedicated hardware. Take a bunch of RAM and fast SSDs… divide them up among your cluster and SMILE! However, I’ve been very interested in the on-demand scalability that AWS/EC2 offers. It seems like the answer to the wide demand variance that popular web services could see.

I’ve installed and configured Eucalyptus, and I’m going to document building an image for Cassandra and configuring Cassandra for use in Eucalyptus. I will be trying to scale Cassandra instances into the Amazon cloud with a scaling group, if possible. Eucalyptus is capable of “hybrid cloud” with AWS/EC2, but I’m not sure just how diverse the feature set is. This experiment is meant to fill that knowledge gap!

EDIT: One question answered! Cloudbursting is the term for running an application on your private cloud and expanding into the public cloud (like AWS) to handle load spikes! 🙂

Eucalyptus supports just this! I’m so very excited!

Preface: install CentOS 6.4 on your machine(s), install Eucalyptus, and install the CentOS 6.4 image from eustore.

Creating custom images can be a little tricky, especially when you do a yum update. I have found udev rules keep reinstalling themselves against my will. Here is a description of everything you should check before doing a euca-bundle-instance command: http://www.eucalyptus.com/docs/eucalyptus/3.4/image-guide/ig_task_prepare_image.html

Then you install all the software you want. I installed Oracle JRE 1.7, JNA, and the DataStax community rpm repo. I am using a 3GB root, which is 50% full currently. Once you have confirmed that the image is ready, you can use euca-bundle-instance as described here: http://www.eucalyptus.com/docs/eucalyptus/3.4/index.html#image-guide/img_task_modify_existing_instance_store_image.html

I have found that when loading an image (initial root image remapped to 3-5GB as you see fit), it is best to specify the kernel and ramdisk even though it isn’t strictly necessary. It might be in my head, but leaving them out seems to cause problems for me.

I’ve run into hiccups with euca (no metadata) so I’ve got some work yet to do! 🙂

EDIT: Fixed networking issues… iptables rules for 169.254.169.254 made it into my NCs (wtf?) PROBLEM SOLVED!

The easiest way to make your instance, if you are good with CentOS 6.4, is to use eustore to download a 6.4 image. Then you can update that and modify a running instance as desired! Once you have it set up exactly how you want, you can prepare the instance: http://www.eucalyptus.com/docs/eucalyptus/3.4/image-guide/ig_task_prepare_image.html

Pay close attention to the udev rules which are changed on update!!!

Then you can bundle the running instance as a new image bundle: http://www.eucalyptus.com/docs/eucalyptus/3.4/index.html#image-guide/img_task_modify_existing_instance_store_image.html

Then register the image and make sure the user has permissions (if you use admin to add the instance, modify the image attributes!)

I used a simple bash script that detects the instance Name tag, and mounts (or doesn’t mount) volumes depending on that. Make sure permissions are set correctly after mounting. Then it uses a regex replace to fill in the seeds and node IP address in cassandra.yaml. It was a pretty simple process, and it is really easy to migrate your seeds to larger stores when needed. I can start up all my seeds with a single command line, and start an auto-scaling group of nodes as needed as well!! I’ve not load tested my SSD-powered 4 “cpu” 4GB RAM instances yet, but I had nice performance from the dual core version!
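As a rough illustration of that replace step, here is a minimal Python sketch (not the actual bash script). The function name fill_cassandra_yaml and the sample addresses are made up for the example, and it assumes the file has a single “- seeds:” line and an uncommented top-level “listen_address:” line, as I believe the stock cassandra.yaml does.

```python
import re


def fill_cassandra_yaml(yaml_text, seeds, node_ip):
    """Return yaml_text with the seed list and listen_address replaced.

    Hypothetical helper: assumes one "- seeds:" line and one uncommented
    top-level "listen_address:" line in the file.
    """
    seed_value = '"' + ",".join(seeds) + '"'
    # Replace the value on the "- seeds:" line, keeping its indentation.
    yaml_text = re.sub(
        r"(?m)^([ \t]*-[ \t]*seeds:[ \t]*).*$",
        lambda m: m.group(1) + seed_value,
        yaml_text,
    )
    # Replace the value on the top-level "listen_address:" line.
    yaml_text = re.sub(
        r"(?m)^(listen_address:[ \t]*).*$",
        lambda m: m.group(1) + node_ip,
        yaml_text,
    )
    return yaml_text


if __name__ == "__main__":
    # Made-up fragment in the shape of a cassandra.yaml file.
    sample = (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        '          - seeds: "127.0.0.1"\n'
        "listen_address: localhost\n"
    )
    print(fill_cassandra_yaml(sample, ["10.0.0.1", "10.0.0.2"], "10.0.0.5"))
```

The replacements are passed as functions rather than strings so that an IP address following the captured prefix can’t be misread as part of a group reference.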

The data is completely persistent, although I still need to configure an instance for DataStax’s fantastic OpsCenter (which now supports 2.0!!) so I can do easy backups. Hopefully saving backups to separate volumes won’t be hard!

Cassandra requests explained (partition, clustering, requirements) + FREE TRAINING!

One thing I’ve struggled with is request requirements. I never fully understood them until taking the free course at https://datastaxacademy.elogiclearning.com/

 

First, to explain the partitioning key vs. clustering keys (together they make up the primary key):

(partitioningkey, optional_clusteringkey1, optional_clusteringkey2)

The partitioning key can be complex (i.e., a composite):

( ( partitioning_key1, partitioning_key2), optional_clusteringkey1, etc)

 

So you probably understand that if you wish to request a certain row, you must specify its partitioning key.

 

So for a table with this key definition (colA, colB, colC), you must request something like this:

select * from table where colA = 'something';

 

This applies to composite partitioning keys as well! Consider this key definition: ( (colA, colX), colB, colC )

select * from table where colA = 'something' and colX = 'something';

You have to specify both if you want to specify one!!

 

However, I never understood clustering columns correctly. Again using key definition: (colA, colB, colC)

If you specify all keys, of course it works fine:

select * from table where colA = 'something' and colB = 'something' and colC = 'something';

If you specify them in order, it also works fine, because rows within a partition are grouped on disk by colB and THEN colC, which allows the following to work:

select * from table where colA = 'something' and colB = 'something';

If you try to skip one of the clustering columns, it will not work, because Cassandra would have to dive into every value of the skipped column and search for the later one. This would be very expensive. I’m unsure whether enabling filtering (ALLOW FILTERING) would make it work anyway, but it shouldn’t be done even if it does. If you have to enable filtering, you are doing it wrong! Instead, create a new table with the data grouped the way you want to pull it. For instance, the following will not work!

select * from table where colA = 'something' and colC = 'something';
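To sum up the rules above, here is a small Python sketch that checks whether a set of restricted columns would be accepted. It is only a toy model of the rules as I understand them: the function name can_query is made up, it only knows about equality restrictions on key columns, and it ignores filtering and secondary indexes entirely.

```python
def can_query(partition_keys, clustering_keys, restricted):
    """Toy check of the WHERE-clause rules for equality restrictions.

    partition_keys / clustering_keys: column names in key-definition order.
    restricted: the key columns that appear in the WHERE clause.
    Assumption: no filtering and no secondary indexes are involved.
    """
    restricted = set(restricted)
    # Every column of the partitioning key must be specified.
    if not all(col in restricted for col in partition_keys):
        return False
    # Clustering columns must be specified in order, without skipping any.
    skipped = False
    for col in clustering_keys:
        if col in restricted:
            if skipped:
                return False
        else:
            skipped = True
    return True


if __name__ == "__main__":
    # Key definition (colA, colB, colC)
    print(can_query(["colA"], ["colB", "colC"], ["colA", "colB"]))
    print(can_query(["colA"], ["colB", "colC"], ["colA", "colC"]))
    # Key definition ((colA, colX), colB, colC)
    print(can_query(["colA", "colX"], ["colB", "colC"], ["colA"]))
```

The first call corresponds to the colA + colB query that works, the second to the colA + colC query that skips colB, and the third to giving only half of a composite partitioning key.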

I’m really glad I got that behind me. That was some voodoo when I was playing around with cqlsh trying to learn Cassandra.

Delays and Issues


I’m sorry for the lack of updates. I’ve had many delays and issues with school. Primarily, I have spent ~20 hours trying to get a Lenovo Yoga 2 Pro to work for me, to no avail. Linux is slow going on the machine (some buttons won’t work yet if you want the wifi to work), and reinstalling Windows did no good at all! Also, the screen would go into fits of flashing on and off. Unacceptable. Even in Windows, if you allowed the screen to turn off via the power settings, it wouldn’t come back on. Sleep worked fine, but not the screen turning off. I believe it was a hardware issue with my machine, as I’ve not found any other reports. Otherwise I found the machine beautiful, although Windows 8.1 is constricting for me.

In that regard, I’m moving to Mac OS X as my primary mobile platform. Hopefully a 13″ Haswell Air will be a functional mobile desktop so I can stop focusing on getting things to work. At least there is Unix underneath… cmd.exe became increasingly upset at me “ls”ing the shit out of it.

That said, I promise to get back to compiling Cassandra information. I’ve been reading and watching, but I have a backlog of typing to do. G’day!

C* Summit 2013: The World’s Next Top Data Model

Fantastic video on data modeling in Cassandra! I think I like this even more than the last!! It is nice to see such a concentration of skill in a community!

 

This is part 3 of a series! Although they are independent, if you find it seems to go too fast, or you have to stop and look up things, go back and watch #2, or even all of them in order!

1: The Data Model is Dead! Long Live the Data Model http://www.youtube.com/watch?v=px6U2n74q3g

2: Become a Supermodeler https://www.youtube.com/watch?v=qphhxujn5Es