Delays and Issues
I’m sorry for the lack of updates. I’ve had many delays and issues with school. Primarily, I have spent ~20 hours trying to get a Lenovo Yoga 2 Pro to work for me, to no avail. Linux is slow going on the machine (some buttons won’t work yet if you want the Wi-Fi to work), and reinstalling Windows did no good at all! Also, the screen would go into fits of flashing on and off. Unacceptable. Even in Windows, if you allowed the screen to turn off with power settings, it wouldn’t come back on. Sleep worked fine, but not the screen turning off. I believe it was a hardware issue with my machine, since I’ve not found any other reports. Otherwise I found the machine beautiful, although Windows 8.1 is constricting for me.
In that regard, I’m moving to Mac OS X as my primary mobile platform. Hopefully a 13″ Haswell Air will be a functional mobile desktop so I can stop focusing on getting things to work. At least there is Unix underneath… cmd.exe became increasingly upset at me “ls”ing the shit out of it.
That said, I promise to get back to compiling Cassandra information. I’ve been reading and watching, but I have a backlog of typing to do. G’day!
This is a fantastic video on the features and functionality of the current (2013) generation DB driver. It seems the Java and .NET drivers are the preferred platforms. I’m terribly interested in Java and have found the DataStax driver to be exceptionally reliable in my testing.
Fantastic video on data modeling in Cassandra! I think I like this even more than the last!! It is nice to see such a concentration of skill in a community!
This is part 3 of a series! Although they are independent, if you find it seems to go too fast, or you have to stop and look up things, go back and watch #2, or even all of them in order!
1: The Data Model is Dead! Long Live the Data Model http://www.youtube.com/watch?v=px6U2n74q3g
2: Become a Supermodeler https://www.youtube.com/watch?v=qphhxujn5Es
I love learning from others’ mistakes! It is so much more efficient than learning from your own! From configuration to design, this talk was incredibly useful!
Interesting stuff! Using Cassandra itself for data replication instead of relying on hardware RAID for that! He also explains tombstones and their effect on performance! “tracing on;” is my new catchphrase!! 🙂
Overcoming trepidation over eventual consistency aside, trying to model for Cassandra and similar NoSQL databases has been the steepest hill to conquer for me.
I have 10 years of very amateur RDB experience and am only now taking my first real database course in college (which, up until halfway through, I could have taught). I’m not some data pro, but the ideas around the aggregate data model are an entirely different beast to conquer compared to the relational model. I’m going to try to give a brief overview of the methodology for newcomers.
- Stop trying to port your tables to Cassandra. Quit it. It is likely a waste of time. Spend the time thinking about the requests that are made. The flexibility of the RDB is that you can join tables in very complex ways to force the data to fit whatever query you are attempting to accomplish. C* takes another route. You model your data for how you will use it, duplication be damned. Some denormalization should not be a major factor in the database design. If the pulls are intelligently designed, and compartmentalized, denormalization will be slight.
- AGAIN, stop fighting the denormalization. Hard disks are CHEAP, and SSDs are like ultra-cheap RAM for your database. Linux supports SSD-cache-backed RAID arrays now, so unless the sheer mass of your data is your motivator for using NoSQL, rather than the performance, scalability and efficiency, get over it. Stop trying to design in data objects and start designing in aggregates.
- Once you have started building tables for the various pulls you make, you will likely find the chief data duplication going on is in the form of UUIDs and TimeUUIDs. Likely they are simply indicators for relations. This is fine so long as the tables are designed to be task oriented, so that for any given task you make as few pulls as possible.
- Writes: Yes, writes can become a chore. Chances are high that you are smart enough to compartmentalize your data code. By keeping all the code together, when you have to do an update to several tables for consistency, you won’t have to worry about something getting left behind. Avoid the urge to write a “quick little hack”. Everything tends to be used far longer than anticipated, and it could cause a terrible headache down the road. Treat your database like you want it to treat your data! Fortunately for us, Cassandra is amazing at write speed, so when you have to push some changes, you don’t push the load up too much.
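To make the points above a bit more concrete, here is a rough sketch of the idea in plain Python. It does not talk to Cassandra at all: the two dictionaries are hypothetical stand-ins for two tables, each shaped for exactly one pull, and the names (`posts_by_author`, `posts_by_tag`, `add_post`) are made up for illustration. What it tries to show is one table per query, a time-based UUID as the duplicated bit that ties the copies together, and a single write function that owns every copy so nothing gets left behind.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for two Cassandra tables.
# Each one is shaped for a single read path; the dict key plays
# the role of the partition key.
posts_by_author = defaultdict(list)  # "everything this author wrote"
posts_by_tag = defaultdict(list)     # "everything filed under this tag"


def add_post(author, title, tags):
    """The only write path: every denormalized copy is updated here."""
    # uuid1() is time-based, similar in spirit to a TimeUUID column.
    post_id = uuid.uuid1()
    row = {
        "post_id": post_id,
        "author": author,
        "title": title,
        "created": datetime.now(timezone.utc),
    }
    posts_by_author[author].append(row)
    for tag in tags:
        # Duplicate the row once per tag so a tag lookup is one read, no join.
        posts_by_tag[tag].append(dict(row, tag=tag))
    return post_id


def titles_by_author(author):
    """One pull answers the whole question."""
    return [r["title"] for r in posts_by_author.get(author, [])]


def titles_by_tag(tag):
    """Same data, second copy, shaped for a different question."""
    return [r["title"] for r in posts_by_tag.get(tag, [])]
```

In a real cluster the two appends inside `add_post` would be two inserts against two tables, but the shape of the code stays the same: reads never join, and writes go through one function that knows about every copy.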
A couple of great resources I’ve come across are:
Thanks for stopping by!!
Speaker: Sylvain Lebresne, Apache Cassandra Committer and Engineer at DataStax
This guy has some fantastic things to say! (forgot to post when I watched it!)