When even SSD isn’t fast enough, the rules have changed. Databases have always wanted more memory, but the demands of applications like Cassandra and Couchbase take this to a new level.
The demands of NoSQL databases like Cassandra and Couchbase are difficult to fathom if you are used to traditional database design. But if you have already begun deploying one of these systems, you are well aware of their unique nature. For those unfamiliar with them, the first thing to understand is that organizations turn to such databases when traditional databases simply cannot meet the performance requirements of their application.
Two common use cases come to mind: product catalogs and interpreting machine-generated data. Product catalogs not only require millions of records and associations between those records; they also require simultaneous access to all of those records by tens or hundreds of thousands of people, as well as continual updates to online inventory for the millions of items comprising the catalog. Machine-generated data can create millions of records at once, and analyzing the trends found in that data demands very high performance.
The next thing to understand is that the architecture these systems run on is very different from the traditional one. The main difference is a distributed architecture that spreads the database out across many nodes – perhaps thousands of them. One reason is that, for performance, the database assumes the active working set will be in memory. Since the size of these databases is well beyond the memory capacity of any modern server, the distributed architecture is also required to give the application the amount of aggregate memory it needs.
Those designing infrastructure to support such applications are given two choices: over-provision or spill to SSD. The over-provisioning method works just fine: make sure you have enough servers to give you enough RAM to support the peak working set. This ensures the application always has the entire working set in RAM. It also ensures that your compute bill will be enormous, and that much of that capacity will sit idle. One could argue this goes against the logic of how the cloud is supposed to work.
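The trade-off between the two choices can be made concrete with a back-of-envelope sizing sketch. All of the figures below (working-set size, per-node capacities, hot-data fraction) are illustrative assumptions, not numbers from any vendor:

```python
import math

def nodes_needed(data_gb, usable_per_node_gb):
    """Round up: every byte of the data set must land on some node."""
    return math.ceil(data_gb / usable_per_node_gb)

working_set_gb = 10_000    # assumed 10 TB peak working set
ram_per_node_gb = 256      # assumed usable RAM per server
flash_per_node_gb = 4_000  # assumed usable flash per server
hot_fraction = 0.2         # assume 20% of the set is truly hot

# Option 1: over-provision so the whole working set fits in RAM.
ram_only = nodes_needed(working_set_gb, ram_per_node_gb)

# Option 2: keep only the hot data in RAM and spill the rest to flash;
# the cluster is sized by whichever resource runs out first.
hot_gb = working_set_gb * hot_fraction
spill = max(nodes_needed(hot_gb, ram_per_node_gb),
            nodes_needed(working_set_gb, flash_per_node_gb))

print(f"RAM-only cluster:    {ram_only} nodes")  # 40 nodes
print(f"RAM + flash cluster: {spill} nodes")     # 8 nodes
```

Under these assumptions the RAM-only design needs roughly five times as many servers, which is exactly the over-provisioning bill the article describes.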
The second option, spilling to flash, is definitely less expensive. Flash may be more expensive than disk, but it is less expensive than memory. Unfortunately, it is also much slower. While flash is definitely faster than disk, it is orders of magnitude slower than memory. In addition, the traditional I/O software stack used to write to SSD makes it slower still: it treats flash as a fast hard disk instead of as slow memory.
What if, instead of writing data through the traditional I/O stack, you could write through an I/O stack designed for memory-based storage, allowing you to write at near-memory speeds? You could solve a number of problems. First, you could give your application the additional memory it needs during peak loads without spending what RAM costs.
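The difference in programming model can be sketched with Python's `mmap`. This only illustrates the idea of touching flash-backed data like memory instead of issuing block I/O calls; a real memory-speed I/O stack (one that bypasses the kernel block layer entirely) is far more involved than a plain memory map:

```python
# Sketch: memory-style access to flash-backed data versus the
# traditional read() path. Illustrative only.
import mmap
import os
import tempfile

# A scratch file standing in for flash-resident data.
path = os.path.join(tempfile.mkdtemp(), "segment.dat")
with open(path, "wb") as f:
    f.write(b"A" * 4096 + b"B" * 4096)

# Traditional I/O stack: every access is an explicit syscall that goes
# through the block layer, treating flash as a fast disk.
with open(path, "rb") as f:
    f.seek(4096)
    block = f.read(8)  # seek()/read() syscalls per access

# Memory-style access: map the file once, then touch it with ordinary
# loads; pages are faulted in on demand with no per-access syscall.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    loaded = mm[4096:4104]  # a plain slice, like reading RAM
    mm.close()

assert block == loaded == b"B" * 8
```

The point is not that `mmap` itself is fast, but that the second path exposes storage to the application as addressable memory rather than as a disk behind a file API.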
Flash is also denser than RAM, allowing an environment to have many times more flash capacity per server than it could have DRAM. That density, assuming improved performance, removes the requirement to scale servers just to scale memory.
Storage Switzerland’s recent webinar “Overcoming the Storage Challenges Cassandra and Couchbase Create”, now available on-demand, discusses the challenges that NoSQL environments like Cassandra and Couchbase create and provides some answers on how to overcome them.