NoSQL databases are now a standard part of web-scale architecture. The question is when to use what. Below, I compare the NoSQL data stores that I have worked with. Hopefully it will be useful for programmers exploring and choosing the technology for their web-scale application.
When to use NoSQL
Before deciding to use NoSQL instead of a SQL technology, ask yourself the following questions about your use case (this includes an ACID test of your application):
- Transactions vs No transactions (Do you need atomicity?)
[Most NoSQL databases don’t support transactions]
- Consistent or eventually consistent (Are you okay with eventual consistency?)
[Most support a configurable consistency mode. You should test at scale with the consistency mode your application requires. For example, a performance test run in “eventual consistency” mode tells you little if you then decide to use strong consistency for your application.]
- Vertical vs horizontal scaling (What’s your scale? Does your use case need near-infinite scale, or are your needs finite?)
[This sometimes boils down to what stage of business you are in. Don’t over-engineer if you are an early-stage startup and growing < 5x a month. Postpone and focus on business growth.]
- Availability (No downtime? Hot failover?)
[Some NoSQL DBs support hot failover, some do not. More below.]
- Do you really need a NoSQL DB? Why doesn’t an RDBMS work for you?
[Don’t use NoSQL just for the heck of it]
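To make the consistency question concrete, here is a toy sketch of a primary with one lagging replica. It is not the API of any real database; `ToyReplicatedStore`, `replicate` and the `consistency` argument are hypothetical names I made up for illustration. It shows why a benchmark run in eventual mode says little about strong mode: the two modes read from different places.

```python
class ToyReplicatedStore:
    """Toy model: one primary plus one replica that lags until replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted by the primary but not yet on the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply the backlog to the replica (stands in for asynchronous replication).
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, consistency="eventual"):
        # "strong" always asks the primary; "eventual" asks the possibly stale replica.
        if consistency == "strong":
            return self.primary.get(key)
        return self.replica.get(key)


store = ToyReplicatedStore()
store.write("user:1", "alice")
stale = store.read("user:1")                        # replica has not caught up yet
fresh = store.read("user:1", consistency="strong")  # served by the primary
store.replicate()
converged = store.read("user:1")                    # replica has now caught up
print(stale, fresh, converged)
```

In a real system the strong path usually costs extra latency or throughput, which is exactly why the performance test should use the mode you plan to run in production.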
So, once you have decided to go for a NoSQL data store, the next question should be: key-value or document-oriented?
Key-Value vs Document-oriented
Key-value stores: If you have a clearly defined data structure such that every piece of data has exactly one key, go for a key-value store. It’s like having a big hashtable, and people mostly use it for cache stores or clearly key-based data. However, things start getting a little nasty when you need to query the same data on the basis of multiple keys!
Some key-value stores are: Memcached, Redis, Aerospike.
Two important things about designing your data model around a key-value store:
- You need to know all use cases in advance, and you cannot change the queryable fields in your data without a redesign.
- Remember, if you maintain multiple keys around the same data in a key-value store, updates to multiple tables/buckets/collections/whatever are NOT atomic. You need to deal with this yourself.
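Here is a minimal sketch of the multiple-keys problem, using two plain Python dicts to stand in for two buckets of a key-value store. The names `users_by_id`, `ids_by_email` and `change_email`, and the `fail_midway` flag that simulates a crash, are all hypothetical. The point is only that the writes are separate steps, so a failure between them leaves the lookups disagreeing.

```python
users_by_id = {}    # primary bucket: user id -> record
ids_by_email = {}   # hand-maintained "secondary key": email -> user id


def save_user(user_id, email, name):
    users_by_id[user_id] = {"email": email, "name": name}
    ids_by_email[email] = user_id


def change_email(user_id, new_email, fail_midway=False):
    user = users_by_id[user_id]
    old_email = user["email"]
    ids_by_email[new_email] = user_id                    # write 1: new lookup key
    if fail_midway:
        raise RuntimeError("simulated crash between writes")
    users_by_id[user_id] = {**user, "email": new_email}  # write 2: the record itself
    ids_by_email.pop(old_email, None)                    # write 3: drop the old key


save_user(1, "a@example.com", "Alice")
try:
    change_email(1, "b@example.com", fail_midway=True)
except RuntimeError:
    pass

# Both emails now resolve to user 1, while the record still holds the old one.
emails_pointing_to_user = sorted(ids_by_email)
print(emails_pointing_to_user, users_by_id[1]["email"])
```

Dealing with this yourself typically means some mix of ordering writes carefully, retrying, and running a periodic job that reconciles the buckets.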
Document-oriented: If you are just moving away from an RDBMS and want to keep your data in object form and as close to a table-like structure as possible, a document store is the way to go! It is particularly useful when you are creating an app and don’t want to deal with RDBMS table design early on (in the prototyping stage) and your schema could change drastically over time. However, note:
- Secondary indexes may not perform as well.
- Transactions are not available.
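As a rough illustration of the document model, the sketch below keeps documents as plain Python dicts with differing fields, queries them with a full scan, and then builds a secondary index on one field. `find` and `build_index` are hypothetical helpers written for this post, not the API of MongoDB or any other store.

```python
from collections import defaultdict

# Documents in one collection need not share the same fields.
documents = [
    {"_id": 1, "name": "Alice", "city": "Pune"},
    {"_id": 2, "name": "Bob", "city": "Delhi", "tags": ["admin"]},  # extra field
    {"_id": 3, "name": "Carol"},                                    # no city at all
]


def find(docs, **criteria):
    """Query on any field by scanning every document (no index needed)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def build_index(docs, field):
    """Build a secondary index: field value -> list of matching document ids."""
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d["_id"])
    return index


by_city = build_index(documents, "city")
scan_hits = [d["_id"] for d in find(documents, city="Pune")]
index_hits = by_city["Pune"]
print(scan_hits, index_hits)
```

The index makes lookups on `city` cheap, but it is one more structure to keep up to date on every write, which is part of why secondary indexes can cost you performance.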
In-memory vs disk persistence (cache or data store?)
Another key concern when choosing a data store is whether you are using it as the primary data store of your application or as a cache over your data store to scale for your traffic needs.
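When the store is used as a cache over your data store, the usual shape is: look in the cache first, fall back to the database on a miss, then populate the cache with a TTL. The sketch below assumes an in-process `TTLCache` class and a dict named `database` as stand-ins; both are hypothetical, and the injectable `clock` exists only so the expiry can be shown without sleeping.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)


database = {"user:1": "alice"}  # stands in for the real data store
stats = {"db_reads": 0}


def load_user(cache, key):
    value = cache.get(key)
    if value is None:                # cache miss: go to the data store
        stats["db_reads"] += 1
        value = database.get(key)
        if value is not None:
            cache.set(key, value)
    return value


now = [0.0]                          # fake clock we can move forward by hand
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
load_user(cache, "user:1")           # miss, reads the database
load_user(cache, "user:1")           # hit, served from the cache
now[0] = 61.0
load_user(cache, "user:1")           # entry expired, reads the database again
print(stats["db_reads"])
```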
Once you have decided what kind of use case you have, here are some of the popular NoSQL stores you could use:
Comparing Key-value NoSQL databases
Memcached
- In-memory cache
- No persistence
- TTL supported
- Client-side clustering only (the client distributes values across multiple nodes). Horizontally scalable through the client.
- Not good for large values/documents
Redis
- In-memory cache
- Disk supported – backup and rebuild from disk
- TTL supported
- Data structure support in addition to key-value
- Clustering support not mature enough yet. Vertically scalable.
- Horizontal scaling could be tricky.
Aerospike
- Both in-memory & on-disk
- Extremely fast (could support >1 million TPS on a single node)
- Horizontally scalable. Server-side clustering. Sharded & replicated data
- Automatic failover
- Supports secondary indexes
- CAS, TTL support
- Enterprise class
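The client-side clustering mentioned in the comparison above means the servers know nothing about each other; the client decides which node holds a key. A minimal sketch, assuming simple hash-modulo placement with one node per key, and with dicts standing in for cache servers (`ShardedClient` and the node names are hypothetical):

```python
import hashlib


class ShardedClient:
    """Client that picks a node per key by hashing; the nodes never coordinate."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}  # dicts as fake servers

    def _node_for(self, key):
        # A stable hash, so every client maps the same key to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def set(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


client = ShardedClient(["cache-a", "cache-b", "cache-c"])
for i in range(100):
    client.set(f"session:{i}", i)

sizes = {name: len(data) for name, data in client.nodes.items()}
print(sizes, client.get("session:42"))
```

With plain modulo placement, adding or removing a node remaps most keys, so real clients often use consistent hashing instead. Server-side clustering moves this placement logic, plus replication and failover, into the database itself.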
When to use what? Memcache vs Redis vs Aerospike
If I were an early-stage startup, I would prefer to go with Redis and avoid the nuances of maintaining a cluster. If I had scaled above half a million TPS (transactions per second) and needed to scale horizontally, I would go for Aerospike. I would use Memcached only when I am going really lean and want to offload even the maintenance of the servers, in which case I would go for a hosted version of Memcached such as Amazon ElastiCache.
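For what it is worth, my rule of thumb above can be written down as a tiny helper. This is only a restatement of personal preference: the function name, the parameter names and the order in which the rules are checked are my assumptions, and the 500,000 TPS threshold is the rough figure from the paragraph above, not a benchmark result.

```python
def pick_key_value_store(peak_tps, offload_operations=False):
    """Encode the rule of thumb; thresholds are rough, not measured limits."""
    if offload_operations:
        # Prefer not to run servers at all: use a hosted Memcached service.
        return "hosted Memcached"
    if peak_tps > 500_000:
        # Past this point horizontal scaling with server-side clustering matters.
        return "Aerospike"
    # Early stage: one well-sized Redis node keeps operations simple.
    return "Redis"


print(pick_key_value_store(2_000))
print(pick_key_value_store(750_000))
print(pick_key_value_store(2_000, offload_operations=True))
```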
Comparing document-oriented NoSQL databases
MongoDB
- Mature & stable – feature rich
- Supports failover
- Horizontally scalable reads – read from replica/secondary
- Writes not scalable horizontally unless you use Mongo shards
- Supports advanced querying
- Supports multiple secondary indexes
- Sharded architecture becomes tricky and does not scale beyond a point when you need secondary indexes. An elementary sharded deployment needs 9 nodes at minimum.
- Document-level locks are a problem if you have a very high write rate
Couchbase
- Sharded cluster instead of the master-slave setup of MongoDB
- Hot failover support
- Horizontally scalable
- Supports secondary indexes through views
- Steeper learning curve than MongoDB
- Claims to be faster
When to use what? MongoDB vs Couchbase
For most common use cases, I would go for MongoDB unless my write rate is extremely high (I would think again only when writes are > 10% of my traffic and I am doing more than a few thousand transactions a second). Fast prototyping, schema-less design, on-the-fly indexes, etc. make it an ideal choice for early-stage traffic.
I would consider Couchbase only when I have scaled beyond the point where write locks are becoming a problem, I have secondary indexes, and I need extremely high availability (* more on Couchbase in coming posts).