
When I started building my first app in 2011, MongoDB was all the rage, so I built the back-end on the futuristic 'NoSQL' technology. It turned out to be slow (~1 min query times), inconsistent, and missing an RDBMS layer. I moved the thing to PHP/MySQL and the problems were gone. I still have not found a use case outside of web (comments/discussion) sites where the tight integration with JavaScript actually makes sense.


Why did you use MongoDB if your domain model wasn't suited?

I just don't understand people who complain that a technology is useless for all of these use cases when they wouldn't even spend a few hours on a spike/POC or some basic domain data design. MongoDB has very clear documentation about what you should and shouldn't use it for.

MongoDB is one of the few document stores available today. So for use cases such as a 360° customer view, where you need to fetch a lot of nested information with a single id, it is blisteringly fast.
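For the curious, a minimal sketch of that pattern, assuming a hypothetical 'customers' collection that embeds addresses and orders (pymongo; all names here are illustrative):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["shop"]

    # One round trip returns the whole nested view; no joins needed.
    customer = db.customers.find_one({"_id": "cust-42"})
    # -> {"_id": "cust-42", "name": ..., "addresses": [...], "orders": [...]}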

If you have a relational data model, then use a relational database.


> Why did you use MongoDB if your domain model wasn't suited?

Because MongoDB's marketing sold it as the hot new datastore that made SQL legacy ("newer! faster! web scale!"), not as "only use this if your data isn't relational".

The reference documentation might be more accurate, but the way it was publicised certainly wasn't.


What does 'document store' mean, and what can it do that other databases can't?


As someone being 'forced' to use Mongo for the first time, I'm curious when/where you hit that. I'm looking for pros/cons, but the internet is proving rather unhelpful: it seems to be all people vehemently for it or against it.


I hope you don't mind if I piggyback on this to echo this sentiment.

Although I dislike MySQL for its many gotchas (data-corruption-level stuff too!), I was looking for a _long_, _long_ time for a high-consistency NoSQL database. We basically need document storage of large binary data.

Ironically, literally nothing in NoSQL land does write-through to disk; they just write to the VFS (i.e. the OS page cache) and hope it works. Additionally, those that support clustering opt for eventual consistency. That just baffles my mind: so much potential for lost or corrupted data.
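To be concrete about the distinction: a plain write() only lands in the OS page cache, and durability needs an explicit fsync. A toy sketch in Python:

    import os

    fd = os.open("record.bin", os.O_WRONLY | os.O_CREAT)
    os.write(fd, b"payload")  # lands in the page cache only
    os.fsync(fd)              # forces the data down to stable storage
    os.close(fd)

A store that skips that second step can lose acknowledged writes on power failure.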

We ended up doing deterministic sharding on PostgreSQL, and it worked incredibly well, even in failure modes you hope never to see. And no corruptions! :D
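In case it helps anyone, the routing side of that is tiny. A rough sketch of what I mean by deterministic sharding (the shard list and key scheme here are made up):

    import hashlib

    # Hypothetical DSNs; in practice, N independent PostgreSQL nodes.
    SHARDS = [
        "postgresql://db0.internal/app",
        "postgresql://db1.internal/app",
        "postgresql://db2.internal/app",
    ]

    def shard_for(key):
        # Stable hash: the same key always routes to the same node,
        # so there is no lookup service and never a cross-shard JOIN.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

Each shard is just a normal PostgreSQL instance, so you keep real fsync'd commits on every node.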


Isn't that the point of NoSQL? You trade consistency for speed. It's best used for temporary state in a web app. If you need atomic consistency, you should use a relational database that does atomic writes.

That has always been the case. Why would it be so shocking to you? Atomic writes have already been built. Sounds like you were standing there with a hammer looking for a nail.


I always thought the term "document store" just meant arbitrary unstructured data that didn't need to be related.

That's how you do things like our deterministic sharding: it's very easy to scale this way if you never JOIN tables, etc.


Please be specific. NoSQL comprises hundreds of different databases with very different semantics.

Technically PostgreSQL can be a NoSQL database too (its JSONB type makes it a document store), and it, along with Cassandra for example, has modes in which it is fully and atomically consistent.
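For example, a rough psycopg2 sketch of JSONB as a document store with ordinary ACID commits (the table name is made up):

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")
    cur.execute("INSERT INTO docs (body) VALUES (%s)", [Json({"name": "ada", "tags": ["x"]})])
    conn.commit()  # an ordinary, durable PostgreSQL commit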


You are talking complete nonsense.

HBase, Cassandra, MongoDB, Riak, Couchbase, etc. all write through to disk with proper fsync flushes. And I've never heard of any database whose model is to write to a "virtual file system", whatever that even means.

Please provide some specific examples.


HBase, Cassandra, MongoDB, Riak, Couchbase, and Redis were the main candidates.

HBase: http://mail-archives.apache.org/mod_mbox/hbase-issues/201307...

Cassandra: fsyncs the WAL only, not a full fsync (see the config sketch at the end of this comment). https://wiki.apache.org/cassandra/Durability

MongoDB: ... too much wrong here to list, although I hear it's improving at being cluster-aware, etc.

Redis does support fsync as far as I remember, but the write/delete pattern is incredibly sub-optimal: it basically runs out of a WAL by itself and performs very poorly if your dataset does not fit in memory.
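If memory serves, the durability knobs in question look roughly like this (option names as of the versions we evaluated; verify against current docs):

    # redis.conf -- AOF persistence with an fsync per write
    appendonly yes
    appendfsync always        # alternatives: everysec / no

    # cassandra.yaml -- fsync the commitlog per batch, not periodically
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 2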


As the author of the cited HBase post, lemme clarify:

0. Here are more details on that: http://hadoop-hbase.blogspot.de/2013/07/protected-hbase-agai...

1. By default HBase "flushes" the WAL. Flush here means making sure that at least 3 machines have the change in memory (NOT on disk). A datacenter-wide power outage can lose data.

2. When HDFS closes a block, it is not by default forced to disk. So as HBase rewrites old data during compactions, old data can be lost during a power outage. Again, by default.

3. HDFS should be configured with sync-on-close, so that old data is forced to disk upon compactions (and with sync-behind-writes for performance).

4. HBase now has an option to force a WAL edit (and all previous edits) to disk (that's what I added in said jira).

5. That post is 4 years old, for chrissake :)... Don't base decisions on 4-year-old information.

HBase _is_ a database and it will keep your data safe. Unfortunately it requires some configuration and some knowledge.
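For anyone configuring this today, the point-3 settings live in hdfs-site.xml and look something like the following (check the property names against your Hadoop version):

    <!-- force each block to disk when it is closed (protects compacted data) -->
    <property>
      <name>dfs.datanode.synconclose</name>
      <value>true</value>
    </property>
    <!-- flush to disk in the background while the block is still being written -->
    <property>
      <name>dfs.datanode.sync.behind.writes</name>
      <value>true</value>
    </property>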


Key-value document storage of large binary data... isn't that S3?


Yeah, shame we were pushing too much data for the 5 GB/s cap.

Also, their consistency model is opaque and they've lost data before, so those aren't selling points.


Redis



