I actually came there via two other aggregators (Twitter and Prismatic) and thought that I had found the source. Will be more careful next time - sorry about that. In any case it was an interesting read.
I have seen many people try and shoehorn various problems into MongoDB, when in most cases a relational database would have been better suited.
I have yet to see a real use case for Mongo unless you are building a Facebook clone. Can someone suggest when it is actually useful over a properly tuned relational database?
I guess I kind of reached the irrelevance stage just thinking about the sort of problems it would be suited to rather than actually building anything.
> I have yet to see a real use case for Mongo unless you are building a Facebook clone
People seem to keep making the mistake of building a social network in mongo. It's entirely unsuitable for that.
Source: I've helped 3 companies move their social networks from MongoDB to OrientDB when they figured out that MongoDB prevents them from shipping features that users expect, e.g. friend-of-a-friend style queries
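For what it's worth, friend-of-a-friend is exactly the access pattern that hurts in a document store: each hop means another round of queries from the application, while a graph database runs the whole traversal server-side. A rough sketch of the pattern in plain Python, with made-up data:

```python
# Friend-of-a-friend over an in-memory adjacency map.
# In a document store each hop is a separate query per user;
# in a graph database the traversal is one server-side operation.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "eve"},
    "dave": {"bob"},
    "eve": {"carol"},
}

def friends_of_friends(user):
    """Second-degree connections, excluding the user and direct friends."""
    direct = friends.get(user, set())
    result = set()
    for friend in direct:  # one "query" per friend in a doc store
        result |= friends.get(friend, set())
    return result - direct - {user}

print(sorted(friends_of_friends("alice")))  # → ['dave', 'eve']
```

In OrientDB the equivalent is a single traversal query, which is why moving off MongoDB unlocks this kind of feature.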
What has your experience been like in deploying and working with OrientDB? This is a bit of really cool tech that I've been keeping an eye on for some time but haven't gotten around to really playing with it.
What server language were you using and where were you deploying?
OrientDB is very cool software, I find it pretty hard to go back to traditional databases now that I've seen how powerful graphs are, but the cool thing about it is that it's still a document store at heart, so you get all the same advantages of mongo, but with the graph awesomeness on top. It's a fantastic tool, but there are a few quirks that can catch beginners out and the documentation is not stellar. It also requires some configuration tweaking to get the best performance for your workload.
I've been using node.js (I develop the official driver - https://github.com/codemix/oriento), but there's pretty good libraries emerging for other languages too. Most people I've worked with are deploying to AWS, one company was running on the bare metal.
I actually JUST discovered oriento and was absolutely delighted to see a bluebird promise-based api. API looks fantastic. Thanks a ton for creating the lib.
Do you have any advice for someone thinking of deploying on GCE?
My use-case would be for an online code editor (Plunker, if you've heard of it) with users, projects, packages, collections (of projects), comments and project versions (stored as content-addressable git-compatible objects).
I'm also interested in understanding if there is any built-in compression mechanism because I will be storing a large volume of very similar text files. Any hints?
Have you used the lucene indexes much? If so, can you do any of the crazy faceting delivered by ElasticSearch?
> Do you have any advice for someone thinking of deploying on GCE?
No, sorry, I've never used that. However, generally - like all dbs, Orient is happiest when it has access to a lot of RAM. Also the Write Ahead Log can take up a significant amount of disk space, those are two things to be immediately aware of.
> I'm also interested in understanding if there is any built-in compression mechanism because I will be storing a large volume of very similar text files. Any hints?
Sadly it doesn't yet do LevelDB-style document compression, and I've seen no hints that it's on the horizon, but the OrientDB guys are pretty responsive and would probably be open to the suggestion.
> Have you used the lucene indexes much? If so, can you do any of the crazy faceting delivered by ElasticSearch?
I'm just starting to use the lucene indexes now so I can't give much feedback on those yet. It should be possible to do faceting using SQL to a limited degree, but I don't think there's native support for it yet. I think that will get improved in the next few versions because people are crying out for it.
actually both, in many cases. Neo4j does have quite a hostile license, but its biggest downside is that it is graph-only, whereas OrientDB (and ArangoDB, which is also worth looking at) are "multi-model" - you can use them as pure document (or even key/value) stores, and the graph is just another way of viewing/representing that data. This is really powerful and means that you don't have the problem of duplicating data between different database systems or needing to do cross-data-store joins in your application.
I don't want to create a schema in a relational database for that kind of use case. MongoDB fits well there.
Edit: Sorry for my short and imprecise comment. I regret posting it, losing all my karma. Yes, of course I would use a schema, but I may need hundreds of attributes to get my products presented with all attributes and variants available. There is a set of attributes that will remain for all products; others will vary, some will change and will be dropped. Some may be conditional, depending on size or color. I could model that in an RDBMS as well, adding those properties in an additional JSON column or something else.
Yes, I'm also mainly using relational databases and just wanted to give an example of a use case for MongoDB. Sorry for the confusion.
You are always going to end up creating a schema, whether it's explicit in your tables or implicit within your code. Otherwise you end up with completely heterogeneous data which is impossible to query in any useful way.
Not really. Maybe some products have an attribute that others don't. Maybe you want to query by that attribute. The query won't turn up products without that attribute. Maybe that's ok. Like if you wanted to find all shoes with black laces you can just query for that. Tractors don't even have laces and so they don't come up in the results.
In a SQL DB, every single thing needs to fit into a flat schema with nesting done by relations. Additionally, it's very rigid. Maybe you just don't want to set up a laces table, and 4000 other tables for every little attribute of every product. Of course this is not how you would do it in real life; there are better ways. Mongo is one of those ways.
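To illustrate the matching semantics with a plain-Python sketch (made-up product data, not real driver code): documents that lack an attribute simply never match a query on it, which is roughly what `find({"laces.color": "black"})` does in Mongo:

```python
# MongoDB-style matching semantics in plain Python: documents that
# lack an attribute simply don't match a query on that attribute.
products = [
    {"type": "shoe", "name": "Runner", "laces": {"color": "black"}},
    {"type": "shoe", "name": "Loafer"},                # no laces at all
    {"type": "tractor", "name": "T-800", "wheels": 4},
]

def find(collection, criteria):
    """Rough analogue of find() with dotted-path criteria."""
    def matches(doc, path, expected):
        for key in path.split("."):
            if not isinstance(doc, dict) or key not in doc:
                return False
            doc = doc[key]
        return doc == expected
    return [d for d in collection
            if all(matches(d, path, v) for path, v in criteria.items())]

hits = find(products, {"laces.color": "black"})
print([d["name"] for d in hits])  # → ['Runner']
```

The tractor never comes up: it has no laces, so the query silently skips it, which may or may not be what you want.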
Saying that Mongo is good for nothing is just as dumb as saying that Mongo is good for everything. In 2015, the Mongo backlash is just as tired as the Mongo hype.
I would probably solve this problem in a relational DB by having an ItemHeader table with all the common attributes. Then I'd make an ItemAttributes table that uses the foreign key of the first table and just holds key/value pairs for all the unique attributes you want to give to the item. Any sort of query on any of the attributes can be done via a simple join.
Though I suppose it gets trickier if attributes may have sub-attributes etc....
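A minimal sketch of that ItemHeader/ItemAttributes (entity-attribute-value) layout, using sqlite3 so it runs anywhere; the table and column names here are made up:

```python
# EAV layout: common attributes in ItemHeader, everything else as
# key/value rows in ItemAttributes, queryable via one join.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE ItemHeader (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ItemAttributes (
        item_id INTEGER REFERENCES ItemHeader(id),
        key TEXT, value TEXT
    );
""")
db.execute("INSERT INTO ItemHeader VALUES (1, 'Runner shoe')")
db.execute("INSERT INTO ItemHeader VALUES (2, 'Tractor')")
db.executemany("INSERT INTO ItemAttributes VALUES (?, ?, ?)",
               [(1, "lace_color", "black"), (2, "wheels", "4")])

# Any attribute is queryable with a simple join:
rows = db.execute("""
    SELECT h.name FROM ItemHeader h
    JOIN ItemAttributes a ON a.item_id = h.id
    WHERE a.key = 'lace_color' AND a.value = 'black'
""").fetchall()
print(rows)  # → [('Runner shoe',)]
```

The usual caveat with EAV is that everything becomes TEXT and the database can no longer type-check or constrain individual attributes, which is part of why it gets tricky with sub-attributes.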
Are you saying you're using a single MongoDB collection as a dumping ground for all your entities? I hope you're not building a real product that someone has to maintain.
I don't actually use Mongo at all, I prefer Postgres. But for loosely structured data, it has a use case. Just like dynamic vs typed languages, there are pros and cons.
Agreed. That said, optional schema enforcement would be really nice (in my experience, most of the data has a predefined format; so why not define schema and enforce it?).
Not quite. The 'type' in the example is a label for what it contains - it's wholly semantic - as opposed to a field that would exert some sort of constraint such as describing the data type, collation, restrictions, etc. Schemaless databases don't completely abandon all forms of organisation because that'd be unusable. They just make the organisation looser. For example, if the user wanted to they could add a record to the products in that example that doesn't have a 'type' (although in the case of that example that probably wouldn't be useful).
The thing is that in order to use that /label/ you need to write logic to handle it in your code; ultimately you end up implementing a schema anyway, the only difference being that you enforce it in your application.
This label becomes not much different from a nullable column in a database.
Now things become hairier when you realize that perhaps you want to keep more information about the actor, so for example you want to change details.actor to details.actor.name. You have two choices: either run through your database and modify all documents (which is similar to what an RDBMS migration does) or write code to handle both cases. The second one seems easier, but as you accumulate more changes it'll come back and bite you hard.
Later you might realize that by repeating details about the actor for every single movie you're not only wasting a lot of resources (your database is bigger and slower) but also hurting integrity (in one movie perhaps you include the actor's middle name, or maybe you have a typo).
At that point you'll start creating a collection that holds actors and then only store its key in "details.actors". You'll soon realize that you're basically reimplementing a relational database on top of Mongo, except it's not only way slower, it also makes your application more complex.
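The "write code to handle both cases" option, sketched in plain Python (field names from the comment, data made up) - this is the kind of branching that accumulates with every schema change:

```python
# Reading details.actor when old documents store a plain string and
# newer ones store a nested object with a name field.
old_doc = {"title": "Movie A", "details": {"actor": "Jane Doe"}}
new_doc = {"title": "Movie B", "details": {"actor": {"name": "John Roe"}}}

def actor_name(doc):
    actor = doc["details"]["actor"]
    if isinstance(actor, dict):  # new shape: details.actor.name
        return actor["name"]
    return actor                 # legacy shape: details.actor is a string

print(actor_name(old_doc), "/", actor_name(new_doc))
```

One branch is harmless; a dozen of them, for a dozen historical shapes of the same document, is where the pain described above comes from.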
There are uses for NoSQL, but whenever you have to ask yourself which model you should go with pretty much always the answer will be: relational.
Have a category field or link in the table. Join to it. Joins are not evil, they are damn useful. And well done if your product has become large enough to have scaling problems that can't be solved by a bit of indexing on the joins.
Diaspora is a social network that made the mistake of choosing mongodb for their first iteration. Here's a blog post where one of the involved developers explains why it's not good for that, even: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...
> I have yet to see a real use case for Mongo unless you are building a Facebook clone. Can someone suggest when it is actually useful over a properly tuned relational database?
Seamless auto-sharding, supported as a first-class use case. Particularly in a cloud environment where you can have your system automatically spin up more hosts as load increases (and take them down as load drops) with zero human intervention, which is really nice.
There are plenty of alternatives in that space nowadays (more than when mongo first came out), and reasonable people can disagree over whether to use mongo rather than e.g. cassandra, riak, or shonky third-party clustering addons for mysql/postgresql, but I've yet to see an affordable relational database with out-of-the-box auto-sharding support that comes anywhere close to what mongo offers.
(Also, first-class support for async-io clients. This is about client libraries rather than the server itself, and purely an artefact of when mongo was released, but if you're calling e.g. PostgreSQL from the JVM, most of the libraries are oriented towards blocking JDBC which is "good enough" (there are valiant efforts like https://github.com/mauricio/postgresql-async but you can't use them with established higher-level libraries). Whereas every mongo library offers a callback- or future-based API, and the higher-level libraries are built around this)
Sounds interesting, but frankly I wouldn't trust anything so young in production yet. For all mongo's faults, at this point the pitfalls are reasonably well-known and there are enough serious organizations using it at scale to give a certain level of confidence.
I generally agree as regards shoehorning data into MongoDB. I don't agree that a Facebook clone is an appropriate fit. When thinking through a project, my mental model generally self-configures into third normal form, which is to say I literally think in the manner of a relational database.
That said, I have found one specific use-case where it has worked incredibly well, which is as a JSON-based file system in which I can query into the document structure to find the records and data I need en masse with very simple queries. A project I've been working on for a few years has its own custom schema that fits far better into a JSON hierarchy than a relational one. In this specific system there are very few records (compared to our primary RDBMS which has millions), and when not being explicitly maintained, the files are generally static. The single-collection database in question with a few hundred JSON objects would likely have to be split into at least 10 tables with thousands of records each.
When it comes to using said "files" in production, I index the objects into memory for quick retrieval. In order to populate said cache, MongoDB allows me to query into the JSON file structure and retrieve the necessary "files" very simply into however many indices I need. And that's where it has shined in my experience; Not as a live read-write datastore for production use, but as read-heavy storage for a relatively small amount of JSON data with a deep hierarchy that would require an annoying amount of joins and subqueries for otherwise simple queries.
I've also tried it out with our multi-system logging system using capped collections (JSON formatted messages transported and collected with rsyslog). But we grew beyond its limits almost immediately and ended up in far more manageable territory with kibana / elasticsearch.
Hopefully a useful example, would love some comment.
I've been building https://rwt.to and https://movingjoburg.co.za with storage backed by MongoDB. Both are transit apps/websites in South Africa, could be more useful, except that we spend most of our time collecting and capturing data.
At the time I was very inexperienced (I guess I'm a bit better now?), but these are the reasons why I chose MongoDB.
1. Geospatial query support out of the box (this is during 2.0-2.2). It's now great with GeoJSON, I store routes and stations as GeoJSON, and querying them is very easy. Compare with PostgreSQL + PostGIS.
2. Some bus services have weird schedule structures, and I needed to be able to generate those schedules. I store schedules as a giant JSON file, and I can query that file to find when the next bus/train is. It's now possible with JSONB in PostgreSQL, but trying to fit that into a relational model wasn't worth the pain at the time, especially since there are always edge cases I have to cater for.
3. To create a pricing engine on rwt.to, I needed a very flexible schema, because I have to cater for many cases (are there bus/train transfers, discounts, and quite different rules for those transfers and discounts). MongoDB gave me the ability to vary my schema in the instances where I needed to store different types of data. I'd rather contend with that than have a table with 100 columns to do the same. To head off on a tangent, in GTFS this data (fares) is computed and stored in the table, but I calculate fares at query-time because there's sometimes a lot of rules to consider, making calculating them and caching them like GTFS unpleasant.
4. Another reason was that it was pleasant to work with MongoDB + Mongoose.js. This is very important as I'm the sole developer, and due to my line of work, I don't get all the time I need to work on both projects.
For my core transit data, I won't reach >20GB even when I manage to collect all the data for South Africa, so I won't have to contend with the 'big data' issues that other users would face.
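For anyone curious about point 1, this is roughly the shape of a GeoJSON station document and a `$near` query filter as you'd hand them to a driver such as pymongo; the names and coordinates below are made up, and a 2dsphere index is required (note GeoJSON order is [longitude, latitude]):

```python
# A station stored as a GeoJSON Point, plus a $near filter that finds
# stations within 2 km of a query point.
station = {
    "name": "Park Station",
    "location": {"type": "Point", "coordinates": [28.0436, -26.1952]},
}

nearby_filter = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [28.05, -26.20]},
            "$maxDistance": 2000,  # metres; requires a 2dsphere index
        }
    }
}

# Against a live database this would be roughly:
#   db.stations.create_index([("location", "2dsphere")])
#   db.stations.find(nearby_filter)
print(station["location"]["type"])  # → Point
```

The PostGIS equivalent (ST_DWithin over a geography column) is arguably more capable, but this is the "out of the box" experience being compared above.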
We are building a music streaming website. Think of it as a Spotify for Pakistan, and Mongo has been wonderful for the kind of data structure we have. It's perfect for a catalogue-like structure where objects can very easily nest inside each other. Each artist has multiple albums, and multiple artists have multiple songs. All of this nests very neatly, and when I want a song, I almost always need its album and artist data, so I can very easily get that. Mongo is blazing fast on this kind of nested data structure. The only slight problem is where you have playlists and need to reference songs for a certain artist. That can be countered by data duplication, since the song and artist data, once entered, will very rarely change. So for us it makes perfect sense. You really need to understand what your data model is, and what you want from it, and then Mongo will be your friend; otherwise it will cause you a world of pain.
How does that hierarchical model work when songs or albums have more than one artist?
Why does a traditional relational DB fail for such a well-defined data model? I mean, there's nothing about 'given a song (id), retrieve artist and album' that would make Mongo inherently better.
Lately (and not so lately ;) there has been a lot of bad press about MongoDB. We use it extensively as part of our product (on single servers) and it fills this role nicely. It has a few drawbacks, most notably huge disk space requirements (MongoDB has no compression) which we are hoping to solve with TokuMX (haven't tried it yet). It has some other quirks too, but in general it just... works. And I love using a document DB because it lets you use relations just when they are needed. With relational DBs you often have to use complex joins even for things which should be incredibly simple. On the other hand, I do miss JOINs when I need them (though we solved that on app level). And I would appreciate a way to define schema... :) But would I exchange document DB for relational DB for that? No (for the current project).
Maybe I'm getting old, but I really want DB to just work (unlike HBase, at least a few years ago). Other than that, I'm flexible.
The good news is, you can just wait a few weeks and get the possibility to use WiredTiger inside the upcoming 2.8.0. That will solve a few pain points that we've been dragging along for years (namely, document-level locking, disk requirements, multi-document transactions).
We don't use node.js. Also, we have app-level schema, but enforcing should (IMHO) be done on DB level so as to avoid any chance of invalid data.
Joins: easy, we specify which fields are connected (on app level), then fetching goes to connected collections and fetches data from there too. Not ideal, but it works. We made a similar solution for foreign keys.
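That app-level join pattern looks something like this sketch in plain Python (the collection and field names are hypothetical): the application knows which fields connect the collections and stitches the documents together itself:

```python
# App-level "join": movies.actor_id points into the actors collection,
# and the application fetches and attaches the connected documents.
actors = {1: {"_id": 1, "name": "Jane Doe"}}
movies = [{"title": "Movie A", "actor_id": 1}]

def fetch_movies_with_actors():
    results = []
    for movie in movies:                                  # first query
        joined = dict(movie)
        joined["actor"] = actors.get(movie["actor_id"])   # follow-up fetch
        results.append(joined)
    return results

print(fetch_movies_with_actors()[0]["actor"]["name"])  # → Jane Doe
```

It works, as the comment says, but every join you do this way is extra round trips and extra code that a relational database would handle in one query.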
Transactions: no need for any further guarantees. Our system mostly works on a single record at a time in all critical components. If this is not possible, we have app-level locking to avoid conflicts. No issues or wishes here.
> If this is not possible, we have app-level locking to avoid conflicts.
app-level locking is a recipe for disaster if data matters and more than 1 process/user/client/etc can access your data. if the data doesn't matter then I guess you can do whatever you want.
web "developers" have a bad reputation for a reason. so many amateurs amongst web developers...
I will assume the last part is just a generic statement and is not directed towards me. You are a Java dev I presume?
As for app-level locking, I am talking about sacrificing performance, not safety. We just make sure that some piece of code runs exactly once. Since the need for this is very rare and the places are not performance critical, we can live with that. So no, we have no need for additional transactional guarantees on DB level.
> I will assume the last part is just a generic statement and is not directed towards me.
It's most definitely directed at you.
> You are a Java dev I presume?
C#/C++. Though I've worked with java before amongst others.
> As for app-level locking, I am talking about sacrificing performance, not safety.
You are sacrificing both. Only idiots truly depend on app-level locking.
> We just make sure that some piece of code runs exactly once.
Amateur hour...
> Since the need for this is very rare and the places are not performance critical, we can live with that. So no, we have no need for additional transactional guarantees on DB level.
So it's a useless pointless trivial application...
MongoDB is not at all good for Social networks or anything resembling a graph. You are better off with Titan (horizontal scaling) or Neo4j (vertical scaling). Neo4j offers this great ability to query by a path, which no other database offers.
> Can someone suggest when it is actually useful over a properly tuned relational database?
In your own question you kind of hint at it. You need a tuned database. NoSQL lowers the barrier to entry for fast persistent data storage with replication. NoSQL doesn't replace SQL databases, it's looking to optimise a use case where transactions are not required.
Personally I use Mongo a bit like a cache, sitting in front of a SQL DB. Things that need to be ACID are handed back and forth. Those that don't are kept in Mongo and I get the best of both worlds.
> NoSQL doesn't replace SQL databases, it's looking to optimise a use case where transactions are not required.
I'd be curious to know in what contexts anyone would not want transactions.
I readily see the case for memcached or equivalent when it comes to caching -- it is very valid, and indeed useful, when slightly outdated data is perfectly acceptable. I can even picture how you might be using MongoDB to do the same, even if you admittedly have me wondering how you're invalidating the mess.
For a persistent data store, however, I'm honestly at a loss. In my own experience, transactions are needed as soon as there's a remote possibility that a concurrent write may occur. Even embedded systems need them, when you're threading statements concurrently for performance reasons, or when you're subsequently merging local data with another node. (See CoreData/iCloud bugs for what occurs when you ignore ACID in the latter case.)
Document databases normally have atomic updates at the document level. A document may contain many parts that would require a transaction in a relational DB, meaning you kind of get some transactions automatically. Other things are not possible to wrap in transactions, and then you must design the app to handle the resulting problems. And some document databases can actually use transactions, like FoundationDB.
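To make that concrete, here's a sketch (using sqlite3 so it's runnable) of the multi-statement transaction that a single atomic document write replaces; the table names and data are made up:

```python
# An order plus its line items spans two tables relationally and needs
# a transaction, but can be one nested document in a document store.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_items (order_id INTEGER, sku TEXT, qty INTEGER);
""")

with db:  # BEGIN ... COMMIT: both inserts succeed or neither does
    db.execute("INSERT INTO orders VALUES (1, 'alice')")
    db.executemany("INSERT INTO line_items VALUES (1, ?, ?)",
                   [("apples", 3), ("pears", 2)])

# Document-store equivalent: one atomic write, no transaction needed.
order_doc = {"_id": 1, "customer": "alice",
             "line_items": [{"sku": "apples", "qty": 3},
                            {"sku": "pears", "qty": 2}]}

count = db.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
print(count)  # → 2
```

The catch, of course, is that the "free" atomicity ends at the document boundary: touch two documents and you're back to application-level coordination.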
> Can someone suggest when it is actually useful over a properly tuned relational database?
Maybe when developers don't want to bother to learn SQL?
During my CS degree we got a proper introduction to SQL and the relational algebra behind it, so for me it is just a tool, not a scary monster.
I really love the data integrity options I have at my disposal with relational databases.
So for me this whole NoSQL fad never made much sense. Then again, I never had to deal with a Facebook-scale problem, only daily reports from mobile network operators across all their network elements.
I'd say a graph database would work much better for building a Facebook clone compared to a document store like MongoDB... And a relational database would still be better than the document store.
I've been using Mongo happily to provide online analytics solutions, but the advantage is mostly from the development side, not really from the performance side.
On the other hand, this kind of approach is great for attribute matching, which is usually a nightmare to do properly with a RDBMS.
For some systems the development time is much shorter with a document DB (or a graph DB) than with a relational DB. That can be more important than pure performance. Basic scalability is also usually better, since it's often easy to shard data across servers.
I just like being able to insert a random JSON into a collection and query it by any of its properties. Not sure how I would do that with a relational database.
The reliability and speed advantages of PostgreSQL over MongoDB. Plus the fact that sometimes KV/document is better and sometimes normalized relational. Is nice if one tool solves both.
You can do this in Oracle 12 and I think in Postgres. You might want to think about why you need to store unstructured data as part of your application though and query it. If it's expected to live a long time and be queried it should probably be structured. If not you could store it (a serialized object) as a blob if you are just caching some state.
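As a sketch of the "query unstructured JSON inside a relational database" idea - shown here with SQLite's `json_extract` because it runs anywhere; in Postgres you'd use a `jsonb` column and the `->>` operator, and Oracle 12c has its own JSON query syntax:

```python
# Insert arbitrary JSON into a relational table and query it by any
# of its properties, without declaring those properties up front.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (body TEXT)")  # JSON stored as text
db.execute("INSERT INTO events VALUES (?)",
           ('{"kind": "click", "user": "alice"}',))
db.execute("INSERT INTO events VALUES (?)",
           ('{"kind": "view", "user": "bob"}',))

rows = db.execute(
    "SELECT json_extract(body, '$.user') FROM events "
    "WHERE json_extract(body, '$.kind') = 'click'"
).fetchall()
print(rows)  # → [('alice',)]
```

So "insert a random JSON and query it by any property" is available in relational databases too; Postgres additionally lets you index `jsonb` expressions for performance.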
I've used it and been happy. If you want something that's simple to use, know your schema partly but not entirely, and you don't mind losing a few writes, then mongo fits.
Relational databases place such emphasis on reliability. Mongo is generally reliable but only two or three nines, and it's simple.
For example, if you're collecting sub-cent line items for invoices, you might prefer to collect 99.x% of the line items simply over 100% at greater complication and expense, particularly if you can measure x.
What? When would it ever be OK to lose parts of an invoice? Mongo apologists baffle me sometimes. To anyone considering MongoDB: don't be fooled, you probably want to stick with an RDBMS.
When the parts are cheap compared to the cost of storing them in a real database.
I even know someone who stores invoice items in memcached. Whenever memcached evicts something that hasn't made it onwards to real storage, they lose the ability to invoice the right customer for an ad impression.
Not at all. Let me run an example with simple and made-up numbers.
Consider two storage solutions used to gather line items to invoices. The line items are ten cents on average.
One system is properly ACID, and running it costs one cent per line item on the invoice. The other loses 1% of writes randomly, but its hardware/backup/ops requirements are lower, so it costs just 0.1c/item.
The ACID way gives you complete invoices, but you spend 10% of the invoiced amount on the invoices. The lossy way makes your invoices 1% smaller, randomly, but the loss+cost adds up to about 2% instead of 10%.
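The arithmetic, checked with the same made-up numbers:

```python
# 10-cent line items; ACID storage costs 1c/item, the lossy store
# costs 0.1c/item but randomly drops 1% of writes.
item_value = 0.10

acid_overhead = 0.01 / item_value           # cost as share of revenue: 10%
lossy_overhead = 0.001 / item_value + 0.01  # storage cost + lost revenue: ~2%

print(f"ACID: {acid_overhead:.0%}, lossy: {lossy_overhead:.0%}")
```

Whether those cost figures are realistic is the real argument, of course; the point is only that the comparison is cost-per-item versus loss-rate, not "lossy is always wrong."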
It's like returns on physical goods, really. A random percentage of customers will return goods, you can't control that but you can estimate and monitor it, and optimize if the numbers aren't good enough.
Unfortunately I didn't find much information in the article.
Basically I got out of it:
- he likes JS, and was comfortable with MEAN-stack (MongoDB/Express web framework/AngularJS/Node.js)
- he found that for document oriented purposes MongoDB could sometimes be a nice fit
- he found that by replacing MongoDB with the drop-in replacement TokuMX (similar in spirit to the MySQL/MariaDB relationship), he could get a big performance increase
- he found that with Postgres 9.2+ using the JSON/document storage there he could get some of the relational benefits and some of the document benefits, and ACID
- he likes ACID because of transactions/handling multiple documents at once
Ok, so I guess I was wrong, there is a bit there. But: this is all stuff that I've heard several times about MongoDB.
One of the things he got into was that he learned when MongoDB was appropriate and when it wasn't -- but IMO he didn't really go into detail here other than to say: joins and transactions. I wish there was more substance than that.
I also seem to recall reading recently that even supposedly ACID-compliant databases have problems with transactions under load, and that the only way to really get full ACID compliance is to use serializable isolation - perhaps I am wrong, but I think that's the gist of what I've read.
Personally, I'm working on a project right now that uses MongoDB and I definitely do miss joins, transactions, and schemas. I am not even remotely sure where one would benefit from a system that had documents with arbitrary fields in it. (I didn't pick the technologies for my project.)
I would really like to know when to choose which database - I feel like I know the bare basics of several, and none of them in depth. I really don't have time to learn them all. I feel like even when I'm building one application it's only some months after deployment that if I'm lucky or made a mistake that I will run into some actual performance constraint.
>I would really like to know when to choose which database
I have asked this question before - under what situations is Mongo not just equally good as Postgres - but actually better?
The only really coherent answer was that it was easier to configure replication (presumably because it chooses a lot of defaults for you). I'm not even sure that was a good thing given the number of obscure bugs that can arise from incorrectly configured replication.
Postgres even seems to be more performant at NoSQL use cases (using the JSON store) than Mongo, which is frankly embarrassing.
MongoDB 2.8 (in RC4 right now, due out "in early January" last I heard) has the WiredTiger storage engine, which as well as high performance and compression also brings document-level locking and multi-document transactions. The roadmap says this'll become the default storage engine in 3.0 (3rd quarter 2015). I think the blog author's observation that for basically those omissions "MongoDB can still beat TokuMX on a future release. But only in a future release. Today it can’t." is only true if you ignore the development releases completely, and will be incorrect within a week or two.
Sure, if you want SQL or a traditional relational database, a traditional relational SQL database is a better choice. Using Postgres or Oracle for workloads better suited to tied hashes or BerkeleyDB doesn't automatically become "right" though.
MongoDB will always be a relevant example of false advertising and over-marketing (to the point of 10gen probably opening themselves to litigation) and how we all need to stop drinking the kool-aid.
There is LITERALLY no reason to use MongoDB today. If you're thinking of using MongoDB, for the love of god just try PostgreSQL.
we use a mongodb cluster for a multi-hundred GB document store that powers ~50 various worker instances, 2 sites and an API with sub 100ms response times. we have never lost data or experienced downtime worse than other dbs i've used in the past.
at the end of the day you just need to do your job properly and read the documentation, not blame the tool when you can't use it properly.
*disclaimer - this is not to say postgres isn't awesome.
It may be hype, but when MongoDB was released, what other mainstream database was offering the same functionality? I see PostgreSQL + JSON mentioned, but when was JSON support added?
I personally like MongoDB for:
- flexible schema (less migration pain)
- easy tags implementation
- product attributes (list of name-value pairs)
- GridFS - store binary files
- nested documents for analytics (ex: a record for each day with a nested doc for each hour)
I wish MongoDB had
- some support of join
- multi-master replication
I personally used MongoDB as primary storage (yeay dot me), but currently prefer it as secondary storage.
As any product it evolves and I expect to see more improvements. I think MongoDB brought NoSQL to the masses :)
If transactions are the biggest problem, there are driver solutions like http://godoc.org/labix.org/v2/mgo/txn for that. I am not sure how they compare to "real" transactions.
Still missing "joining" in some cases though. Otherwise I am OK with MongoDB so far.
Jeez. Please do not mention client-library-managed transactions and then put "real" transactions in scare quotes. At least read the Wikipedia article first.
I just wanted to distinguish between the two. I wasn't scared in any way but read up on scare quotes now - so thanks for that.
http://en.wikipedia.org/wiki/Scare_quotes
While it does work, it's still a workaround for the lack of first-class transactions in MongoDB. It offers a more limited API and requires care on the developer side.
Despite being the author, I do hope it gets obsoleted by more convenient first-class transaction support in MongoDB itself at some point in the near future.
I never understood the hype around MongoDB. Maybe it's the fact that I started programming when SQL DBs were the prevalent/mainstream option, and got too used to them... Anyway, I'm glad people are recognizing the hype and realizing that RDBMSs have enough to offer.
Nice writeup. I always saw MongoDB as a database for really fast product prototyping. Once you get to a certain point in web scale, I'd guess you'd move to one of these alternative solutions.