
I don't know what the article's author is storing, but the article notes that 10GB of data is stored per node. That's quite a bit of data for a single node, and the 10GB applies per shard of a table, not per 'database' (DynamoDB only has a notion of tables).

Amazon has deep-dive talks on DynamoDB on YouTube[1] that go into many of these details and how to avoid problems with them. It's not that different from understanding how to structure data for Cassandra, Mongo, etc. All NoSQL systems require an understanding of how they work, both to structure your data and to get optimal performance.

For example, maybe one's consistency constraints are better met by DynamoDB than by BigTable's varying consistency across query types[2] (the author of this article didn't address consistency at all). With DynamoDB, you can get strongly consistent reads and queries all the time[3].
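For what it's worth, requesting strong consistency is just a flag on the read call. A minimal sketch with boto3 (the table name and key here are hypothetical):

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # ConsistentRead defaults to False (eventually consistent); setting it to
    # True requests a strongly consistent read. Table and key are made up
    # purely for illustration.
    response = dynamodb.get_item(
        TableName="Users",
        Key={"UserId": {"S": "user-123"}},
        ConsistentRead=True,
    )
    item = response.get("Item")

Query accepts the same ConsistentRead flag; the notable exception is reads against global secondary indexes, which are always eventually consistent.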

Overall, it seems like a fairly weak reason to make the strong statement that you "probably shouldn't be using DynamoDB". Maybe a better title would be "Understanding the drawbacks of large datasets in DynamoDB". I do hope the author understands the consistency changes they may experience in BigTable, since it could easily require large changes to the codebase if strong consistency was assumed for all queries.

[1] https://www.youtube.com/watch?v=bCW3lhsJKfw

[2] https://cloud.google.com/datastore/docs/articles/balancing-s...

[3] http://docs.aws.amazon.com/amazondynamodb/latest/developergu...

Edit: Fixed inconsistency in what the 10GB limit was referring to, not per table, but per node (shard) of the table.



Disclaimer: I am the PM of Google Cloud Bigtable.

It appears you're confusing Bigtable with Datastore (to be fair, Datastore is built on Megastore, which is built on Bigtable, so it's an understandable confusion), but let's be clear: Google Cloud Bigtable != Google Cloud Datastore.

The URL you cited about consistency models: https://cloud.google.com/datastore/docs/articles/balancing-s... is entirely about Datastore, but you referred to it as Bigtable.


Sorry about that; I had difficulty finding anything about the consistency model of BigTable. Upon further reading on SSTables and how BigTable distributes them across nodes, it appears there is full consistency, but if a node goes down, is the data on it inaccessible?


If a node goes down, another node will replace it quickly. The node (aka tablet server) doesn't own any data; the data is stored in a lower-level storage layer.


We store 100-1000 GB of data in each wide table on our untrendy on-prem SQL server boxes. 10GB is peanuts. In fact, I propose (given the limitations suggested by the author) that DynamoDB may have NO practical uses worth exploring. If NoSQL is about scale, and it can't scale, what's it good for?

I can understand having to optimize your key space, but in this case it necessitates extreme premature optimization.


From my understanding, the nodes are virtual nodes. This article on Cassandra should give some info: https://docs.datastax.com/en/cassandra/2.1/cassandra/archite...


Each shard is 10GB. The unit of scale is a shard. You can have as many shards as you want.


It's one shard per node though, right? So you're talking about SERIOUS cost when you want to store _actual_ big data.


You don't pay per node. In fact, the concept of a node is not exposed at all. Each shard is technically on three nodes for high availability. You pay for provisioned capacity and data storage per GB.
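To make "shards come from capacity and size" concrete, here's a rough back-of-the-envelope sketch based on the sizing guidance AWS has published (a partition serves roughly 3,000 RCU and 1,000 WCU and holds about 10GB); the actual partitioning is internal to the service and may differ:

    import math

    def estimate_partitions(rcu, wcu, table_size_gb):
        # Rough estimate only: AWS's published guidance says a partition
        # serves up to ~3,000 RCU and ~1,000 WCU, and holds up to ~10 GB.
        by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
        by_size = math.ceil(table_size_gb / 10)
        return max(by_throughput, by_size, 1)

    # Example: 6,000 RCU / 2,000 WCU provisioned on a 100 GB table
    print(estimate_partitions(6000, 2000, 100))  # -> 10 (size-bound)

You never see or pay for those partitions directly, but provisioned throughput gets divided across them, which is why hot keys and skewed key spaces hurt.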


DynamoDB is a multi-tenant service. There is no dedicated node for you. Each shard is replicated across 3 nodes and those nodes contain replicas of other shards/tables.


It's amazing how much you can put into a reply without directly addressing any of the concerns. Are you serious that DynamoDB can't even hold 10GB? If you intended to defend the database, I think you failed. SQLite3 can store 10 gigs without breaking a sweat. What on earth can you use DynamoDB for if it doesn't scale with the data? The way you describe it sounds much WORSE than what the OP describes.


I said it was per node (which is a shard); it's not a limit per table. Did you read the original article or my reply? Both state that this is per node and explain how DynamoDB decides when to shard to more nodes.

Edit: Sorry, I did mention 10GB originally in relation to a table. That was incorrect of course.


I did. However, 10GB still seems extremely small. A commodity postgres, cassandra, or cockroachdb server can serve HUNDREDS of GB per node. Why is the size per node so small for dynamodb?

It seems like poor key space design.


While not exactly the same, the Dynamo paper outlines that a single host is composed of many virtual nodes. It is very likely that a physical DynamoDB host will have dozens of nodes. This is done so that the cluster can scale up or down independent of the number of hosts while avoiding a gross imbalance. (12 nodes on 11 hosts means one host has 100% more traffic. 34 nodes on 11 hosts means one host has 33% more traffic.)
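A toy sketch of that idea (consistent hashing with virtual nodes, roughly as the Dynamo paper describes it; the host names, vnode count, and keys here are made up):

    import bisect
    import hashlib
    from collections import Counter

    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        """Toy ring: each physical host owns many virtual nodes."""
        def __init__(self, hosts, vnodes_per_host=64):
            self._ring = sorted(
                (_hash("%s#%d" % (host, i)), host)
                for host in hosts
                for i in range(vnodes_per_host)
            )
            self._hashes = [h for h, _ in self._ring]

        def host_for(self, key):
            idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = ConsistentHashRing(["host-%d" % n for n in range(11)])
    load = Counter(ring.host_for("key-%d" % k) for k in range(100000))
    print(load.most_common(3))  # per-host load stays near the ~9,000-key average

With many vnodes per host, adding or removing a host only moves a small, evenly spread slice of the key space, instead of one unlucky host absorbing all of it.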


It is important to note that "Dynamo" and "DynamoDB" are two very different things that happen to share many of the same letters. DynamoDB is not Dynamo.


This Cassandra article should provide some info on virtual nodes. https://docs.datastax.com/en/cassandra/2.1/cassandra/archite...


Agreed entirely. I do wonder what kind of internal constraints led Amazon to impose these limits. Maybe in the future they'll go away, as so many other seemingly arbitrary limits do.


10GB is per partition, not per node.



