
I don't know what the article's author is storing, but the article notes that 10GB of data is stored per node. That's quite a bit of data for a single node, and the 10GB applies per shard of a table, not per 'database' (DynamoDB only has a notion of tables).

Amazon has deep-dive talks on DynamoDB on YouTube[1] that go into many of these details and how to avoid problems with them. It's not that different from understanding how to structure data for Cassandra, Mongo, etc. All NoSQL systems require an understanding of how they work, both to structure your data and to get optimal performance.

For example, maybe one's consistency constraints are better met by DynamoDB than by BigTable's varying consistency across query types[2] (the author of this article didn't address consistency at all). With DynamoDB, you can get strongly consistent reads and queries all the time[3].
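For what it's worth, requesting strong consistency is just a flag on the read call. A minimal sketch with boto3 (the table name and key here are hypothetical):

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # ConsistentRead defaults to False (eventually consistent); setting it to
    # True requests a strongly consistent read. Table and key are made up
    # purely for illustration.
    response = dynamodb.get_item(
        TableName="Users",
        Key={"UserId": {"S": "user-123"}},
        ConsistentRead=True,
    )
    item = response.get("Item")

Query accepts the same ConsistentRead flag; the notable exception is reads against global secondary indexes, which are always eventually consistent.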

Overall, it seems like a fairly weak reason to make the strong statement that you "probably shouldn't be using DynamoDB". Maybe a better title would be "Understanding the drawbacks of large datasets in DynamoDB". I do hope the author understands the consistency changes they may experience in BigTable, since it could easily require large changes to the codebase if strong consistency was assumed for all queries.

[1] https://www.youtube.com/watch?v=bCW3lhsJKfw

[2] https://cloud.google.com/datastore/docs/articles/balancing-s...

[3] http://docs.aws.amazon.com/amazondynamodb/latest/developergu...

Edit: Fixed inconsistency in what the 10GB limit was referring to, not per table, but per node (shard) of the table.



Disclaimer: I am the PM of Google Cloud Bigtable.

It appears you're confusing Bigtable with Datastore (to be fair, Datastore is built on Megastore, which is built on Bigtable, so it's an understandable confusion), but let's be clear: Google Cloud Bigtable != Google Cloud Datastore.

The URL you cited about consistency models: https://cloud.google.com/datastore/docs/articles/balancing-s... is entirely about Datastore, but you referred to it as Bigtable.


Sorry about that; I had difficulty finding anything about the consistency model of BigTable. Upon further reading on SSTables and how BigTable distributes them across nodes, it appears there is full consistency, but if a node goes down, is the data on it inaccessible?


If a node goes down, another node will replace it quickly. The node (aka tablet server) doesn't own any data; the data is stored in a lower-level storage layer.


We store 100-1000 GB of data in each wide table on our untrendy on-prem SQL server boxes. 10GB is peanuts. In fact, I propose (given the limitations suggested by the author) that DynamoDB may have NO practical uses worth exploring. If NoSQL is about scale, and it can't scale, what's it good for?

I can understand having to optimize your key space, but in this case it necessitates extreme premature optimization.


From my understanding, the nodes are virtual nodes. This article on Cassandra should give some info: https://docs.datastax.com/en/cassandra/2.1/cassandra/archite...


Each shard is 10GB. The unit of scale is a shard. You can have as many shards as you want.


It's one shard per node though, right? So you're talking about SERIOUS cost when you want to store _actual_ big data.


You don't pay per node. In fact, the concept of a node is not exposed at all. Each shard is technically on three nodes for high availability. You pay for provisioned capacity and data storage per GB.
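To make "shards come from capacity and size" concrete, here's a rough back-of-the-envelope sketch based on the sizing guidance AWS has published (a partition serves roughly 3,000 RCU and 1,000 WCU and holds about 10GB); the actual partitioning is internal to the service and may differ:

    import math

    def estimate_partitions(rcu, wcu, table_size_gb):
        # Rough estimate only: AWS's published guidance says a partition
        # serves up to ~3,000 RCU and ~1,000 WCU, and holds up to ~10 GB.
        by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
        by_size = math.ceil(table_size_gb / 10)
        return max(by_throughput, by_size, 1)

    # Example: 6,000 RCU / 2,000 WCU provisioned on a 100 GB table
    print(estimate_partitions(6000, 2000, 100))  # -> 10 (size-bound)

You never see or pay for those partitions directly, but provisioned throughput gets divided across them, which is why hot keys and skewed key spaces hurt.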


DynamoDB is a multi-tenant service. There is no dedicated node for you. Each shard is replicated across 3 nodes and those nodes contain replicas of other shards/tables.


It's amazing how much you can put into a reply without directly addressing any of the concerns. Are you serious that DynamoDB can't even hold 10GB? If you intended to defend the database, I think you failed. SQLite3 can store 10 gigs without breaking a sweat. What on earth can you use DynamoDB for if it doesn't scale with the data? The way you describe it sounds much WORSE than what the OP describes.


I said it was per node (which is a shard); it's not a limit per table. Did you read the original article or my reply? Both state that this is per node and explain how DynamoDB decides when to shard to more nodes.

Edit: Sorry, I did mention 10GB originally in relation to a table. That was incorrect of course.


I did. However, 10GB still seems extremely small. A commodity postgres, cassandra, or cockroachdb server can serve HUNDREDS of GB per node. Why is the size per node so small for dynamodb?

It seems like poor key space design.


While not exactly the same, the Dynamo paper outlines that a single host is composed of many virtual nodes. It is very likely that a physical DynamoDB host will have dozens of nodes. This is done so that the cluster can scale up or down independent of the number of hosts while avoiding a gross imbalance. (12 nodes on 11 hosts means one host has 100% more traffic. 34 nodes on 11 hosts means one host has 33% more traffic.)
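A toy sketch of that idea (consistent hashing with virtual nodes, roughly as the Dynamo paper describes it; the host names, vnode count, and keys here are made up):

    import bisect
    import hashlib
    from collections import Counter

    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        """Toy ring: each physical host owns many virtual nodes."""
        def __init__(self, hosts, vnodes_per_host=64):
            self._ring = sorted(
                (_hash("%s#%d" % (host, i)), host)
                for host in hosts
                for i in range(vnodes_per_host)
            )
            self._hashes = [h for h, _ in self._ring]

        def host_for(self, key):
            idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = ConsistentHashRing(["host-%d" % n for n in range(11)])
    load = Counter(ring.host_for("key-%d" % k) for k in range(100000))
    print(load.most_common(3))  # per-host load stays near the ~9,000-key average

With many vnodes per host, adding or removing a host only moves a small, evenly spread slice of the key space, instead of one unlucky host absorbing all of it.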


It is important to note that "Dynamo" and "DynamoDB" are two very different things that happen to share many of the same letters. DynamoDB is not Dynamo.


This Cassandra article should provide some info on virtual nodes. https://docs.datastax.com/en/cassandra/2.1/cassandra/archite...


Agreed entirely. I do wonder what kind of internal constraints led Amazon to impose these limits. Maybe in the future they'll go away, as so many other seemingly arbitrary limits do.


10GB is per partition, not per node.



