Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No system can achieve 100% consistency and 100% availability in the presence of partitions. It's kind of wacky that Aphyr compared those replicated consensus tools and eventually consistent stores with HBase. HBase does not use a consensus protocol for replication, it uses HDFS. HBase is not eventually consistent. There is a single authoritative server for reads and writes of a lexicographic range of keys (a region) which writes immutable store files and a WAL to HDFS. Partial availability may be achievable for reads with a significant amount of effort and latency and limiting of total cluster size by allowing non-authoritative regionservers to read the HDFS WAL + Storefiles, but this really isn't realistic. I've personally been burned by the client retry thing though, and there's not really a better solution when you consider the types of workloads HBase is actually used for and it's incredibly variable latency (it aims only for consistency and very high AVERAGE throughput, at the cost of extremely high HIGHEST latency). One solution here could be the use of a configurable filesystem queue for clients. This is how you build resilient high-throughput pipelines. HBase is used in some places for OLTP, but only when the readload is very very low. So the effort to make reads more highly available would be in vain.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: