Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think it might help readers to include a narrative about an example application. Perhaps I’m in the minority but I tend to think of Bloom filters as a way to reliably know something isn’t in a set (e.g. so as to not run an expensive disk read). This data structure seems to view them the dual way: “this is maybe the right value for this key”.

I’ve seen that view work for visualizations like approximate CDFs and medians where I have some statement like “with probability p, the value differs from truth by less than e”. Is this data structure used in a similar way? My instinct is that visualizations having a low rate of being wrong is OK because the human will follow up that visualization with more tests. In the end you have lots of evidence supporting the conclusion.



Ah, we need to clarify the language! The B-field will always return the correct value for an inserted key.

False positives are only returned for keys that have not been inserted. This is akin to a Bloom filter falsely returning that a key is in the set).


I second that "The B-field will always return the correct or Indeterminate value for an inserted key." before listing classes of errors would clarify it by a lot.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: