I remember this turning up in a Google interview back in the day. The interview ...

owyn · 2025-05-08T17:46:18 1746726378

Yeah, this was a Google interview question for me too. I didn't know the algorithm and floundered around trying to solve the problem. I came up with the 1/n and k/n selection strategy but still didn't get the job lol. I think the guy who interviewed me was just killing time until lunch.
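
For anyone who hasn't seen it, the 1/n and k/n strategy can be sketched roughly as below. This is a minimal sketch of the textbook approach, not what the interviewer necessarily wanted; the name `reservoir_sample` and the optional `rng` parameter are my own choices for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Hypothetical helper for illustration: the stream is read once and
    its length does not need to be known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Initial selection: the first k items are always kept.
            reservoir.append(item)
        else:
            # Acceptance: keep the n-th item with probability k/n.
            # j is uniform over [0, n), so j < k happens k times out of n,
            # and j then also names a uniformly chosen slot to evict.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

With k = 1 this reduces to the 1/n rule: the n-th item replaces the single held item with probability 1/n.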

I like the visualizations in this article, really good explanation.

dekhn · 2025-05-08T18:15:27 1746728127

I didn't know about the algorithm until after I got hired there. It's actually really useful in a number of contexts, but my favorite was using it to find optimal split points for sharding lexicographically sorted string keys for mapping. Often you will have a sorted table, but the underlying distribution of keys isn't known, so uniform sharding will often cause imbalances where some mappers end up doing far more work than others. I don't know if there is a convenient open source class to do this.
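
A rough sketch of how that sharding idea might look, assuming the approach is "sample the keys, sort the sample, cut at evenly spaced quantiles". This is a guess at the general technique, not the internal implementation; `sample_keys`, `split_points`, and the default `sample_size` are hypothetical names and values.

```python
import random


def sample_keys(keys, k, rng):
    """Reservoir-sample up to k keys from a single pass over `keys`."""
    sample = []
    for n, key in enumerate(keys, start=1):
        if n <= k:
            sample.append(key)
        else:
            j = rng.randrange(n)  # uniform over [0, n)
            if j < k:
                sample[j] = key
    return sample


def split_points(keys, num_shards, sample_size=1000, rng=None):
    """Estimate num_shards - 1 boundary keys that split `keys` evenly.

    The boundaries are quantiles of a random sample, so they follow the
    actual key distribution instead of assuming it is uniform.
    """
    rng = rng or random.Random()
    sample = sorted(sample_keys(keys, sample_size, rng))
    if not sample:
        return []
    return [sample[(i * len(sample)) // num_shards]
            for i in range(1, num_shards)]
```

Each shard then takes the keys between two consecutive boundaries. A larger `sample_size` gives tighter balance at the cost of more memory.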

wood_spirit · 2025-05-08T18:52:38 1746730358

Interesting idea, hadn’t thought about that way to apply it.

I knew it before my interview from a Turbo Pascal program I had seen that sampled DAT tape backups of patient records from a hospital system. These samples were used for studies. That was a textbook example of its utility.

dekhn · 2025-05-08T19:15:33 1746731733

I guess the question in my mind is: would you expect a smart person who did not previously know this problem (or really much random sampling at all) to come up with the algorithm on the fly in an interview? And if the person had seen it before and memorized the answer, does that provide any signal of their ability to code?

samwho · 2025-05-08T21:07:13 1746738433

My gut instinct is no. I certainly don't think I'd be able to derive this algorithm from first principles in a 60-minute whiteboarding interview, and I worked at Google for 4 years.

wood_spirit · 2025-05-09T04:17:32 1746764252

They wanted to see your analytical thinking skills at work. To pass you only needed to be sensible. You didn’t fail the interview if you couldn’t invent reservoir sampling!

dekhn · 2025-05-09T16:16:58 1746807418

uh, no, people would get a fail on the question if they didn't correctly identify both the initial selection and sample acceptance criteria.

petters · 2025-05-09T02:53:02 1746759182

This also got me past one interview. I came up with k/n but now I think it's better to just generate a random float in [0, 1] for each item and keep the k items with the largest floats.
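
A minimal sketch of that random-key variant, assuming a min-heap is used to track the k largest keys seen so far. The name `sample_by_random_keys` is made up for illustration, and Python's `random.random()` draws from [0.0, 1.0) rather than the closed interval, which makes no practical difference here.

```python
import heapq
import random


def sample_by_random_keys(stream, k, rng=None):
    """Tag each item with a random float and keep the k largest tags.

    Hypothetical helper for illustration. Every item gets an independent
    uniform key, so the k items with the largest keys form a uniform
    random sample of size k.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); heap[0] has the smallest key
    for index, item in enumerate(stream):
        key = rng.random()
        # The index breaks ties so the items themselves are never compared.
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the smallest kept key: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

One thing this version makes easy is merging: samples taken on separate machines can be combined by keeping the k largest keys overall, provided the keys are returned along with the items.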