The question is: what do you gain by categorizing them as self-identified black or white, as opposed to not categorizing them at all, or to categorizing them by, say, the presence of some gene cluster or genetic marker associated with gout?
You gain many things.
If the True outcome is more common in group A than group B, you get a prior (in the Bayesian sense) which can be measured at low cost.
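To make the prior concrete, here is a minimal sketch that assumes a single diagnostic test with known sensitivity and specificity. The counts, the 0.9/0.9 test characteristics and the `posterior` helper are all hypothetical, chosen for illustration and not taken from real gout data. The same positive result lands group A at roughly 0.36 and group B at roughly 0.16, purely because the starting priors differ.

```python
def posterior(prior: float, sensitivity: float, specificity: float, positive: bool) -> float:
    """P(disease | test result) by Bayes' rule, starting from a group-specific prior."""
    if positive:
        num = sensitivity * prior
        den = num + (1 - specificity) * (1 - prior)
    else:
        num = (1 - sensitivity) * prior
        den = num + specificity * (1 - prior)
    return num / den

# Hypothetical counts, for illustration only (not real gout data).
cases = {"A": 60, "B": 20}
totals = {"A": 1000, "B": 1000}
# The cheap-to-measure prior: observed incidence within each group.
priors = {g: cases[g] / totals[g] for g in cases}

for group, prior in priors.items():
    p = posterior(prior, sensitivity=0.9, specificity=0.9, positive=True)
    print(f"group {group}: prior={prior:.3f}, posterior after positive test={p:.3f}")
```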
If you are attempting to identify the aforementioned genetic marker, you get to rule out genetic markers which are equally prevalent in both groups A and B. This narrows the search space considerably.
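The narrowing step might look something like the following sketch. It assumes a two-proportion z-test as the way to compare carrier frequencies between the groups; the marker names, counts and the `two_proportion_z` helper are made up for illustration. Failing to detect a difference is not the same as showing equal prevalence, and a fixed 1.96 cutoff applied to many markers ignores multiple-testing correction.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: both groups share the same carrier proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0

# Hypothetical carrier counts per marker: (carriers in A, n_A, carriers in B, n_B).
markers = {
    "marker_1": (300, 1000, 295, 1000),
    "marker_2": (250, 1000, 120, 1000),
    "marker_3": (50, 1000, 48, 1000),
}

Z_CRIT = 1.96  # roughly a 5% two-sided threshold; no multiple-testing correction
# Keep only markers whose prevalence differs between the groups; the rest are set aside.
kept = [m for m, c in markers.items() if abs(two_proportion_z(*c)) >= Z_CRIT]
print(kept)
```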
Throwing information (even imperfect information) away is stupid.
A falsifier to this proposition using the original question:
If I took thousands of people who self-identified as wearing white sneakers and thousands who self-identified as wearing black sneakers, and I looked at the occurrence of gout in those populations, what would I see?
Obviously one expects a stronger correlation with race, since members of a racial group are assumed to breed among themselves more often and so concentrate the relevant genetic factors, if that's what we're looking for. But I suppose that's the point I'm trying to make: let's be clear about what we're looking for.
Similarly for cultural factors like diet. But then again, why not cut straight to the dietary data?
In this case, you only get to rule out the genetic markers that are prevalent in both groups once you've established that they are indeed prevalent in both groups. And even then, you run the risk of spurious correlations for specific individuals, because you started with a categorization freighted with a lot of non-scientific baggage.
If I took thousands of people who self-identified as wearing white sneakers and thousands who self-identified as wearing black sneakers, and I looked at the occurrence of gout in those populations, what would I see?
It is unlikely that you would observe a difference in gout incidence between the two groups, so the prior would carry no information.
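One way to put a number on "the prior would carry no information" is the mutual information between the group label and the outcome, which is zero exactly when incidence is the same in every group. This sketch assumes that measure; the counts and the `mutual_information` helper are hypothetical and for illustration only.

```python
import math

def mutual_information(counts: dict[str, tuple[int, int]]) -> float:
    """I(group; outcome) in bits, from {group: (cases, group_size)} counts."""
    n = sum(size for _, size in counts.values())
    p_case = sum(cases for cases, _ in counts.values()) / n
    mi = 0.0
    for cases, size in counts.values():
        p_group = size / n
        # Sum over the two outcome cells for this group: gout / no gout.
        for k, p_outcome in ((cases, p_case), (size - cases, 1 - p_case)):
            if k == 0:
                continue  # 0 * log(0) is taken as 0
            p_joint = k / n
            mi += p_joint * math.log2(p_joint / (p_group * p_outcome))
    return max(mi, 0.0)  # clamp tiny negative floating-point rounding error

# Hypothetical counts: (gout cases, group size).
sneakers = {"white": (40, 1000), "black": (40, 1000)}
groups = {"A": (60, 1000), "B": (20, 1000)}

print(f"sneaker colour: {mutual_information(sneakers):.6f} bits")
print(f"groups A/B:     {mutual_information(groups):.6f} bits")
```

Equal incidence across sneaker colours gives (up to rounding) zero bits; unequal incidence across groups A and B gives a small positive amount.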
Similarly for cultural factors like diet. But then again, why not cut straight to the dietary data?
Cost. It's very easy for a doctor to ask about race and diet. Genetic markers require moderately expensive lab tests and repeat doctor visits.
Similarly, if you can use simple mechanical prodding to diagnose an injury with a reasonable degree of certainty, it might be worthwhile to skip the expensive MRI.
Further, the dietary or genetic marker associated with a particular diagnosis may be unknown, and racial information may be the best available predictor.