It's mostly a problem of resolution, model size, and dataset quality, and it can be mitigated with compositing. Larger models largely don't have problems with hands, and when they do, it can be solved with higher-order guidance (e.g. ControlNets) and multiple supersampled passes over individual regions, so that no single generation has to fit too much detail. Even with SD 1.5 (a notoriously tiny model), issues with faces and hands can be solved with multiple passes, which is what everyone does.
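
To make the "multiple supersampled passes on regions" idea concrete, here is a rough sketch of the compositing loop: crop the problem region, upscale it, run another pass on it, scale it back down, and paste it into the full image. Everything here is illustrative, not any particular tool's implementation. Images are plain nested lists, and `refine` is a hypothetical stand-in for whatever inpainting or img2img pass (possibly ControlNet-guided) you would actually run on the enlarged crop.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(img: Image, box: Box) -> Image:
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def upscale(img: Image, factor: int) -> Image:
    # Nearest-neighbour: repeat each pixel `factor` times in both directions.
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downscale(img: Image, factor: int) -> Image:
    # Box filter: average each factor x factor block back into one pixel.
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def refine_region(img: Image, box: Box, factor: int,
                  refine: Callable[[Image], Image]) -> Image:
    """Supersample one region, refine it, and composite it back.

    `refine` is a hypothetical placeholder for a diffusion pass on the
    enlarged crop; it must return an image of the same size it was given.
    """
    left, top, right, _bottom = box
    patch = refine(upscale(crop(img, box), factor))
    small = downscale(patch, factor)
    out = [list(row) for row in img]  # copy so the input stays untouched
    for y, row in enumerate(small):
        out[top + y][left:right] = row
    return out
```

In a real workflow you would call `refine_region` once per face or hand box, and usually feather the edges of the pasted patch so the seam doesn't show. The sketch skips the blending to keep the structure visible.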