Debunked is a bit too strong. He quotes from the phi-4 report that it is easier for the LLM to digest synthetic data. A bit like feeding broiler chickens other dead chickens.
Maybe one day we will have organic LLMs guaranteed to be fed only human-generated content.