The number of states you get wrong isn't a great measure though, right? If I call it for Trump at 51-49 and you call it for Biden at 75-25, and Biden wins 51-49, I was more correct than you were. Counting "states called" is the wrong way to evaluate analysis, in the same way that FPTP is unrepresentative of the popular vote, ironically.
I agree with you, and that's why I made an effort to differentiate between margins vs. winners. In my opinion, 2020's polls were a lot worse than 2016's, but because of Biden's much bigger lead (in the polls and in key states), the polling errors didn't cross the threshold to be "wrong" in the winner-take-all system. Finally, there is an argument that even a Biden 75-25 call gives the correct outcome, because each state is a winner-take-all system (with the exception of NE and ME, iirc). From a stats standpoint, the delta in that example makes it very inaccurate, but from a model/communication standpoint, getting each state "right" (in binary terms) is very valuable.
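To make the margins-vs-winners distinction concrete, here's a minimal sketch using the hypothetical 51-49 vs. 75-25 example from above. The margin-error metric and the binary "state called" metric rank the two forecasts in opposite orders:

```python
# Sketch: two hypothetical forecasts of the same race (numbers from the
# comment's example, not real polls). Margins are Dem minus Rep, in points.

def margin_error(pred_margin, actual_margin):
    """Absolute error in the predicted margin (points) -- the 'delta'."""
    return abs(pred_margin - actual_margin)

def called_winner(pred_margin, actual_margin):
    """Binary 'state called' score: did the predicted sign match?"""
    return (pred_margin > 0) == (actual_margin > 0)

actual = 51 - 49           # Biden wins by 2
forecast_a = 49 - 51       # "Trump 51-49" -> predicted margin -2
forecast_b = 75 - 25       # "Biden 75-25" -> predicted margin +50

print(margin_error(forecast_a, actual))   # 4 points off
print(margin_error(forecast_b, actual))   # 48 points off
print(called_winner(forecast_a, actual))  # False: wrong winner
print(called_winner(forecast_b, actual))  # True: right winner
```

Forecast A is twelve times more accurate on margin but gets zero credit in a winner-take-all scoring, which is the tension the comment describes.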
Trafalgar had completely different results from the rest of the consensus, and once counts are more finalized in non-swing states (places like CA and NY take forever, but we don't pay attention because they are always blue), we can accurately compare the polling and real results between different pollsters and polling methodologies. Some experts have cautioned against using exit polls for this purpose (what is usually done for a quick read of polling accuracy), because exit polls only measure election-day in-person votes, and thus trend really red this year. Trafalgar had unconventional methodologies like a "shy Trump voter" adjustment where they arbitrarily shifted their numbers to the right, with weirdly consistent results of Trump +3 in a bunch of close/blue states. Perhaps there is a more complicated justification for these numbers in the background, but I'm concerned that even if their deltas end up better than the "normal" pollsters', they are generating an inferior product with overbaked data.
They tried to come up with approaches to get around the very effect the article talks about: Trump voters don't answer survey questions. The main differences in methodology were making fewer assumptions about Republican turnout, larger sample sizes, and different survey techniques.
Trafalgar clearly missed some things: traditionally Republican collar counties breaking hard for Biden. (My county didn't vote for Obama either time, but voted for Biden by 12, after voting for the Republican governor a couple of years ago by 38.) But 538 had some insane misses this year in critical states like Wisconsin (off by 7.7), Ohio (off by 7.3), Florida (off by 5.9), etc. Finalizing counts in NY and CA isn't going to change those numbers--and Trafalgar wasn't analyzing them anyway.
Like I said in an earlier comment, comparisons between polls and a model aren't completely fair. Part of the model's calculation is that a lead of 8.8 points or whatever requires an enormous polling error to swing for Trump. I do agree that other polls should get criticism and improve their methodologies to avoid the ~5-point margin in FL, or the 7.7-point margin you listed. And I even think they may not get as much criticism as their errors warrant, because many of these states were off by 6+ points but still went for Biden. Still, Trafalgar has a unique methodology that should be understood better before they are extolled as the "best pollster". And Trafalgar is not immune to similar polling errors, just in the other direction, and with a wrong result. The delta may be more important from a statistical-methodology standpoint, but these polls are measuring winner-take-all states, and "The Trafalgar Group’s Robert Cahaly is an outlier among pollsters in that he thinks President Trump will carry Michigan, Pennsylvania, or both, and hence be reelected with roughly 280 electoral votes" is a pretty poor prediction based on their data. Does this mean that their data is necessarily poor? No, but it isn't a good sign.