Repeat after me: don’t do string operations without explicit locale. Don’t do st...

layer8 · on Aug 16, 2020

> I don’t know why so many languages have string functions that should take a locale but provide an overload that doesn’t and which uses the system locale as the default.

That’s a relic from the past, before Unicode became prevalent, when systems only ever worked in a single locale and users expected applications (running locally, of course) to use the local system locale. Hence applying the system locale to everything was the standard behavior for applications. The C standard library was defined that way, and since then every other runtime (usually based on C at some level) does the same.

price · on Aug 17, 2020

FTR, Python does not in fact do this. Python 2 did have this locale-dependent behavior, but Python 3 has never behaved this way. The workaround in the OP is, thankfully, quite obsolete.

If you call a case-related method like `lower` on a Python string, the behavior you get is based on tables which are built into Python, taken straight from the Unicode standard's data files, and completely independent of your system configuration.
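A minimal sketch of what that looks like in practice, assuming Python 3 (the helper name `invariant_lower` is made up for illustration; it just wraps `str.lower`):

```python
def invariant_lower(text: str) -> str:
    """Lowercase using the Unicode tables built into Python.

    No locale is consulted, so the result is the same on every system.
    """
    return text.lower()


# Plain ASCII: "I" always maps to "i", never to the Turkish dotless "ı".
print(invariant_lower("TITLE") == "title")

# U+0130 (capital I with dot above) lowercases to "i" followed by
# U+0307 (combining dot above), per the Unicode special-casing data.
print(invariant_lower("\u0130") == "i\u0307")
```

Both lines should print `True` regardless of the `LANG`/`LC_ALL` settings of the machine running them.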

It would be nice to also have the option of explicitly using a particular locale. Here's a discussion from 2019 about potentially adding that option: https://bugs.python.org/issue37848 You'll be glad to see everyone there agrees the default should remain invariant.

madeofpalk · on Aug 16, 2020

I ran into this with C#/.NET on Windows - I tried to convert the string "1.3" to the float 1.3, and it failed on systems set to a language that uses a comma as its decimal separator.

That was a learning experience.
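For illustration, here is the same idea sketched in Python rather than C#: make the decimal separator an explicit argument instead of inheriting it from the system. The helper `parse_decimal` and its `decimal_sep` parameter are hypothetical, not part of any library. (In .NET the usual fix is to pass `CultureInfo.InvariantCulture` to the parse call.)

```python
def parse_decimal(text: str, decimal_sep: str = ".") -> float:
    """Parse a plain decimal number using an explicitly chosen separator.

    Hypothetical helper: the caller states the format, so the result
    does not depend on the system locale.
    """
    if decimal_sep != ".":
        if "." in text:
            # A "." is not valid when another separator was requested.
            raise ValueError(f"unexpected '.' in {text!r}")
        text = text.replace(decimal_sep, ".", 1)
    # Python's float() always expects "." and ignores the locale.
    return float(text)


print(parse_decimal("1.3"))                   # machine-readable input
print(parse_decimal("1,3", decimal_sep=","))  # comma-formatted user input
```

Both calls return `1.3`. Note that Python's built-in `float("1,3")` raises `ValueError` everywhere, so a mismatch shows up on every machine rather than only on comma-locale systems.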

alkonaut · on Aug 16, 2020

Indeed. As a person from a comma country, I find these mistakes in most code bases I look at. It makes it frustrating to contribute to open source, for example.

Perhaps it’ll make you feel better about your parsing bug to know that even the C# compiler (Roslyn) code base had several of these issues.