Repeat after me: don’t do string operations without explicit locale. Don’t do string operations without explicit locale.
I don’t know why so many languages have string functions that should take a locale but provide an overload that doesn’t and which uses the system locale as the default. It can’t be what many developers actually want, yet it has become the norm. Worse, code using a default locale appears to work on the developer’s machine and in production, until someone parses a number in France or lowercases a string in Turkey, which is a late and expensive way to discover the bug.
The default shouldn’t be the system locale; it should be an invariant locale. And I’ll go so far as to argue that this invariant locale should be invariant across systems (meaning it can’t just defer to a system C library either).
> I don’t know why so many languages have string functions that should take a locale but provide an overload that doesn’t and which uses the system locale as the default.
That’s a relic from the past, before Unicode became prevalent, when systems only ever worked in a single locale, and when users expected applications (running locally of course) to use the local system locale. Hence applying the system locale to everything was the standard behavior for applications. The C standard library was defined that way, and since then every other runtime (usually based on C at some level) has done the same.
FTR, Python does not in fact do this. Python 2 did have this locale-dependent behavior, but Python 3 has never behaved this way. The workaround in the OP is, thankfully, quite obsolete.
If you call a case-related method like `lower` on a Python string, the behavior you get is based on tables which are built into Python, taken straight from the Unicode standard's data files, and completely independent of your system configuration.
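To make that concrete, here is a minimal sketch of what I mean. The helper name `lower_under` and the locale name `tr_TR.UTF-8` are my own picks for illustration; the Turkish locale may not even be installed on your machine, in which case the switch is simply skipped.

```python
import locale


def lower_under(locale_name: str, text: str) -> str:
    """Hypothetical helper: lowercase `text` after trying to switch LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        pass  # locale not installed here; carry on with the current one
    try:
        return text.lower()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


# str.lower() follows the built-in Unicode tables, not the process locale,
# so a Turkish locale does not turn "I" into dotless "ı".
print(lower_under("tr_TR.UTF-8", "TITLE"))

# The default (non-Turkish) Unicode mapping for U+0130 is "i" plus
# U+0307 COMBINING DOT ABOVE, again regardless of locale.
print(ascii("İ".lower()))
```

So if you actually want Turkish casing rules, you have to ask for them some other way; the string methods will not pick them up from the environment.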
It would be nice to also have the option of explicitly using a particular locale. Here's a discussion from 2019 about potentially adding that option:
https://bugs.python.org/issue37848
You'll be glad to see everyone there agrees the default should remain invariant.
I ran into this with C#/.NET on Windows - I tried to convert the string "1.3" to the float 1.3, and it failed on systems set to a language that uses a comma as its decimal separator.
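In .NET the usual fix is to pass `CultureInfo.InvariantCulture` to the parse call. The general idea, sketched in Python rather than C# and under my own assumptions: the parser takes the decimal separator as an explicit argument instead of reading it from the environment. `parse_decimal` is a hypothetical helper, not a library function, and it deliberately ignores grouping separators.

```python
def parse_decimal(text: str, decimal_sep: str = ".") -> float:
    """Hypothetical helper: parse a number using an explicit decimal separator."""
    if decimal_sep != ".":
        if "." in text:
            # Refuse to guess: with a comma separator, a dot is ambiguous.
            raise ValueError(f"unexpected '.' in {text!r}")
        text = text.replace(decimal_sep, ".")
    # float() itself never consults the system locale.
    return float(text)


# Machine-readable data (config files, protocols): invariant format.
print(parse_decimal("1.3"))

# User input from a comma country: the caller says so explicitly.
print(parse_decimal("1,3", decimal_sep=","))
```

The point is only that the choice is visible at the call site, so it behaves the same on every machine.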
Indeed. As a person from a comma country, I find these mistakes in most code bases I look at. It makes it frustrating to contribute to open source, for example.
Perhaps it’ll make you feel better about your parsing bug that even the C# compiler (Roslyn) code base had several of these issues.