The real problem is the programmer mindset of “good for this purpose, good for every purpose.”
There are so many better-suited protocols out there, like Avro; I don’t understand the JSON obsession for so many poorly suited applications.
Reminds me of Markdown vs. anything else. I guess because GitHub’s default README is Markdown, we now have developers who don’t know anything else, like reStructuredText: “but I already know Markdown, so...” So we see hundreds of Markdown “flavors”, junktown parsers, and crazy Pandoc wrapper scripts to do cross-references and compile into chapters and this and that extension or tool,
when the things they seek are built right into RST. Are we just too lazy to learn it, or to ask our team members to have to learn it?
It’s sad when we settle for the least effort. Think of business value: stretch your abilities to suit the needs of the software design; do not limit the design to what you are already comfortable with.
That's the big issue with these formats. There's an inevitable minimal complexity in any data format, and it has to be dealt with somewhere. The "simple" specifications optimize for ease of reading the spec, which is a kind of marketing trick to make it enticing, but in reality leaves so much ambiguity that you get the kinds of problems in the article. A tighter spec on the other hand reduces ambiguity but appears more complicated to anyone reading the spec.
I went the tighter-spec route in https://concise-encoding.org because that's the only safe way to go, and I also versioned both the spec AND the documents, because I know that even with the care I'm taking there will be mistakes to fix in the spec after release.
I couldn't find an existing format that had the types I needed, and also an unambiguous text/binary twin format. Then once I got started, my pent-up wish list and gripes over the formats I've used in the past came to the surface ;-)
OK, this could still be a problem, though I doubt it would be in practice, and the article states that "some serializers refused to create binary representations of multiple keys".
> Key Collision: Character truncation and Comments
Not possible with binary formats.
> JSON Serialization Quirks
N/A
> Float and Integer Representation
Not an issue with binary formats.
> Permissive Parsing and Other Bugs
Mostly not an issue with binary formats.
Unfortunately, they said they tested some binary formats but didn't present the results!
I don't believe in the spec at all because, as pointed out in the article, it is not precise enough. The spec also puts an undue burden on the parser.
The spec should absolutely say whether the first or the last key takes precedence when duplicate keys appear. I don't think the spec should simply demand that the parser error out on duplicate keys, as that isn't helpful: even if such an error exists, there should still be a way to parse the structure and get the data. A duplicate key shouldn't cause a complete parse failure; it is recoverable.
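For a concrete illustration of what parsers do in the absence of guidance (Go's encoding/json is one example I'm sure of; other parsers differ):

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // The JSON spec doesn't say which of these duplicate keys should win.
    data := []byte(`{"a": 1, "a": 2}`)

    var m map[string]int
    if err := json.Unmarshal(data, &m); err != nil {
        panic(err)
    }

    // Go's encoding/json silently keeps the last value, so this prints 2.
    // Other parsers keep the first, and some reject the document outright.
    fmt.Println(m["a"])
}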
To resolve the problem of people placing comments into dummy values (or duplicate keys), the JSON spec really needs a feature-set / flag key. Something like:
{
  "_flags": ["comments"],
  ...
}
This would allow extensions to be reasonably supported, aside from the fact that initially none of the parsers will do anything with this extra key.
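A rough sketch of how a cooperating parser could honor this, assuming the "_flags" key from my proposal above (it is not part of any JSON spec):

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    doc := []byte(`{"_flags": ["comments"], "value": 42}`)

    // Probe pass: extract only the proposed "_flags" array, ignoring
    // everything else in the document.
    var probe struct {
        Flags []string `json:"_flags"`
    }
    if err := json.Unmarshal(doc, &probe); err != nil {
        panic(err)
    }

    for _, f := range probe.Flags {
        if f == "comments" {
            // A cooperating parser would enable comment handling here
            // before re-parsing the rest of the document.
            fmt.Println("document opts in to comments")
        }
    }
}

One wrinkle: if the document already contains comments, the probe pass itself needs a comment-tolerant tokenizer, which is part of why bootstrapping a flag key like this is awkward.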
Another flag would be "plainstrings", which would tell the parser NOT to parse escapes in any way and just deliver the string data as is. This is what my parser does, as there are too many pitfalls in parsing escapes consistently, and you then also have to code how to write those escapes back out. This may be passing the buck onwards, but I view it as a division of responsibilities: why should the parser be responsible for, and required to understand, Unicode and all its complexities?
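A minimal sketch of that "plainstrings" behavior (the flag name and this scanner are hypothetical): the scanner knows just enough about escapes to find the closing quote, and hands the raw bytes through untouched:

package main

import (
    "errors"
    "fmt"
)

// rawString scans input starting at the opening quote at index start and
// returns the raw string bytes without decoding any escape sequences.
// Escapes are only skipped over so that an escaped quote doesn't end the
// string early.
func rawString(input []byte, start int) (raw []byte, next int, err error) {
    i := start + 1
    for i < len(input) {
        switch input[i] {
        case '\\':
            i += 2 // skip the escape pair without interpreting it
        case '"':
            return input[start+1 : i], i + 1, nil
        default:
            i++
        }
    }
    return nil, 0, errors.New("unterminated string")
}

func main() {
    raw, _, err := rawString([]byte(`"a\nb \" c"`), 0)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", raw) // prints a\nb \" c with escapes left as-is
}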
Another flag would be "type.[some type]", to indicate the presence of a type and the need to be able to parse it. The way I implemented this in my parser is:
{
  key: [type name].[type representation]
}
For example, hex data:
{
  key: x.4FE310B2
}
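Here's a sketch of how a parser could decode that scheme (the dotted "[type].[representation]" syntax and the "x" hex prefix follow my description above; they are not standard JSON):

package main

import (
    "encoding/hex"
    "fmt"
    "strings"
)

// decodeTyped interprets a "[type name].[type representation]" value.
// Only the "x" (hex) prefix is handled in this sketch; anything without
// a recognized prefix falls through as a plain string.
func decodeTyped(v string) (interface{}, error) {
    prefix, rest, ok := strings.Cut(v, ".")
    if !ok {
        return v, nil // no type prefix: treat as a plain string
    }
    switch prefix {
    case "x":
        return hex.DecodeString(rest)
    default:
        return v, nil
    }
}

func main() {
    b, err := decodeTyped("x.4FE310B2")
    if err != nil {
        panic(err)
    }
    fmt.Printf("% X\n", b) // prints 4F E3 10 B2
}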
These sorts of things would address the issues brought by the article.
Who am I kidding, though... Cooperation doesn't exist in the community, no one will change their parsers to do as I suggest, and I'm likely to get downvoted just for sharing my ideas. Carry on, uncaring world.
Out of curiosity, how do you manage to get the first key instead of the last? Because it seems to me it would be less logic / easier for a parser to always return the last.
First-key precedence comes up with a third-party performance library for Go. If you are parsing a stream, it is always possible to have situations where you stop reading once you get the first answer, but then you don't really know whether the stream is valid JSON at all.
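A sketch of that pattern with Go's streaming decoder (flat objects with scalar values only; the point is that it returns at the first hit and never validates the rest of the stream):

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

// firstValue scans tokens from a JSON stream and returns the value of the
// first occurrence of key in a flat object of scalar values. It stops
// reading at the first hit, so the remainder of the stream is never checked.
func firstValue(r io.Reader, key string) (json.Token, error) {
    dec := json.NewDecoder(r)
    if _, err := dec.Token(); err != nil { // consume the opening '{'
        return nil, err
    }
    for dec.More() {
        k, err := dec.Token()
        if err != nil {
            return nil, err
        }
        v, err := dec.Token()
        if err != nil {
            return nil, err
        }
        if k == key {
            return v, nil // first duplicate wins
        }
    }
    return nil, fmt.Errorf("key %q not found", key)
}

func main() {
    r := strings.NewReader(`{"a": 1, "a": 2} this tail is never read`)
    v, err := firstValue(r, "a")
    if err != nil {
        panic(err)
    }
    fmt.Println(v) // prints 1; the invalid tail went unnoticed
}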
I hadn't realized JSON was this susceptible to the same kind of issues.