
At a former job we processed files, many of them CSV, from hundreds of different sources, and at one point I was put in charge of cleaning up the code that did this. There were a few tiny binaries without source, called csv2tsv and csv2tsv2. No documentation of any kind, of course.

csv2tsv just handled quoted fields and I was able to replace it with a few lines of Python without issue.
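The comment doesn't show the replacement, but for ordinary CSV the standard library's csv module does the heavy lifting. Here is a minimal sketch, assuming the input follows the common convention (comma separator, fields wrapped in double quotes, embedded quotes doubled) and that it's acceptable to flatten tabs and line breaks inside a field to spaces, since plain TSV has no way to quote them. The function names are made up for illustration.

```python
import csv
import io
from typing import TextIO


def csv_to_tsv(src: TextIO, dst: TextIO) -> None:
    """Read CSV from `src` and write plain TSV to `dst`.

    Assumption: standard double-quote quoting (the csv module's default).
    Plain TSV has no quoting, so any tab, CR, or LF inside a field is
    replaced with a space rather than escaped.
    """
    for row in csv.reader(src):
        cleaned = [
            field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        dst.write("\t".join(cleaned) + "\n")


def csv_text_to_tsv(text: str) -> str:
    """Convenience wrapper: convert a CSV string to a TSV string."""
    out = io.StringIO()
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks that appear inside quoted fields.
    csv_to_tsv(io.StringIO(text, newline=""), out)
    return out.getvalue()
```

To use it as a filter on a pipe, wrap stdin with `io.TextIOWrapper(sys.stdin.buffer, newline="")` and pass `sys.stdout` as `dst`. Whether flattening embedded tabs and newlines is the right call depends on what the downstream consumer expects; that choice is an assumption here, not something the original tool is known to have done.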

The csv2tsv2 program was used for CSV files from exactly one company. We couldn't ask them for technical assistance (their systems had likely been written years ago and left running without updates ever since), so I tried to figure out what the binary was doing. The input files had some null characters in them, but they seemed to be used inconsistently, and I never managed to figure out what that binary was supposed to do.

I left that binary alone and kept using it for a few years, until a new guy joined and took over the system from me. I mentioned this weird old binary to him, and over the course of a week he poked at it now and again before figuring out what it was doing. He used to work at a bank and realized it was using the same quoting method as an old data format they'd used there: something about doubling characters, plus a few other tricks.

There's nothing simple someone won't make complicated.



You are thinking of these: https://www.w3.org/Tools/csvtotab-vv

Agreed, they are excellent.
