TIL this exists
I like programming and anime.
I manage the bot /u/mahoro@lemmy.ml
I also like the POSIX “seconds since 1970” standard, but I feel it should only be used in memory when performing operations (time differences in timers etc.). It irks me when it’s used for serialising to text/JSON/XML/CSV.
I’ve seen bugs where programmers tried to represent a date as epoch time in seconds or milliseconds in JSON. Something like “pay date” would be represented by a timestamp, and would get off-by-one errors because whatever time library the programmer was using would do a time zone conversion on the timestamp and then truncate it to the date portion.
If the programmer had used ISO 8601 style formatting, I don’t think they would have included the time part, and the bug could have been avoided.
Use dates when you need dates and timestamps when you need timestamps!
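Here's a rough sketch of how that off-by-one can happen, using only the Python standard library. The pay date and the UTC-5 consumer are made up for illustration; the real bugs I saw involved other libraries, but the mechanism is the same.

```python
import json
from datetime import date, datetime, timedelta, timezone

# Hypothetical pay date, chosen only for illustration.
pay_date = date(2024, 3, 15)

# Approach 1: the producer serialises the date as epoch seconds at midnight UTC.
epoch_seconds = int(datetime(2024, 3, 15, tzinfo=timezone.utc).timestamp())

# A consumer in UTC-5 (assumed) converts to local time, then truncates to a date.
consumer_tz = timezone(timedelta(hours=-5))
decoded_from_epoch = datetime.fromtimestamp(epoch_seconds, tz=consumer_tz).date()

# Approach 2: the producer serialises an ISO 8601 calendar date with no time part.
payload = json.dumps({"pay_date": pay_date.isoformat()})
decoded_from_iso = date.fromisoformat(json.loads(payload)["pay_date"])

print(decoded_from_epoch)  # one day early: 2024-03-14
print(decoded_from_iso)    # matches the original: 2024-03-15
```

With the ISO date there is no time part for anything to convert, so the consumer's time zone can't shift it.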
Do you use it? When?
Parquet is mostly used for big data batch processing. It’s a columnar file format optimized for large aggregation queries. It’s not human-readable, so you need a library like Apache Arrow to read from and write to it.
I would use parquet in the following circumstances (or combination of circumstances):
Since the data is columnar, a query like select sum(sales) from revenue is much cheaper and faster if the underlying data is in parquet rather than csv.
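To show why the layout matters, here's a toy sketch in plain Python. It is not how Parquet actually stores data on disk (the real format adds row groups, encodings and compression), and the revenue rows are made up. It only contrasts walking whole records against reading a single column.

```python
# Row-oriented (csv-like): every record carries all of its fields.
rows = [
    {"region": "north", "product": "widget", "sales": 120},
    {"region": "south", "product": "gadget", "sales": 80},
    {"region": "north", "product": "gadget", "sales": 45},
]

# Column-oriented (parquet-like): one contiguous list per column.
columns = {
    "region": [r["region"] for r in rows],
    "product": [r["product"] for r in rows],
    "sales": [r["sales"] for r in rows],
}

# select sum(sales) from revenue, row layout: has to visit every full record.
row_total = sum(r["sales"] for r in rows)

# Same query, column layout: only the "sales" column is touched.
col_total = sum(columns["sales"])

print(row_total, col_total)
```

Both give the same total, but the columnar version never looks at region or product, which is where the savings come from once you have many wide rows.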
The big advantage of csv is that it’s more portable. As a data file format, csv has been around forever, so it is used in a lot of places where parquet can’t be.
This is a classic piece, and I love the contradictions in the text. It encapsulates my feelings on good software and code: that it becomes almost more of an art than a science.