Given the amount of competition and attention devoted to the performance of reading CSV files, one could be forgiven for thinking it was one of the hardest problems in data science.

DATA SCIENTIST: You see the problem comes when I try to calculate the….

SOFTWARE DEV: Wait! How many seconds did you just wait for that data? That’s absurd!

DS: Quite so. Anyhow as you’ll see if I…

SD: I mean how can you even tolerate that?! I’d lose my mind. I’m just going to go do some benchmarking…

@milesmcbain as an easily distracted PhD student I really appreciate it. Disruptions to the workflow otherwise risk falling into the lap of the internet...

@jroper I think you’re saying you enjoy reading csv file benchmarks? Not sure what to do with that dead cat. 😂


@milesmcbain haha no I just appreciate the results of the competition. I mostly use vroom and I was so happy when I found it.

@jroper ah I see. Well that’s great.

Allow a salty old data dog a word of advice: if you work within a framework that gives you end-to-end reproducibility while caching the raw read and initial preprocessing steps, then raw read speed tends to matter little, because you almost never need to rerun that step as you work.

Since you’re an R user, you have one of the best frameworks available for this style of work in {targets}.

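For anyone who hasn't seen the pattern, here is a minimal sketch of a _targets.R file that pairs {targets} with vroom for exactly this kind of cached raw read. The path "data/trips.csv" and the duration column are made-up placeholders, but the pipeline structure is standard {targets} usage:

library(targets)

# Packages available to each target's command
tar_option_set(packages = c("vroom", "dplyr"))

list(
  # Track the csv on disk; downstream targets invalidate only when it changes
  # ("data/trips.csv" is a placeholder path)
  tar_target(raw_file, "data/trips.csv", format = "file"),

  # The slow raw read runs once and is cached in the {targets} store
  tar_target(raw_trips, vroom::vroom(raw_file)),

  # Initial preprocessing is cached too; analysis targets build on this
  # ("duration" is a placeholder column name)
  tar_target(trips_clean, dplyr::filter(raw_trips, !is.na(duration)))
)

With that in place, tar_make() only re-reads the csv when the file or the reading code changes; every other run pulls the cached object from the store, so the raw read speed rarely matters day to day.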