Given the amount of competition and attention devoted to the performance of reading CSV files, one could be forgiven for thinking this was one of the hardest problems in data science.
DATA SCIENTIST: You see the problem comes when I try to calculate the….
SOFTWARE DEV: Wait! How many seconds did you just wait for that data? That’s absurd!
DS: Quite so. Anyhow, as you’ll see if I…
SD: I mean how can you even tolerate that?! I’d lose my mind. I’m just going to go do some benchmarking…
@milesmcbain as an easily distracted PhD student I really appreciate it. Disruptions to the workflow otherwise risk falling into the lap of the internet...
@milesmcbain haha no I just appreciate the results of the competition. I mostly use vroom and I was so happy when I found it.
@milesmcbain Thank you, that looks great.
@jroper ah I see. Well that’s great.
Allow a salty old data dog a word of advice: if you are working within a framework that ensures end-to-end reproducibility while caching the raw read and initial preprocessing steps, then raw read speed tends to matter little, since you almost never need to rerun that step as you work.
Since you’re an R user, you have one of the best frameworks available for this style of work in {targets}.
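To make that concrete, here is a minimal sketch of a `_targets.R` pipeline that caches the raw read. The file name `data/raw.csv`, the target names, and the cleaning step are illustrative assumptions, not anything from the discussion above.

```r
# _targets.R -- minimal {targets} pipeline sketch.
# Assumptions: a CSV at "data/raw.csv" with a column named "value".
library(targets)
tar_option_set(packages = c("vroom", "dplyr"))

list(
  # Track the CSV on disk; this target invalidates only if the file changes.
  tar_target(raw_file, "data/raw.csv", format = "file"),

  # The slow raw read runs once and its result is cached in the targets store.
  tar_target(raw, vroom::vroom(raw_file)),

  # Downstream steps rerun freely without ever touching the read again.
  tar_target(cleaned, dplyr::filter(raw, !is.na(value)))
)
```

Running `tar_make()` builds the pipeline; on subsequent runs only targets whose upstream inputs changed are rebuilt, so the expensive read is skipped entirely while everything stays reproducible end to end.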