Given the amount of competition and attention devoted to the performance of reading CSV files, one could be forgiven for thinking this was one of the hardest problems in data science.
DATA SCIENTIST: You see the problem comes when I try to calculate the….
SOFTWARE DEV: Wait! How many seconds did you just wait for that data? That’s absurd!
DS: Quite so. Anyhow, as you’ll see if I…
SD: I mean how can you even tolerate that?! I’d lose my mind. I’m just going to go do some benchmarking…
@milesmcbain as an easily distracted PhD student I really appreciate it. Disruptions to the workflow otherwise risk falling into the lap of the internet...
@milesmcbain haha no I just appreciate the results of the competition. I mostly use vroom and I was so happy when I found it.
@milesmcbain Thank you, that looks great.
@jroper ah I see. Well that’s great.
Allow a salty old data dog a word of advice: if you are working within a framework that ensures end-to-end reproducibility while caching the raw read and initial preprocessing steps, then raw read speed tends to matter little, since you almost never need to rerun that step as you work.
Since you’re an R user, you have one of the best frameworks available for this style of work in {targets}.
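To make that concrete, here is a minimal sketch of a `_targets.R` pipeline that caches the raw read. The file name `data/raw.csv`, the target names, and the cleaning step are illustrative assumptions, not anything from the discussion above.

```r
# _targets.R -- minimal {targets} pipeline sketch.
# Assumptions: a CSV at "data/raw.csv" with a column named "value".
library(targets)
tar_option_set(packages = c("vroom", "dplyr"))

list(
  # Track the CSV on disk; this target invalidates only if the file changes.
  tar_target(raw_file, "data/raw.csv", format = "file"),

  # The slow raw read runs once and its result is cached in the targets store.
  tar_target(raw, vroom::vroom(raw_file)),

  # Downstream steps rerun freely without ever touching the read again.
  tar_target(cleaned, dplyr::filter(raw, !is.na(value)))
)
```

Running `tar_make()` builds the pipeline; on subsequent runs only targets whose upstream inputs changed are rebuilt, so the expensive read is skipped entirely while everything stays reproducible end to end.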