---
name: R for Analysis
description: Writes idiomatic tidyverse R for data analysis - dplyr wrangling pipelines, tidyr reshaping, explicit joins, layered ggplot2 visualization, and broom-tidied statistical models - with reproducibility practices baked in. Use when someone asks "write this analysis in R", "how do I pivot this data frame", "fit a regression per group in R", or wants messy base-R scripts converted to clean pipe-based tidyverse code. Do NOT use for Python-based dataframe work - use pandas-expert instead; for interpreting results for stakeholders use sql-to-insights.
---
# R for Analysis
Produce R analysis code that another analyst can re-run a year later and get the same answer. The costly mistake this prevents: interactive one-off scripts with silent type inference, implicit grouping, and unseeded randomness - code that works once on one laptop and never again.
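As a rough illustration of two of those failure modes and one way to avoid them, here is a minimal sketch. It assumes R 4.1 or later (for the native pipe) and dplyr 1.1.0 or later (for the `.by` argument). The `sales` tibble, its column names, and the seed value are hypothetical, invented only for this example.

```r
library(dplyr)

# Hypothetical data, invented for illustration only.
sales <- tibble(
  region = c("north", "north", "south", "south"),
  amount = c(10, 20, 30, 40)
)

# Per-operation grouping: `.by` groups for this call only, so the result
# comes back ungrouped and no grouping leaks into later steps.
region_totals <- sales |>
  summarise(total = sum(amount), .by = region)

# Seeded randomness: setting the seed immediately before sampling makes
# the draw repeatable. The seed value itself is arbitrary.
set.seed(2024)
first_draw <- slice_sample(sales, n = 2)

set.seed(2024)
second_draw <- slice_sample(sales, n = 2)

# Both draws select the same rows because the seed was reset in between.
identical(first_draw, second_draw)
```

The same grouping could be written with `group_by()` followed by `ungroup()`; `.by` is used here only because it leaves no grouped state behind to forget about.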
## Inputs to collect
Before writing code, establish:
1. The data source and its shape - file, database, or in-memory; roughly how many rows (millions of rows change the tool choice: consider data.table or duckdb backends past ~10M rows).
2. The analytical question in one sentence. If the user cannot state it, help them state it first - code without a question produces plots without a point.
3. The output target - exploratory notebook, Quarto report, or a table/figure for a deck. This decides how much polish the code needs.
Label any assumption about column meanings as a guess and confirm against a `glimpse()` of the real data.
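One way to make that confirmation concrete is to declare the guessed column types at read time and then inspect the result. The sketch below assumes readr 2.0 or later (for `I()` literal input) and dplyr. The inline CSV and the column names `order_id`, `order_date`, and `amount` are hypothetical stand-ins for the real source.

```r
library(readr)
library(dplyr)

# Hypothetical CSV contents; I() tells readr to treat the string as
# literal data rather than a file path.
raw_csv <- I("order_id,order_date,amount\nA001,2024-01-05,19.99\nA002,2024-01-06,5.00\n")

# Guessed column meanings are written down as explicit types, so a wrong
# guess surfaces as a parsing problem instead of a silent coercion.
orders <- read_csv(
  raw_csv,
  col_types = cols(
    order_id   = col_character(),
    order_date = col_date(format = "%Y-%m-%d"),
    amount     = col_double()
  )
)

# Confirm names, types, and the first few values against expectations.
glimpse(orders)
```

If a declared type does not match the data, readr records the mismatch, and `problems(orders)` lists the affected rows for review.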
## Operating procedure
Work in this order - each step protects the ones after it.
### Step 1: Reproducible setup