Kyler Patrick, 2026.
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
Delete any other auto-generated example code when you open a new file, but always keep the setup chunk.
The quickest way to insert a code chunk is with a keyboard shortcut:
| OS | Shortcut |
|---|---|
| Windows | Ctrl + Alt + I |
| Mac | Cmd + Option + I |
Alternatively, you can manually type the chunk delimiters:
```{r}
print("Example Code Chunk")
```
## [1] "Example Code Chunk"
Text outside of code chunks is treated as plain text. Use asterisks for emphasis:
| Syntax | Result |
|---|---|
| *italics* | italics |
| **bold** | bold |
Use dollar signs to write math inline or as a displayed equation:
| Syntax | Use | Example |
|---|---|---|
| $...$ | Inline math | Write beta: $\beta$ |
| $$...$$ | Displayed equation | $$\frac{1}{2} + \frac{1}{2} = 1$$ |
| x^{n} | Superscript (inside math) | $2^{2} = 4$ |
To type a literal dollar sign in markdown (not math), escape it with a backslash: \$. Don't escape dollar signs inside code chunks, where $ is an R operator.
| Type | Description | Example |
|---|---|---|
| Character | Text / string data. Created with "" or ''. A column of loaded data that contains any non-numeric entry is read as character. | "hello" |
| Integer | Whole numbers, written with an L suffix. Can be converted to/from character or numeric. | 5L |
| Numeric | Real numbers, including decimals. This is the default type for numbers typed without an L suffix. Can be converted to/from character or integer. | 3.14 |
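The types in the table can be checked with class() and converted with the as.*() functions. The following is a minimal sketch in base R; the object names (NumFromChar and so on) are made up for illustration:

```r
# class() reports the type of a value
class("hello")   # character
class(5L)        # integer
class(3.14)      # numeric
class(5)         # numeric: without the L suffix, a whole number is stored as numeric

# Conversions between types
NumFromChar <- as.numeric("3.14")   # character -> numeric
IntFromNum  <- as.integer(3.99)     # numeric -> integer (drops the decimal part)
CharFromInt <- as.character(5L)     # integer -> character

# Text that does not look like a number converts to NA (with a warning)
BadConversion <- suppressWarnings(as.numeric("hello"))
```

The last line is a common source of surprise NAs when a numeric column was loaded as character because of a stray text entry.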
Spaces between lines and between operators improve readability and make debugging easier.
# Easier to read and debug:
ExampleDF <- as.data.frame(ExampleData)
GuideLm1 <- lm(Var1 ~ Var2 + Var3, data = ExampleDF)
# Harder to read and debug:
ExampleDF<-as.data.frame(ExampleData)
GuideLm1<-lm(Var1~Var2+Var3,data=ExampleDF)
Since variable names can't contain spaces, use one consistent substitution method — either underscores or CamelCase — and never mix them:
# CamelCase:
ExampleVariable <- ExampleData$Variable1
# Underscores:
Example_Variable <- ExampleData$Variable1
If a variable name is too long, remove vowels, but apply the same rule to every name:
ExmplVrbl <- ExampleVariable
Pick a consistent prefix for model names (lm, reg, or model) and number them sequentially:
lm1 <- lm(data$y ~ data$x1 + data$x2)
lm2 <- lm(data$y ~ data$x3 + data$x4)
Use # to add comments or prevent code from running. To comment/uncomment multiple lines at once, highlight them and press Ctrl+Shift+C (Cmd+Shift+C on Mac).
# This is a comment — it won't run
# summary(data) <-- this line is "commented out"
summary(data) # runs normally
Create one main folder for the course, then subfolders for data files and markdown files. If you're using a lab desktop, use a flash drive so you can access files on any machine.
Assign any object (data frames, models, variables, tables) using <-:
Object1 <- summary(ExampleData$x1)
Use the $ operator to pull a specific column from a data frame:
ExampleData$x1
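Assignment with <- and column access with $ are usually combined. The sketch below uses a small made-up data frame (ToyData is not part of the course data) to pull a column, store a result computed from it, and create a new column:

```r
# Small made-up data frame, for illustration only
ToyData <- data.frame(x1 = c(10, 20, 30),
                      x2 = c("a", "b", "c"))

# Pull one column as a vector
ToyData$x1

# Store a result computed from that column
MeanX1 <- mean(ToyData$x1)

# Assigning to a column name that does not exist yet creates the column
ToyData$x1Doubled <- ToyData$x1 * 2
```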
Use install.packages() to install a package. The name must be spelled correctly and surrounded by quotes. Only install each package once.
install.packages("stargazer")
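One way to follow the install-once rule is to check whether a package is already present before installing it. This is a minimal sketch using base R's requireNamespace(); the helper name IsInstalled is hypothetical, and the install line is left commented out so nothing is downloaded by accident:

```r
# Hypothetical helper: TRUE if the package is already installed
IsInstalled <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check returns TRUE
StatsInstalled <- IsInstalled("stats")

# Install only when the package is missing (uncomment to use)
# if (!IsInstalled("stargazer")) install.packages("stargazer")
```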
| Package | Purpose |
|---|---|
| stargazer | Publication-quality summary and model tables |
| jtools | Model tables with robust SE support |
| lmtest | Hypothesis tests for regression models |
| sandwich | Robust standard error estimation |
| tidyverse | Includes ggplot2, dplyr, readr, tidyr, purrr, tibble, stringr, forcats, lubridate |
Use library() to load a package each time you open R. No quotes needed.
library(stargazer)
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
While still learning R, avoid putting all your library() calls at the top of the document. Instead, load each package right before you need it — this helps you learn what each package actually does.
The examples below use the wage1 dataset from the wooldridge package. Run this to follow along:
# install.packages("wooldridge") # run once, then comment out
library(wooldridge)
data("wage1")
ExampleData <- wage1
View(ExampleData)
# load("C:/Users/kkpat/Desktop/Econometrics/RData/attend.Rdata")
# ExampleData1 <- attend
# View(ExampleData1)
library(tidyverse)
ExampleData2 <- read_csv("C:/Users/kkpat/Desktop/STAT320/Data/SelectedVars.csv")
View(ExampleData2)
To save a data frame or tibble as an .RData file (a file name without a path saves to the working directory, which is the R Markdown file's folder when knitting):
save(ExampleData, file = "ExampleData.RData")
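Saving and loading are a round trip: load() restores an object under the same name it had when it was saved. A minimal sketch, assuming a made-up data frame (ToyData) and a temporary file path so the example leaves nothing in your course folder:

```r
# Made-up data frame standing in for ExampleData
ToyData <- data.frame(wage = c(3.1, 4.5), educ = c(12, 16))

# Save to a temporary file
ToyPath <- file.path(tempdir(), "ToyData.RData")
save(ToyData, file = ToyPath)

# Remove the object, then restore it from the file
rm(ToyData)
LoadedNames <- load(ToyPath)   # load() returns the names of the restored objects
```

After load(), ToyData exists again; there is no need to assign the result of load() to get the data back.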
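Use na.omit() on the full dataset or on a specific variable:
ExampleDataNoNA <- na.omit(ExampleData)
# na.omit() on one column returns a shorter vector when NAs are present,
# so store it as its own object rather than as a column of ExampleData
WageNoNA <- na.omit(ExampleData$wage)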
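na.omit() drops whole rows when given a data frame and single values when given one variable, so the two results can differ in length. The sketch below shows this on a made-up data frame with missing values (ToyData is an assumption, not the course data):

```r
# Made-up data frame with missing values, for illustration only
ToyData <- data.frame(wage = c(3.1, NA, 5.0, 4.2),
                      educ = c(12, 16, NA, 14))

# Count the missing values in each column
MissingCounts <- colSums(is.na(ToyData))

# Drop every row that has at least one NA in any column
ToyDataNoNA <- na.omit(ToyData)

# Drop NAs from a single variable only
ToyWageNoNA <- na.omit(ToyData$wage)
```

Here the data frame loses two rows (one for each NA), while the wage vector loses only one value.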
Use subset() to select specific columns from a large dataset:
ExampleDataSubset <- subset(ExampleData, select = c(wage, educ, exper))
View(ExampleDataSubset)
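subset() can also filter rows in the same call by adding a condition before select. A minimal sketch on a made-up data frame (ToyData and its values are assumptions for illustration):

```r
# Made-up data frame, for illustration only
ToyData <- data.frame(wage   = c(3.1, 6.0, 5.0, 9.5),
                      educ   = c(12, 16, 12, 18),
                      exper  = c(2, 10, 5, 20),
                      female = c(1, 0, 1, 0))

# Keep only the rows with more than 12 years of education,
# and only the wage and educ columns
ToySubset <- subset(ToyData, educ > 12, select = c(wage, educ))
```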
Use ifelse() to convert a categorical variable into a binary (0/1) dummy:
# Creates a dummy = 1 if x1 is "old", = 0 otherwise
ExampleData$x1Dummy <- ifelse(ExampleData$x1 == "old", 1, 0)
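It is worth checking a new dummy against the original variable before using it in a model. The sketch below does this with table() on a made-up categorical column (ToyData is an assumption for illustration):

```r
# Made-up categorical variable, for illustration only
ToyData <- data.frame(x1 = c("old", "new", "old", "new", "old"))

# Dummy = 1 if "old", = 0 otherwise
ToyData$x1Dummy <- ifelse(ToyData$x1 == "old", 1, 0)

# Cross-tabulate the original variable against the dummy to check the coding
DummyCheck <- table(ToyData$x1, ToyData$x1Dummy)
DummyCheck
```

Every "old" row should fall in the 1 column and every "new" row in the 0 column.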
summary(ExampleData$wage)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.530 3.330 4.650 5.896 6.880 24.980
Model1 <- lm(wage1$wage ~ wage1$educ + wage1$exper)
summary(Model1)
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.39054 0.76657 -4.423 1.18e-05 ***
## wage1$educ 0.64427 0.05381 11.974 < 2e-16 ***
## wage1$exper 0.07010 0.01098 6.385 3.78e-10 ***
##
## Multiple R-squared: 0.2252, Adjusted R-squared: 0.2222
## F-statistic: 75.99 on 2 and 523 DF, p-value: < 2.2e-16
Stargazer produces publication-quality tables for both data and models. Use type = "text" for working in R; switch to type = "html" with an out = "Table.htm" argument for final papers.
library(stargazer)
stargazer(as.data.frame(ExampleData), type = "text", title = "Data Summary")
stargazer(Model1, type = "text", title = "Model Summary")
# Model2 must be a second fitted model; it is not created in this guide
stargazer(Model1, Model2, type = "html", out = "ModelTable1.htm",
          title = "Model 1 and 2 Summary")
library(stargazer)
library(sandwich)
RobustSE <- sqrt(diag(vcovHC(Model1, type = "HC1")))
stargazer(Model1, Model1,
          se = list(NULL, RobustSE),
          type = "text",
          title = "Without and With Robust Standard Errors",
          notes = "Robust standard errors on the right",
          notes.append = TRUE)
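To see what the HC1 robust standard errors are built from, the sandwich formula can be written out in base R. This is a sketch for understanding, not a replacement for vcovHC(); it assumes the built-in mtcars data as a stand-in for the wage data, and the names ToyModel, Bread, and Meat are made up for illustration:

```r
# mtcars ships with R and stands in for the wage data here
ToyModel <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(ToyModel)   # regressors, including the intercept column
e <- residuals(ToyModel)      # OLS residuals
n <- nrow(X)                  # number of observations
k <- ncol(X)                  # number of estimated coefficients

# Sandwich formula: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
Bread <- solve(crossprod(X))
Meat  <- crossprod(X * e)     # each row of X scaled by its residual, then X'X
HC0   <- Bread %*% Meat %*% Bread

# HC1 rescales HC0 by n / (n - k) as a degrees-of-freedom correction
HC1 <- HC0 * n / (n - k)
ToyRobustSE <- sqrt(diag(HC1))
ToyRobustSE
```

With the sandwich package loaded, sqrt(diag(vcovHC(ToyModel, type = "HC1"))) should give the same numbers.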
The jtools package is a simpler alternative that outputs text-style model tables. Its export_summs() function also needs the huxtable package to be installed.
library(jtools)
export_summs(Model1)
library(jtools)
Model1Robust <- summ(Model1, robust = TRUE)
export_summs(Model1, Model1Robust,
             model.names = c("Standard SE", "Robust SE"))
Use lm() to fit an Ordinary Least Squares regression. Two equivalent formats:
Reg1 <- lm(ExampleData$wage ~ ExampleData$educ
           + ExampleData$exper
           + ExampleData$female)
Reg1 <- lm(wage ~ educ + exper + female, data = ExampleData)
# Variable names must exactly match the dataset columns
The tilde ~ (meaning "is modeled by") is typed with Shift + ` (the key left of 1 on a US keyboard).
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.73448 0.75362 -2.302 0.0218 *
## ExampleData$educ 0.60258 0.05112 11.788 < 2e-16 ***
## ExampleData$exper 0.06424 0.01040 6.177 1.32e-09 ***
## ExampleData$female -2.15552 0.27031 -7.974 9.74e-15 ***
##
## Multiple R-squared: 0.3093, Adjusted R-squared: 0.3053
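The two lm() formats give the same estimates; they differ in how the coefficients are labeled and in how well the model works with other functions. A minimal sketch, assuming the built-in mtcars data as a stand-in for the wage data (RegDollar and RegDataArg are made-up names):

```r
# Format 1: every variable written with the $ operator
RegDollar <- lm(mtcars$mpg ~ mtcars$wt + mtcars$hp)

# Format 2: bare column names plus a data argument
RegDataArg <- lm(mpg ~ wt + hp, data = mtcars)

# The estimates are identical once the labels are ignored
SameEstimates <- all.equal(unname(coef(RegDollar)), unname(coef(RegDataArg)))

# The labels differ: "mtcars$wt" versus "wt"
names(coef(RegDollar))
names(coef(RegDataArg))

# The data-argument format also works directly with predict() on new values
PredictedMpg <- predict(RegDataArg, newdata = data.frame(wt = 3, hp = 110))
```

The shorter labels from the second format also make summary tables easier to read.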
To estimate a log-linear model, first create the log of the variable as a new column, then run lm() as usual.
# Step 1: Create the log variable
ExampleData$Logwage <- log(ExampleData$wage)
# Step 2: Run the regression on the log outcome
Reg2 <- lm(ExampleData$Logwage ~ ExampleData$educ
           + ExampleData$exper
           + ExampleData$female)
summary(Reg2)
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.480836 0.105016 4.579 5.86e-06 ***
## ExampleData$educ 0.091290 0.007123 12.816 < 2e-16 ***
## ExampleData$exper 0.009414 0.001449 6.496 1.93e-10 ***
## ExampleData$female -0.343597 0.037667 -9.122 < 2e-16 ***
##
## Multiple R-squared: 0.3526, Adjusted R-squared: 0.3488
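As an alternative to creating a new column, log() can be applied inside the formula itself. The sketch below checks that both routes give the same fit; it assumes the built-in mtcars data as a stand-in for the wage data, and the object names are made up for illustration:

```r
# mtcars ships with R and stands in for the wage data here
ToyData <- mtcars

# Route 1: create the log variable as a new column, then regress on it
ToyData$LogMpg <- log(ToyData$mpg)
RegNewColumn <- lm(LogMpg ~ wt + hp, data = ToyData)

# Route 2: take the log inside the formula
RegInFormula <- lm(log(mpg) ~ wt + hp, data = ToyData)

# Both routes produce the same coefficients
SameFit <- all.equal(coef(RegNewColumn), coef(RegInFormula))
```

Creating the column first keeps the logged variable available for plots and summary tables, which is one reason to prefer the two-step approach while learning.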
Use plot(x, y) — list the x variable first, then y:
plot(ExampleData$exper, ExampleData$wage)
Calculate the correlation coefficient between two variables with cor():
cor(ExampleData$exper, ExampleData$wage)
## [1] 0.1129034
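The correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. A short sketch verifying this against cor(), assuming the built-in mtcars data as a stand-in for the wage data:

```r
# Two columns from mtcars, standing in for exper and wage
x <- mtcars$wt
y <- mtcars$mpg

# Correlation by hand: covariance scaled by both standard deviations
CorByHand <- cov(x, y) / (sd(x) * sd(y))

# Built-in function
CorBuiltIn <- cor(x, y)

SameCorrelation <- all.equal(CorByHand, CorBuiltIn)
```

Because of the scaling, the result always lies between -1 and 1 and does not depend on the units of either variable.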
Use hist() with a single variable to see its distribution:
hist(ExampleData$wage)
The ggplot2 package (part of the tidyverse) is more powerful and flexible than base R plotting for creating publication-quality visualizations. It supports scatter plots with best-fit lines, histograms, violin plots, box-and-whisker plots, line charts, pie charts, and much more.
The official ggplot2 documentation and cheat sheet are available at: https://ggplot2.tidyverse.org/
library(ggplot2)
# Scatter plot with a best-fit line
ggplot(ExampleData, aes(x = educ, y = wage)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Wage vs Education",
       x = "Years of Education",
       y = "Hourly Wage ($)") +
  theme_minimal()