ECON 314 · Reference Guide

R Guide

Kyler Patrick, 2026.

Sections
01 R Markdown: Create .rmd files, insert code chunks, format text with LaTeX
02 Coding Tips: Data types, clean code practices, naming objects & variables
03 Packages: Installing and loading packages; recommended starter packages
04 Loading Data: Load .RData and .csv files; clean data, remove NAs, create dummies
05 Summary Tables: summary(), stargazer, and export_summs for professional tables
06 Regression Models: Linear models with lm(), logarithmic models, robust standard errors
07 Visualizations: Base R plots, histograms, correlation, and intro to ggplot2
A. Creating an R Markdown File (.rmd)
  1. Open RStudio. Click File → New File → R Markdown…
  2. Name the file and select your output type: HTML, PDF, or Word. Word is best for most users as it converts easily. HTML works without LaTeX or Microsoft Office. PDF requires LaTeX (which can be difficult to install). The output can be changed at the top of the file next to output:.
  3. Every new R Markdown file starts with a setup chunk — do not delete it:
R
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
⚠ Important

Delete any other auto-generated example code when you open a new file, but always keep the setup chunk.

  4. Save the file: File → Save As… and choose your location.
  5. R does not auto-save. Save frequently with File → Save (or Ctrl+S), especially after writing major code chunks.
B. Inserting a Code Chunk

The quickest way to insert a code chunk is with a keyboard shortcut:

| OS | Shortcut |
| --- | --- |
| Windows | Ctrl + Alt + I |
| Mac | Cmd + Option + I |

Alternatively, you can manually type the chunk delimiters:

R Markdown
```{r}
print("Example Code Chunk")
```
Output
## [1] "Example Code Chunk"
C. Text Formatting in R Markdown
Emphasis

Text outside of code chunks is treated as plain text. Use asterisks for emphasis:

| Syntax | Result |
| --- | --- |
| *italics* | italics |
| **bold** | bold |
Math & Equations (LaTeX Syntax)

Use dollar signs to write math inline or as a displayed equation:

| Syntax | Use | Example |
| --- | --- | --- |
| $...$ | Inline math | Write beta: $\beta$ |
| $$...$$ | Displayed equation | $$\frac{1}{2} + \frac{1}{2} = 1$$ |
| x^{n} | Superscript (inside math) | $2^{2} = 4$ |
💡 Tip

To type a literal dollar sign in markdown (not math), escape it with a backslash: \$. Do not escape dollar signs inside code chunks, where $ is the R operator for selecting a column.
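Because most write-ups in this course report regressions, a regression equation is a natural thing to typeset. The lines below are a sketch only: the variable names are borrowed from the wage1 examples used later in this guide, and the i subscripts and the error term u are notational assumptions rather than anything the guide prescribes.

```latex
$$wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i$$

The estimated coefficient on education is written inline as $\hat{\beta}_1$.
```

The first line renders as a centered equation on its own line; the second shows inline math sitting inside an ordinary sentence.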

A. Types of Data
| Type | Description | Example |
| --- | --- | --- |
| Character | Text / string data. Created with "" or ''. A column of loaded data that contains any non-numeric entry is read as character. | "hello" |
| Integer | Whole numbers, written with an L suffix. Can be converted to/from character or numeric. | 5L |
| Numeric | Real numbers, including decimals. Can be converted to/from character or integer. | 3.14 |
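The types in the table can be checked with class() and converted with the as.*() functions from base R. The sketch below uses made-up object names and values purely for illustration.

```r
# Illustrative values only; the object names are made up for this sketch
TextValue    <- "hello"   # character
WholeValue   <- 5L        # integer (the L suffix marks a whole number)
DecimalValue <- 3.14      # numeric

class(TextValue)
class(WholeValue)
class(DecimalValue)

# A number stored as text cannot be used in arithmetic until it is converted
NumberAsText    <- "42"
ConvertedNumber <- as.numeric(NumberAsText)
ConvertedNumber + 1

# Converting a decimal to an integer drops the decimal part
TruncatedValue <- as.integer(DecimalValue)

# Text that is not a number converts to NA (R also prints a warning,
# which is silenced here)
FailedConversion <- suppressWarnings(as.numeric("hello"))
is.na(FailedConversion)
```

If a column that should be numeric shows up as character after loading data, a stray text entry somewhere in that column is a likely cause.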
B. Clean Coding Practices
1. Use Spacing

Spaces between lines and between operators improve readability and make debugging easier.

R — Good
ExampleDF <- as.data.frame(ExampleData)

# The argument name is lowercase: data =, not Data =
GuideLm1 <- lm(Var1 ~ Var2 + Var3, data = ExampleDF)
R — Avoid this
# Harder to read and debug:
ExampleDF<-as.data.frame(ExampleData)
GuideLm1<-lm(Var1~Var2+Var3,data=ExampleDF)
2. Sensible Variable Names

Since variable names can't contain spaces, use one consistent substitution method — either underscores or CamelCase — and never mix them:

R
# CamelCase:
ExampleVariable <- ExampleData$Variable1

# Underscores:
Example_Variable <- ExampleData$Variable1
3. Abbreviating Long Names

If a variable name is too long, remove vowels — but do it consistently for everything:

R
ExmplVrbl <- ExampleVariable
4. Naming Models

Pick a consistent prefix for model names (lm, reg, or model) and number them sequentially:

R
lm1 <- lm(data$y ~ data$x1 + data$x2)
lm2 <- lm(data$y ~ data$x3 + data$x4)
5. Comments

Use # to add comments or prevent code from running. To comment/uncomment multiple lines at once, highlight them and press Ctrl+Shift+C (Cmd+Shift+C on Mac).

R
# This is a comment — it won't run
# summary(data)  <-- this line is "commented out"
summary(data)  # runs normally
C. Saving Files

Create one main folder for the course, then subfolders for data files and markdown files. If you're using a lab desktop, use a flash drive so you can access files on any machine.

D. Naming & Accessing Objects
Creating named objects

Assign any object (data frames, models, variables, tables) using <-:

R
Object1 <- summary(ExampleData$x1)
Selecting a specific variable

Use the $ operator to pull a specific column from a data frame:

R
ExampleData$x1
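The two ideas above, naming objects with <- and selecting columns with $, are usually combined. A minimal sketch, assuming a hypothetical three-row data frame called SmallData that is not part of the course data:

```r
# Hypothetical three-row data frame, used only for illustration
SmallData <- data.frame(x1 = c(2, 4, 6),
                        x2 = c("a", "b", "c"))

# Pull one column out as a vector
SmallData$x1

# Store a result under its own name, then reuse it
MeanX1 <- mean(SmallData$x1)
MeanX1

# Assigning to a column name that does not exist yet adds a new column
SmallData$x1Squared <- SmallData$x1^2
names(SmallData)
```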
A. Installing Packages

Use install.packages() to install a package. The name must be spelled correctly and surrounded by quotes. Only install each package once.

R
install.packages("stargazer")
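One way to follow the "install once" advice without commenting lines in and out is to check whether the package is already present. The helper below is a sketch; InstallIfMissing is a hypothetical name, not a function from base R or any package, and it relies only on requireNamespace() and install.packages().

```r
# Hypothetical helper: installs a package only when it is not already present
InstallIfMissing <- function(PackageName) {
  AlreadyInstalled <- requireNamespace(PackageName, quietly = TRUE)
  if (!AlreadyInstalled) {
    install.packages(PackageName)
  }
  # Return (invisibly) whether the package was there before the call
  invisible(AlreadyInstalled)
}

# "stats" ships with R, so this call never triggers a download
WasInstalled <- InstallIfMissing("stats")
WasInstalled

# For a course package, the call would look like this:
# InstallIfMissing("stargazer")
```

Knitting a document that calls install.packages() can fail when no CRAN mirror is set, so it is usually safer to run installs in the Console rather than inside a chunk.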
💡 Recommended Starter Packages for ECON 314
| Package | Purpose |
| --- | --- |
| stargazer | Publication-quality summary and model tables |
| jtools | Model tables with robust SE support |
| lmtest | Hypothesis tests for regression models |
| sandwich | Robust standard error estimation |
| tidyverse | Includes ggplot2, dplyr, readr, tidyr, purrr, tibble, stringr, forcats, lubridate |
B. Loading Packages

Use library() to load a package each time you open R. No quotes needed.

R
library(stargazer)
Output
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
💡 Tip for Beginners

While still learning R, avoid putting all your library() calls at the top of the document. Instead, load each package right before you need it — this helps you learn what each package actually does.

📦 Example Dataset

The examples below use the wage1 dataset from the wooldridge package. Run this to follow along:

R
# install.packages("wooldridge")  # run once, then comment out
library(wooldridge)
data("wage1")
ExampleData <- wage1
View(ExampleData)
A. Loading a .RData File
  1. Copy the file path. On Windows: click the file then Ctrl+Shift+C, or right-click → "Copy as path". On Mac: Control-click → hold Option → "Copy as Pathname".
  2. Switch any backslashes (\) to forward slashes (/) in the path. Then use load() and View():
R
# load("C:/Users/kkpat/Desktop/Econometrics/RData/attend.Rdata")
# ExampleData1 <- attend
# View(ExampleData1)
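The backslash swap in step 2 can also be done in R instead of by hand. This is a sketch with a hypothetical path; note that inside an R string every backslash has to be typed twice to be read literally.

```r
# Hypothetical path as it might be copied from Windows.
# Inside an R string each backslash is doubled so R reads it literally.
WindowsPath <- "C:\\Users\\student\\Desktop\\Econometrics\\attend.RData"

# Replace every backslash with a forward slash
FixedPath <- gsub("\\", "/", WindowsPath, fixed = TRUE)
FixedPath

# file.exists() reports whether R can see the file before load() is attempted
file.exists(FixedPath)
```

A FALSE from file.exists() usually points to a typo in the path or a file that lives in a different folder.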
B. Loading a .csv File
  1. Convert Excel files (.xlsx/.xls) to .csv by saving as CSV in Excel first.
  2. Load tidyverse or readr, then use read_csv(). The result is a tibble — a modernized data frame:
R
library(tidyverse)
ExampleData2 <- read_csv("C:/Users/kkpat/Desktop/STAT320/Data/SelectedVars.csv")
View(ExampleData2)

To save the tibble as an .RData file (it is written to the working directory, which is the R Markdown file's folder when you knit):

R
save(ExampleData2, file = "ExampleData2.RData")
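A detail that often surprises new users is that load() restores objects under the names they were saved with, rather than a name chosen at load time. The round trip below is a sketch that uses a hypothetical data frame and a temporary folder so that nothing is written to your course folders.

```r
# Hypothetical data, saved to a temporary folder for illustration
SmallData <- data.frame(wage = c(5.5, 7.25, 10),
                        educ = c(12, 16, 14))
TempFile <- file.path(tempdir(), "SmallData.RData")

save(SmallData, file = TempFile)

# Remove the object, then bring it back from the file
rm(SmallData)
LoadedNames <- load(TempFile)

# load() returns the names of the objects it restored
LoadedNames
head(SmallData)
```

This is why the .RData example above assigns the loaded object to a new name in a separate step.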
C. Cleaning Data
Remove NA values

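Use na.omit() on the full dataset to drop every row that contains an NA, or on a single variable to get a shorter stand-alone vector:

R
ExampleDataNoNA <- na.omit(ExampleData)

# Store the result as its own object: the cleaned vector can be shorter than
# the data frame, so it cannot be assigned back as a column
WageNoNA <- na.omit(ExampleData$wage)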
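Before dropping rows it helps to see how many values are missing and where. The sketch below uses a small hypothetical data frame with two missing values; it also shows na.rm = TRUE, a base R argument that skips NAs inside a single calculation without changing the data.

```r
# Hypothetical data with one missing wage and one missing educ value
SmallData <- data.frame(wage = c(5.5, NA, 7.25, 10),
                        educ = c(12, 16, NA, 14))

# Count missing values in each column
MissingCounts <- colSums(is.na(SmallData))
MissingCounts

# na.omit() drops every row that has an NA in any column
SmallDataNoNA <- na.omit(SmallData)
nrow(SmallDataNoNA)

# na.rm = TRUE ignores NAs for this one calculation only
MeanWage <- mean(SmallData$wage, na.rm = TRUE)
MeanWage
```

Here only two of the four rows survive na.omit(), even though each column is missing just one value, which is worth checking before running a regression on the cleaned data.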
Create a subset of variables

Use subset() to select specific columns from a large dataset:

R
ExampleDataSubset <- subset(ExampleData, select = c(wage, educ, exper))
View(ExampleDataSubset)
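subset() can also keep only the rows that meet a condition, alone or together with select. A minimal sketch on a hypothetical data frame (the column names mirror wage1, but the values are invented):

```r
# Hypothetical data; values are invented for illustration
SmallData <- data.frame(wage   = c(5.5, 7.25, 10, 3.1),
                        educ   = c(12, 16, 14, 10),
                        female = c(1, 0, 1, 0))

# Keep rows where female equals 1, and only the wage and educ columns
WomenOnly <- subset(SmallData, female == 1, select = c(wage, educ))
WomenOnly
```

Note the double equals sign: == tests a condition, while a single = would be read as an argument assignment.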
Create a numeric dummy variable

Use ifelse() to convert a categorical variable into a binary (0/1) dummy:

R
# Creates a dummy = 1 if "old", = 0 if "new"
# (x1 is a placeholder for a categorical column in your own data)
ExampleData$x1Dummy <- ifelse(ExampleData$x1 == "old", 1, 0)
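A quick cross-tabulation is a simple way to confirm that a new dummy was coded as intended. The sketch below assumes a hypothetical column named condition; neither it nor SmallData appears in the wage1 data.

```r
# Hypothetical categorical column, used only for illustration
SmallData <- data.frame(condition = c("old", "new", "old", "new", "new"))

# 1 if "old", 0 otherwise
SmallData$OldDummy <- ifelse(SmallData$condition == "old", 1, 0)

# Each category should line up with exactly one dummy value
table(SmallData$condition, SmallData$OldDummy)

# The mean of a 0/1 dummy is the share of observations coded 1
OldShare <- mean(SmallData$OldDummy)
OldShare
```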
A. The summary() Command
For data — five number summary + mean
R
summary(ExampleData$wage)
Output
##  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.530   3.330   4.650   5.896   6.880  24.980
For models — full regression summary
R
Model1 <- lm(wage1$wage ~ wage1$educ + wage1$exper)
summary(Model1)
Output
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -3.39054    0.76657  -4.423 1.18e-05 ***
## wage1$educ    0.64427    0.05381  11.974  < 2e-16 ***
## wage1$exper   0.07010    0.01098   6.385 3.78e-10 ***
##
## Multiple R-squared: 0.2252,  Adjusted R-squared: 0.2222
## F-statistic: 75.99 on 2 and 523 DF,  p-value: < 2.2e-16
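Individual numbers from a model summary can be pulled out by name instead of being copied from the printed output. The sketch below swaps in mtcars, a dataset that ships with R, so that it runs without the wooldridge package; the same calls should work on Model1.

```r
# mtcars ships with R; it stands in for wage1 so the sketch runs on its own
CarModel   <- lm(mpg ~ wt + hp, data = mtcars)
CarSummary <- summary(CarModel)

# Coefficient estimates as a named vector
coef(CarModel)

# Full coefficient table: estimate, standard error, t value, p-value
CoefTable <- CarSummary$coefficients
CoefTable

# Individual pieces: the p-value on wt and the two R-squared values
WtPValue    <- CoefTable["wt", "Pr(>|t|)"]
RSquared    <- CarSummary$r.squared
AdjRSquared <- CarSummary$adj.r.squared
c(WtPValue = WtPValue, RSquared = RSquared, AdjRSquared = AdjRSquared)
```

Referring to values by name keeps the text of a report in sync with the model if the data or specification changes.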
B. Stargazer

Stargazer produces publication-quality tables for both data and models. Use type = "text" for working in R; switch to type = "html" with an out = "Table.htm" argument for final papers.

Data summary table
R
library(stargazer)
stargazer(as.data.frame(ExampleData), type = "text", title = "Data Summary")
Model summary table
R
stargazer(Model1, type = "text", title = "Model Summary")
Multiple models side by side
R
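# Model2 is not defined earlier in this guide; this definition is only an example
Model2 <- lm(wage ~ educ + exper + female, data = wage1)
stargazer(Model1, Model2, type = "html", out = "ModelTable1.htm",
          title = "Model 1 and 2 Summary")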
Robust standard errors comparison
R
library(stargazer)
library(sandwich)

RobustSE <- sqrt(diag(vcovHC(Model1, type = "HC1")))

stargazer(Model1, Model1,
          se = list(NULL, RobustSE),
          type = "text",
          title = "Without and With Robust Standard Errors",
          notes = "Robust standard errors on the right",
          notes.append = TRUE)
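To see roughly what an HC1 robust variance estimate involves, the calculation can be written out in base R. This is a sketch of the standard HC1 sandwich formula, not the sandwich package's own code, and it uses mtcars in place of wage1 so that it runs without extra packages; all object names are made up for the example.

```r
# mtcars stands in for wage1 so the sketch needs no extra packages
CarModel <- lm(mpg ~ wt + hp, data = mtcars)

DesignMatrix <- model.matrix(CarModel)   # n x k matrix of regressors
SquaredResid <- resid(CarModel)^2        # one squared residual per row
n <- nrow(DesignMatrix)
k <- ncol(DesignMatrix)

# "Bread": inverse of X'X
Bread <- solve(t(DesignMatrix) %*% DesignMatrix)

# "Meat": X' diag(e^2) X, built by weighting each row of X by its e^2
Meat <- t(DesignMatrix) %*% (DesignMatrix * SquaredResid)

# HC1 applies the small-sample correction n / (n - k)
RobustVcov <- (n / (n - k)) * Bread %*% Meat %*% Bread
RobustSE   <- sqrt(diag(RobustVcov))

# Compare with the classical standard errors from lm()
ClassicalSE <- sqrt(diag(vcov(CarModel)))
cbind(ClassicalSE, RobustSE)
```

The coefficient estimates are unchanged by this adjustment; only the standard errors, and therefore the t statistics and p-values, differ.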
C. export_summs (jtools)

A simpler alternative that outputs text-style model tables. export_summs() depends on the huxtable package, so install huxtable as well as jtools.

Basic model table
R
library(jtools)
export_summs(Model1)
Standard SE vs Robust SE comparison
R
library(jtools)
Model1Robust <- summ(Model1, robust = TRUE)

export_summs(Model1, Model1Robust,
             model.names = c("Standard SE", "Robust SE"))
A. Linear Models — lm()

Use lm() to fit an Ordinary Least Squares regression. Two equivalent formats:

R — Format 1 ($ notation)
Reg1 <- lm(ExampleData$wage ~ ExampleData$educ
                              + ExampleData$exper
                              + ExampleData$female)
R — Format 2 (data = argument)
Reg1 <- lm(wage ~ educ + exper + female, data = ExampleData)
# Variable names must exactly match the dataset columns
💡 Tip

The tilde ~ (meaning "is modeled by") is typed with Shift + ` (the key left of 1).

summary(Reg1) Output
## Coefficients:
##                    Estimate Std. Error t value  Pr(>|t|)
## (Intercept)       -1.73448    0.75362  -2.302   0.0218 *
## ExampleData$educ   0.60258    0.05112  11.788  < 2e-16 ***
## ExampleData$exper  0.06424    0.01040   6.177 1.32e-09 ***
## ExampleData$female -2.15552    0.27031  -7.974 9.74e-15 ***
##
## Multiple R-squared: 0.3093,  Adjusted R-squared: 0.3053
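The two formats give the same estimates, but Format 2 has a practical advantage: predict() can match the variable names in the formula to columns of a new data frame. The sketch below uses mtcars, which ships with R, and a hypothetical NewCars data frame so that it runs on its own.

```r
# Format 2 (data = argument), using a dataset that ships with R
CarReg <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observations; column names must match the formula
NewCars <- data.frame(wt = c(2.5, 3.5),
                      hp = c(100, 150))

# Predicted mpg for each new row
PredictedMpg <- predict(CarReg, newdata = NewCars)
PredictedMpg

# The same numbers computed by hand from the coefficients
b <- coef(CarReg)
ManualMpg <- b["(Intercept)"] + b["wt"] * NewCars$wt + b["hp"] * NewCars$hp
ManualMpg
```

With the $ notation of Format 1, the model's terms are tied to the original data frame, so predict() cannot line them up with new data in this way.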
B. Logarithmic Models

To estimate a log-linear model, first create the log of the variable as a new column, then run lm() as usual.

R
# Step 1: Create the log variable
ExampleData$Logwage <- log(ExampleData$wage)

# Step 2: Run the regression on the log outcome
Reg2 <- lm(ExampleData$Logwage ~ ExampleData$educ
                                  + ExampleData$exper
                                  + ExampleData$female)
summary(Reg2)
Output
## Coefficients:
##                     Estimate Std. Error t value  Pr(>|t|)
## (Intercept)         0.480836   0.105016   4.579 5.86e-06 ***
## ExampleData$educ    0.091290   0.007123  12.816  < 2e-16 ***
## ExampleData$exper   0.009414   0.001449   6.496 1.93e-10 ***
## ExampleData$female -0.343597   0.037667  -9.122  < 2e-16 ***
##
## Multiple R-squared: 0.3526,  Adjusted R-squared: 0.3488
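With a logged outcome, each coefficient times 100 approximates the percent change in wage for a one-unit change in that variable; for larger coefficients the exact figure is 100 * (exp(b) - 1). The sketch below types in the estimates from the output above as literals so it runs without the data; LogCoefs is a made-up name.

```r
# Coefficients copied from the Reg2 output above
LogCoefs <- c(educ = 0.091290, exper = 0.009414, female = -0.343597)

# Quick approximation: 100 * coefficient
ApproxPct <- 100 * LogCoefs

# Exact percent change: 100 * (exp(coefficient) - 1)
ExactPct <- 100 * (exp(LogCoefs) - 1)

round(cbind(ApproxPct, ExactPct), 2)
```

The approximation is close for educ and exper, but for female the exact effect (about -29.1 percent) is noticeably smaller in magnitude than the approximate -34.4 percent.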
A. Simple Base R Plots
Scatter plot

Use plot(x, y) — list the x variable first, then y:

R
plot(ExampleData$exper, ExampleData$wage)
Correlation

Calculate the correlation coefficient between two variables with cor():

R
cor(ExampleData$exper, ExampleData$wage)
Output
## [1] 0.1129034
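The correlation coefficient connects directly to regression: in a regression with a single explanatory variable, R-squared equals the squared correlation between the two variables. The sketch below checks this on mtcars, a dataset that ships with R, and also shows cor() applied to several columns at once.

```r
# mtcars ships with R, so this runs without the wooldridge package
CorValue  <- cor(mtcars$wt, mtcars$mpg)
SimpleReg <- lm(mpg ~ wt, data = mtcars)
RSquared  <- summary(SimpleReg)$r.squared

# The two numbers match in a one-regressor model
c(CorSquared = CorValue^2, RSquared = RSquared)

# Passing several columns returns a correlation matrix
CorMatrix <- cor(mtcars[, c("mpg", "wt", "hp")])
round(CorMatrix, 3)
```

This equality holds only with one explanatory variable; with several regressors, R-squared is no longer the square of any single pairwise correlation.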
Histogram

Use hist() with a single variable to see its distribution:

R
hist(ExampleData$wage)
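Base R plots accept arguments for titles, axis labels, and colors, and a fitted line can be drawn on top of a scatter plot with abline(). The sketch below uses mtcars so it runs without extra packages; the titles and colors are arbitrary choices.

```r
# Histogram with a title, an axis label, and a suggested number of bins
MpgHist <- hist(mtcars$mpg,
                breaks = 10,
                main = "Distribution of Fuel Economy",
                xlab = "Miles per Gallon",
                col = "lightblue")

# Scatter plot with labels, then a fitted regression line on top
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel Economy vs Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per Gallon")
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)
```

The breaks argument is treated as a suggestion, so R may choose a slightly different number of bins to get round cut points.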
B. ggplot2

The ggplot2 package (part of the tidyverse) is more powerful and flexible than base R graphics for creating publication-quality visualizations. It supports scatter plots with best-fit lines, histograms, violin plots, box-and-whisker plots, line charts, pie charts, and much more.

📖 Resource

The official ggplot2 documentation and cheat sheet are available at: https://ggplot2.tidyverse.org/

Quick ggplot2 starter example
R
library(ggplot2)

# Scatter plot with a best-fit line
ggplot(ExampleData, aes(x = educ, y = wage)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Wage vs Education",
       x = "Years of Education",
       y = "Hourly Wage ($)") +
  theme_minimal()