Essential R Commands

Section	Command	Explanation	Link
R Basics	`+ - * /`	Addition, subtraction, multiplication, division	Here
R Basics	`^`	Exponentiation	Here
R Basics	`<-`	Assignment	Here
R Basics	`class(var)`	Determine the data type of `var`	Here
R Basics	`c()`	Create an atomic vector with multiple elements	Here
R Basics	`length(x)`	Determine the length of the atomic vector `x`	Here
R Basics	`sum(x)`	Sum the values of the atomic vector `x`	Here
R Basics	`mean(x)`	Determine the mean of the atomic vector `x`	Here
R Basics	`min(x)` & `max(x)`	Determine the minimum and maximum (respectively) of the atomic vector `x`	Here
Data Frames	`install.packages("packageName")`	Install a new package called `packageName`	Here
Data Frames	`library(packageName)`	Load a pre-installed package called `packageName`	Here
Data Frames	`readr::read_csv(file)`	Read in data from a CSV file (`file`); `readr` is loaded with the tidyverse	Here
Data Frames	`readxl::read_excel(file)`	Read in data from an Excel file (`file`)	Here
Data Frames	`nrow(df)`	Count the number of rows of the `df` dataframe	Here
Data Frames	`ncol(df)`	Count the number of columns of the `df` dataframe	Here
Data Frames	`dim(df)`	Determine the dimensions of the `df` dataframe	Here
Data Frames	`head(df)` & `tail(df)`	Output the first and last few rows (respectively) of the `df` dataframe	Here
Data Frames	`str(df)`	Output the structure of the `df` dataframe	Here
Data Frames	`df$var`	Return an atomic vector of the `var` column from the `df` dataframe	Here
Data Frames	`readr::parse_number(x)`	Convert the atomic vector `x` to a numeric vector; `readr` is loaded with the tidyverse	Here
Data Frames	`readr::parse_date(x)`	Convert the atomic vector `x` to a date vector; `readr` is loaded with the tidyverse	Here
Data Frames	`dplyr::arrange(df, var1, var2, var3, ...)`	Sort the dataframe `df` by the columns `var1`, `var2`, `var3`, etc.; `dplyr` is loaded with the tidyverse	Here
Data Frames	`dplyr::filter(df, cond1, cond2, cond3, ...)`	Keep the rows of the dataframe `df` that satisfy all of the logical conditions `cond1`, `cond2`, `cond3`, etc.	Here
Data Frames	`dplyr::select(df, var1, var2, var3, ...)`	Select columns `var1`, `var2`, `var3`, etc. from the dataframe `df`	Here
Exploring Data	`median(x)`	Determine the median of the atomic vector `x`	Here
Exploring Data	`quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))`	Determine the specified quantiles (in `probs`) of the atomic vector `x`	Here
Exploring Data	`sd(x)` & `var(x)`	Calculate the standard deviation and variance (respectively) of the atomic vector `x`	Here
Exploring Data	`table(x)` & `prop.table(table(x))`	Tabulate the count and proportion (respectively) of the atomic vector `x`	Here
Exploring Data	`hist(x)`	Create a histogram of the atomic vector `x`	Here
Exploring Data	`boxplot(x)`	Create a boxplot of the atomic vector `x`	Here
Exploring Data	`boxplot(y ~ x)`	Create side-by-side boxplots of the atomic vector `y`, one for each value of `x`	Here
Exploring Data	`plot(x, y)`	Create a scatter plot of `x` and `y`	Here
Exploring Data	`barplot(x)`	Create a barplot of the atomic vector `x`	Here
Exploring Data	`pie(table(x))`	Create a pie chart of the atomic vector `x`	Here
Wrangling & Visualization with the tidyverse	`%>%`	Chain multiple tidyverse operations together	Here
Wrangling & Visualization with the tidyverse	`dplyr::summarise(df, stat1 = ..., stat2 = ..., ...)`	Calculate summary statistics (`stat1`, `stat2`, etc.) from the dataframe `df`	Here
Wrangling & Visualization with the tidyverse	`dplyr::group_by(df, var)`	Group the dataframe `df` by the `var` variable so that later tidyverse operations are applied within each group	Here
Statistical Inference	`t.test(x, mu = 0)`	Conduct a one-sample t-test of means	Here
Statistical Inference	`binom.test(x, n, p = 0.5)`	Conduct a one-sample proportions test	Here
Statistical Inference	`t.test(y ~ x)`	Conduct a two-sample t-test of means, where `y` contains the outcome variable and `x` indicates group membership	Here
Statistical Inference	`prop.test(x = c(x1, x2), n = c(n1, n2))`	Conduct a two-sample proportions test	Here
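Causal Inference	`pwr::pwr.t.test(d = d, power = power)`	Perform a power calculation that solves for the required sample size given the effect size (`d`) and the desired power of the test (`power`); pass both by name, since the first positional argument is the sample size `n`	Here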
Linear Regression	`cor(x, y)`	Calculate the correlation between the atomic vectors `x` and `y`	Here
Linear Regression	`cor.test(x, y)`	Calculate the correlation between the atomic vectors `x` and `y` with a 95% confidence interval	Here
Linear Regression	`fit <- lm(y ~ x1 + x2 + ..., data = df)`	Create a linear regression model called `fit` based on the `y`, `x1`, `x2`, etc. variables from the dataframe `df`	Here
Linear Regression	`confint(fit)`	Calculate 95% confidence intervals for the regression coefficients in the model `fit`	Here
Linear Regression	`summary(fit)`	Summarize the regression model `fit`	Here
Linear Regression	`predict(fit, newData)`	Apply the regression model `fit` to new observations stored in `newData`	Here
Logistic Regression	`fit <- glm(y ~ x1 + x2 + ..., data = df, family = "binomial")`	Create a logistic regression model called `fit` based on the `y`, `x1`, `x2`, etc. variables from the dataframe `df`	Here
Logistic Regression	`predict(fit, newData, type = "response")`	Apply the logistic regression model `fit` to new observations stored in `newData`	Here
Logistic Regression	`MLmetrics::LogLoss(y_pred, y_true)`	Calculate the log loss of a model whose predictions are stored in `y_pred` based on the true values stored in `y_true`	Here
Tree Models	`fit <- rpart::rpart(y ~ x1 + x2 + ..., data = df)`	Create a decision tree model called `fit` based on the `y`, `x1`, `x2`, etc. variables from the dataframe `df`	Here
Tree Models	`rpart.plot::rpart.plot(fit)`	Visualize the decision tree model `fit`	Here
Model Evaluation	`caTools::sample.split(df$y)`	Define a random split of the dataframe `df` based on the outcome variable `y`	Here
Model Evaluation	`MLmetrics::Accuracy(y_pred, y_true)`	Calculate the accuracy of a model whose predictions are stored in `y_pred` based on the true values stored in `y_true`	Here
Model Evaluation	`MLmetrics::ConfusionMatrix(y_pred, y_true)`	Calculate the confusion matrix of a model whose predictions are stored in `y_pred` based on the true values stored in `y_true`	Here
Model Evaluation	`MLmetrics::AUC(y_pred, y_true)`	Calculate the area under the curve (AUC) of a model whose predictions are stored in `y_pred` based on the true values stored in `y_true`	Here
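
As a rough illustration of how several of the commands in the table fit together, the sketch below uses only base R, so no packages need to be installed. The data and the names `scores`, `df`, `hours`, `score`, and `new_data` are made up for this example and do not come from the course materials.

```r
# Hypothetical data: all names and values below are illustrative only.
scores <- c(72, 85, 90, 66, 78)        # c() creates an atomic vector

n_scores   <- length(scores)           # number of elements
avg_score  <- mean(scores)             # arithmetic mean
score_span <- c(min(scores), max(scores))

# Build a small dataframe and inspect it
df <- data.frame(
  hours = c(2, 4, 5, 1, 3),
  score = scores
)
df_dims <- dim(df)                     # rows, then columns
str(df)                                # prints the structure of df

# Correlation and a simple linear regression of score on hours
r   <- cor(df$hours, df$score)
fit <- lm(score ~ hours, data = df)
ci  <- confint(fit)                    # 95% confidence intervals for the coefficients

# Apply the fitted model to new observations
new_data    <- data.frame(hours = c(6, 7))
predictions <- predict(fit, new_data)
```

The column name in `new_data` has to match the predictor name used in the model formula (`hours` here); otherwise `predict()` cannot find the variable.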

Data Science for Managers
