Data Frames

In the previous chapter, we saw how to create atomic vectors, which store one or more values of the same type. However, data are often more complex, with rows and columns that contain multiple data types. This is referred to as tabular data. Below is an example of a tabular data set with information on 1,000 employees from a software company. (For convenience, we display only the first six rows of the data set.)

By convention, the observations (i.e., the employees) form the rows of the data set, and the variables (i.e., the characteristics of the employees we are measuring) form the columns. The dimensions of the data set are typically written as \(n \times m\), where \(n\) is the number of observations (or rows) and \(m\) is the number of variables (or columns).

Fig. 4 Example of Tabular Data

In R, this type of data is stored in a data frame. In this chapter, we will explore the basics of data frames, which we will rely on heavily throughout the course.
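As a preview, the sketch below builds a small data frame by hand and checks its dimensions. The object name `employees`, the column names, and all values are hypothetical and chosen only for illustration; they are not taken from the employee data set shown in the figure.

```r
# Hypothetical employee records (made-up names and values, for illustration only)
employees <- data.frame(
  name   = c("Ana", "Ben", "Chloe"),
  age    = c(34L, 41L, 29L),
  salary = c(72000, 88500, 64250),
  remote = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep text columns as character in all R versions
)

# Each column is an atomic vector, so types can differ from column to column
sapply(employees, class)

# Dimensions: n rows (observations) by m columns (variables)
dim(employees)   # 3 4
nrow(employees)  # number of observations
ncol(employees)  # number of variables
```

Here each row is one observation and each column is one variable, so this toy data frame has dimensions \(3 \times 4\).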