Data frames and tibbles are frequently used data structures in R for storing and manipulating data. They facilitate the organization, exploration, and analysis of data.
Summary of Chapter 4 – Reading Data into R
In Chapter 4, we continued our exploration of data frames in R, focusing on reading various file formats into a data frame, managing the working directory, merging data frames, and working with tibbles. The overall objective was to read data into R effectively for further analysis.
R uses the concept of a working directory to manage files. By default, R looks for and saves files in the working directory. To check the current working directory, we use the getwd() function. We can change the working directory using the setwd() function. It’s advisable to choose an easily accessible working directory and avoid using spaces, special characters, and non-ASCII characters in file paths.
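As a minimal sketch of the two functions above, the following snippet checks the working directory, changes it, and then restores it. The use of tempdir() as the target folder is an assumption made only so the example runs on any machine; in practice you would supply your own project folder.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()
print(old_wd)

# Change the working directory. tempdir() is only an illustrative target
# (an assumption for this sketch); replace it with your own folder path.
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Restore the original working directory.
setwd(old_wd)
```

Saving the original location before calling setwd() makes it easy to return to where you started.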
We can read various file formats into a data frame. For a CSV file, we use the read.csv() function. Reading an Excel file into a data frame is accomplished using the read_excel() function from the readxl package. We also explored reading data from Google Sheets using the gsheet2tbl() function in the gsheet package.
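The readers mentioned above can be sketched as follows. The CSV file is created in a temporary location purely so the example is self-contained, and its columns (id, score) are made up for illustration. The Excel step assumes the readxl package is installed and uses one of the example workbooks that ships with it. The Google Sheets call is shown only as a comment because it needs network access and a real sheet URL; sheet_url is a hypothetical placeholder.

```r
# Create a small CSV file in a temporary location (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          csv_path, row.names = FALSE)

# Read the CSV file into a data frame.
scores <- read.csv(csv_path)
str(scores)

# Read an Excel file, only if the readxl package is available.
# readxl_example() returns the path to a sample workbook bundled with readxl.
if (requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- readxl::readxl_example("datasets.xlsx")
  excel_data <- readxl::read_excel(xlsx_path)
  print(head(excel_data))
}

# Reading a Google Sheet (requires the gsheet package and internet access).
# sheet_url is a hypothetical placeholder for the URL of a shared sheet:
# sheet_data <- gsheet::gsheet2tbl(sheet_url)
```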
When dealing with multiple data sources, we may need to merge or join two data frames. R provides the merge() function to merge data frames based on a common key.
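A short sketch of merging on a common key is shown below. The two data frames (students and grades) and their contents are invented for illustration; the key column is id.

```r
# Two small illustrative data frames that share the key column "id".
students <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cal"))
grades   <- data.frame(id = c(2, 3, 4), grade = c("B", "A", "C"))

# Default behaviour: keep only rows whose id appears in both data frames.
inner <- merge(students, grades, by = "id")
print(inner)

# all.x = TRUE keeps every row of the first data frame;
# ids with no match in the second data frame get NA in its columns.
left <- merge(students, grades, by = "id", all.x = TRUE)
print(left)
```

By default merge() keeps only matching rows; the all, all.x, and all.y arguments control whether unmatched rows are retained.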
The final concept introduced in this chapter is tibbles. A tibble is a modern reimagining of the data frame and is part of the tidyverse collection of packages. Tibbles are provided by the tibble package and are commonly manipulated with the dplyr package. Tibbles have several distinct characteristics, such as unique, non-empty column names, better subsetting behavior, and improved output for large datasets. We reviewed how to create a tibble using the tibble() function, how to convert a data frame to a tibble using the as_tibble() function, and how to convert a tibble back to a data frame with as.data.frame().
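The conversions described above can be sketched as follows, assuming the tibble package is installed. The column names and values are made up for illustration. The last lines show one example of the subsetting difference: selecting a single column with [ returns a vector for a data frame but a one-column tibble for a tibble.

```r
library(tibble)

# Create a tibble directly (illustrative columns x and y).
tb <- tibble(x = 1:3, y = c("a", "b", "c"))
print(tb)

# Convert an ordinary data frame to a tibble, and a tibble back again.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb_from_df <- as_tibble(df)
df_again <- as.data.frame(tb)

# Subsetting one column with [ behaves differently:
print(class(df[, "x"]))  # a plain vector for the data frame
print(class(tb[, "x"]))  # still a tibble
```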
In conclusion, this chapter built upon the concepts of data frames and tibbles, showing how they serve as versatile tools for reading and storing data in various formats in preparation for further exploration.
References
[1] RDocumentation. (n.d.). data.frame function. Retrieved from https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame
Tutorialspoint. (n.d.). R - Data Frames. Retrieved from https://www.tutorialspoint.com/r/r_data_frames.htm
Statistics Globe. (n.d.). What is a Data Frame in R? (3 Examples) | data.frame Object Explained. Retrieved from https://statisticsglobe.com/what-is-data-frame-r
[2] McNeil, D. R. (1977). Interactive Data Analysis. Wiley. In women: Average Heights and Weights for American Women. Retrieved from https://rdocumentation.org/packages/datasets/versions/3.6.2/topics/women
Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole. In iris: Edgar Anderson’s Iris Data. Retrieved from https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/iris
Krasser, R. (2023, October 11). Explore mtcars. The Comprehensive R Archive Network. Retrieved from https://cran.r-project.org/web/packages/explore/vignettes/explore_mtcars.html
RDocumentation. (n.d.). mtcars: Motor Trend Car Road Tests. Retrieved from https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars
Henderson, V., & Velleman, P. (1981). Motor Trend Car Road Tests. Retrieved from https://web.mit.edu/r/current/lib/R/library/datasets/html/mtcars.html
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer International Publishing.
[3] PsyTeachR. (n.d.). Chapter 1: Importing data from different file formats into R. Retrieved from https://psyteachr.github.io
Boston University School of Public Health. (n.d.). Reading and Writing Data to and from R. Retrieved from http://sphweb.bumc.bu.edu
RStudio. (n.d.). An Introduction to R - 7 Reading data from files. Retrieved from https://rstudio.github.io
Andrews, M. J. (n.d.). Read Multiple Files into a Single Data Frame. Retrieved from http://www.mjandrews.org
ProgrammingR. (n.d.). A Guide To File Readers And Data Conversion In R. Retrieved from http://www.programmingr.com
FreeCodeCamp. (n.d.). How to Work With Data Frames and CSV Files in R — A Detailed Introduction with Examples. Retrieved from http://www.freecodecamp.org
R Core Team. (2021). getwd(): working directory; setwd(dir): change working directory. In R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from https://stat.ethz.ch/R-manual/R-devel/library/base/html/getwd.html
[4] gsheet2tbl. (n.d.). In gsheet: Download Google Sheets Using Just the URL. Retrieved from https://rdrr.io/cran/gsheet/man/gsheet2tbl.html.
[5] Tibble overview and usage. (n.d.). In tibble: Simple Data Frames. Retrieved from tibble.tidyverse.org.
Analytics Steps. (n.d.). Tibbles in R Programming. Retrieved from www.analyticssteps.com.
Tibble creation and name repair. (n.d.). In tibble: Build a data frame. Retrieved from tibble.tidyverse.org.
CRAN. (n.d.). Tibbles vs data frames: Printing and subsetting. Retrieved from cran.r-project.org.
Basic R Programming
[6] Chambers, J. M. (2008). Software for Data Analysis: Programming with R (Vol. 2, No. 1). New York: Springer.
Crawley, M. J. (2012). The R Book. John Wiley & Sons.
Gardener, M. (2012). Beginning R: The Statistical Programming Language. John Wiley & Sons.
Grolemund, G. (2014). Hands-On Programming with R: Write Your Own Functions and Simulations. O’Reilly Media, Inc.
Kabacoff, R. (2022). R in Action: Data Analysis and Graphics with R and Tidyverse. Simon and Schuster.
Peng, R. D. (2016). R Programming for Data Science (pp. 86-181). Victoria, BC, Canada: Leanpub.
R Core Team. (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/.
Tippmann, S. (2015). Programming Tools: Adventures with R. Nature, 517(7532), 109-110.
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science. O’Reilly Media, Inc.