Multivariate Statistical Analysis Cookbook with R
2020-12-10
Chapter 1 Introduction
Welcome to Yile’s Multivariate Statistical Analysis Cookbook! This cookbook is written for beginners, and for future me, who might need Multivariate Statistical Analysis (MSA) as a tool for analyzing neuroimaging or behavioral data. The idea came from my final project in the Advanced Research Methods class (RM3), taught by Dr. Herve Abdi, which is the most useful and hardcore (if you allow me to use this word lol) class I have ever taken at UTD. In this class, we were required to write a book introducing all eight analyses we learned, together with the R code for each. After the class, I plan to keep updating this book as a resource for myself and anyone in need. This book will be accessible on My Personal Website. Comments are welcome too: feel free to email me at ylwwayne@gmail.com.
1.1 Main Packages
For happy computing, I have listed the packages required for our analyses in the chunk below. Before running any analysis, make sure all of these packages are installed and up to date in your local R environment.
# Clean Start----
rm(list = ls())
graphics.off()
# Packages----
library(bookdown)
library(MExPosition)
library(RCurl)
library(dplyr)
library(ggplot2)
library(ggplotify)
library(grid)
library(gridExtra)
library(PTCA4CATA)
library(ExPosition)
library(data4PCCAR)
library(stringr)
library(readxl)
library(RColorBrewer)
library(DistatisR)
library(TExPosition)
library(InPosition)
library(TInPosition)
library(corrplot)
library(tidyverse)
library(gtable)
library(prettyGraphs)
library(superheat)
library(knitr)
library(psych)
library(pheatmap)
library(factoextra)
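Since `library()` stops with an error at the first package that is not installed, it can help to check the whole list up front. Below is a minimal sketch of such a check. The helper name `missing_pkgs` is my own (hypothetical, not part of any package), and the sketch only reports what is missing instead of installing anything, because some of these packages may be distributed outside CRAN (for example on GitHub).

```r
# Packages used in this book (copied from the chunk above).
required_pkgs <- c(
  "bookdown", "MExPosition", "RCurl", "dplyr", "ggplot2", "ggplotify",
  "grid", "gridExtra", "PTCA4CATA", "ExPosition", "data4PCCAR", "stringr",
  "readxl", "RColorBrewer", "DistatisR", "TExPosition", "InPosition",
  "TInPosition", "corrplot", "tidyverse", "gtable", "prettyGraphs",
  "superheat", "knitr", "psych", "pheatmap", "factoextra"
)

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_pkgs(required_pkgs)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Anything reported as missing still has to be installed from wherever that package is published before the `library()` calls above will run.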
1.2 MSA Introduction
Let’s cut right to the chase now. What’s MSA? If I were allowed only one sentence to define it, I would probably say that MSA is a tool that helps us explore the linear relationships between observations and multiple variables. Compared to analysis of variance (ANOVA), which analyzes the differences among group means based on the law of total variance, MSA gives more detail about how the groups differ and which variables contribute most to the group separation.
Well, that still looks like an abstract concept. Where should we start? When I am studying a new concept, my favorite part is when the instructor gives an example of it, so let me try one here: if doing research is cutting down a tree to observe its inner texture, then the statistical analysis is the tool (the saw), our data is the tree, and the answer to the research question is the texture of the tree.
Thus, our question is: how do we find the best saw for each kind of tree? Different studies produce different kinds of data sets. The table below shows which method corresponds to which data type (quantitative or qualitative) and to how many data tables. When I talk about two or more data tables, I generally mean data tables from different modalities (brain imaging vs. behavioral tests, or IQ tests vs. physical activity).
Data Type | One Data Table | Two Data Tables | Multiple Data Tables |
---|---|---|---|
Quantitative | PCA | BADA, PLS-C | MFA |
Qualitative | CA, MCA | DiCA, PLS-C | DiSTATIS |
Table 1: MSA methods by data type and number of data tables.
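To make the table easier to use as a quick reference, here is a minimal sketch that stores it as a named matrix in R and looks up a cell by data type and number of tables. The names `msa_methods` and `suggest_method` are made up for this illustration, and the cell contents are copied from the table as-is.

```r
# The table above as a named character matrix (rows: data type, columns: number of tables).
msa_methods <- matrix(
  c("PCA",     "BADA, PLS-C", "MFA",
    "CA, MCA", "DiCA, PLS-C", "DiSTATIS"),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    data_type = c("Quantitative", "Qualitative"),
    n_tables  = c("One", "Two", "Multiple")
  )
)

# Hypothetical helper: look up the method(s) listed for one cell of the table.
suggest_method <- function(data_type, n_tables) {
  msa_methods[data_type, n_tables]
}

suggest_method("Quantitative", "One")
suggest_method("Qualitative", "Multiple")
```

This is only a lookup of the table, not a rule for choosing a method; the chapters on each method explain when it actually fits the data.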
1.3 MSA data set
In this book, we will go through all of these methods with worked examples, including graphs, code, and an explanation of each method. I will use a public data set called “collective action” as our main example, and two food-related data sets as special examples for CA and DiSTATIS. All of these data sets are available on my GitHub, so anyone who wants to practice MSA while following this book is welcome to use them.
1.4 Whoo-hoo!
Let’s begin our MSA journey!