---
title: "E. Supervised multiblock analysis"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{E. Supervised multiblock analysis}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=6,
fig.height=4
)
# Legge denne i YAML på toppen for å skrive ut til tex
#output:
# pdf_document:
# keep_tex: true
# Original:
# rmarkdown::html_vignette:
# toc: true
```
```{r setup}
library(multiblock)
```
# Supervised methods
The following supervised methods are available in the _multiblock_ package (function names in parentheses):
* MB-PLS - Multiblock Partial Least Squares (_mbpls_)
* sMB-PLS - Sparse Multiblock Partial Least Squares (_smbpls_)
* SO-PLS - Sequential and Orthogonalised PLS (_sopls_)
* PO-PLS - Parallel and Orthogonalised PLS (_popls_)
* ROSA - Response Oriented Sequential Alternation (_rosa_)
* mbRDA - Multiblock Redundancy Analysis (_mbrda_)
The following sections will describe how to format your data for analysis and invoke all methods from the list above.
# Formatting data for multiblock analyses
Data blocks are best stored as named lists for use with the formula interface of R. The following is an example with sample data in one data block and one response block.
```{r}
# Random data
n <- 30; p <- 90
X <- matrix(rnorm(n*p), nrow=n)
y <- X %*% rnorm(p) + 10
# Split X into three blocks in a named list
ABC <- list(A = X[,1:20], B = X[,21:50], C = X[,51:90], y = y)
# Model using names of blocks (see below for full SO-PLS example)
so.abc <- sopls(y ~ A + B + C, data = ABC, ncomp = c(4,3,4))
```
# Multiblock Partial Least Squares - MB-PLS
Multiblock PLS is presented briefly using the _potato_ data.
## Modelling
A multi-response two-block MB-PLS model with up to 10 components in total is cross-validated with 10 random segments.
```{r}
data(potato)
mb <- mbpls(X=potato[c('Chemical','Compression')], Y=potato[['Sensory']], ncomp=10,
max_comps=10, validation="CV", segments=10)
print(mb)
```
## Summaries and plotting
MB-PLS is implemented as a block-wise weighted concatenated ordinary PLSR. Therefore, all methods available for _plsr_ are available for the global part of the MB-PL. In addition one can extrac _blockScores_ and _blockLoadings_.
```{r}
Tb1 <- scores(mb, block=1)
scoreplot(mb, block = 1, labels = "names")
Pb2 <- loadings(mb, block=2)
loadingplot(mb, block = 1, labels = "names")
```
# Sparse Multiblock Partial Least Squares - sMB-PLS
Sparse MB-PLS is presented briefly using the _potato_ data.
## Modelling
A multi-response two-block sMB-PLS model with up to 10 components in total is cross-validated with 10 random segments.
Here, the Soft-Threshold version is used (Truncation version also available) with parameter _shrink = 0.6_ which means
the loading weights have 60% of the largest values subtracted before setting negative values to 0.
```{r}
data(potato)
smb <- smbpls(X=potato[c('Chemical','Compression')], Y=potato[['Sensory']], ncomp = 10,
max_comps=10, shrink = 0.6, validation="CV", segments=10)
print(smb)
```
## Plotting
We demonstrate the effect of shrinkage on scores and sparseness in loading weights by plotting results for three values
of the shrinkage parameter. In the loading weight plots we can follow the shrinkage toward the origin of each variable,
while the score plots show the effect on the sample scores.
```{r}
old.par <- par(mfrow = c(3,2), mar = c(3.5,3.5,1.5,1), mgp = c(2,1,0))
for(shrink in c(0.2, 0.5, 0.8)){
smb <- smbpls(X=potato[c('Chemical','Compression')], Y=potato[['Sensory']], ncomp = 10,
max_comps=10, shrink = shrink)
scoreplot(smb, labels = "names", main = paste0("Superscores, shrink=", shrink))
loadingweightplot(smb, labels = "names", main = paste0("Super-loading weights, shrink=", shrink))
}
par(old.par)
```
# Sequential and orthogonalised PLS - SO-PLS
The following example uses the _potato_ data to showcase some of the functions available for SO-PLS analyses.
## Modelling
A multi-response two-block SO-PLS model with up to 10 components in total is cross-validated with 10 random segments.
```{r}
# Load potato data and fit SO-PLS model
so.pot <- sopls(Sensory ~ Chemical + Compression, data=potato,
ncomp=c(10,10), max_comps=10, validation="CV", segments=10)
print(so.pot)
summary(so.pot)
```
## Måge plot
A full Måge plot for all combinations of components for all blocks is produced. This
can be used for a global search for the best fitting cross-validated model.
Each point in the figure below is accompanied by a sequence of four numbers referring to the number of components used for each of the four blocks. Horizontal location is given by the total number of components used across all blocks, while vertical location indicates validated explained variance in percentage.
```{r}
# Load Wine data and model with SO-PLS
data(wine)
ncomp <- unlist(lapply(wine, ncol))[-5]
so.wine <- sopls(`Global quality` ~ ., data=wine, ncomp=ncomp,
max_comps=6, validation="CV", segments=10)
maage(so.wine)
```
A sequential Måge plot can be used for a sequential search for
the optimal model.
```{r}
# Sequential search for optimal number of components per block
old.par <- par(mfrow=c(2,2), mar=c(3,3,0.5,1), mgp=c(2,0.7,0))
maageSeq(so.wine)
maageSeq(so.wine, 2)
maageSeq(so.wine, c(2,1))
maageSeq(so.wine, c(2,1,1))
par(old.par)
```
## Loadings
One set of loadings is printed and two sets are plotted to show how to select specific components from specific blocks. When extracting or plotting loadings for the second or later blocks, one must specify how many components have been used in the previous block(s) (_ncomp_) as this will affect the choice of loadings. In addition one can specify which components in the current block should be extracted (_comps_).
```{r}
# Display loadings for first block
loadings(so.pot, block = 1)
```
```{r}
# Plot loadings from block 1 and 2
old.par <- par(mfrow=c(1,2))
loadingplot(so.pot, comps = c(2,3), block = 1, main = "Block 1", labels = "names", cex = 0.8)
loadingplot(so.pot, ncomp = 4, block = 2, main = "Block 2", labels = "names", cex = 0.8)
par(old.par)
```
## Scores
One set of scores is printed and two sets are plotted to show how to select specific components from specific blocks. Specification of component use in preceding blocks follows the same pattern as with loadings.
```{r}
# Display scores for first block
scores(so.pot, block = 1)
```
```{r}
# Plot scores from block 1 and 2
old.par <- par(mfrow=c(1,2))
scoreplot(so.pot, comps = c(2,3), block = 1, main = "Block 1", labels = "names")
scoreplot(so.pot, ncomp = 4, block = 2, main = "Block 2", labels = "names")
par(old.par)
```
## Prediction
A three block model is fitted using a single response, 5 components and a subset of the data. The remaining data are used as test set for prediction.
```{r}
# Modify data to contain a single response
potato1 <- potato; potato1$Sensory <- potato1$Sensory[,1]
# Model 20 first objects with SO-PLS
so.pot20 <- sopls(Sensory ~ ., data = potato1[c(1:3,9)], ncomp = 5, subset = 1:20)
# Predict remaining objects
testset <- potato1[-(1:20),]; # testset$Sensory <- NULL
predict(so.pot20, testset, comps=c(2,1,2))
```
## Validation
Compute validation statistics; explained variance - R$^2$, and Root Mean Squared Error - RMSE(P/CV).
```{r}
# Cross-validation
R2(so.pot, ncomp = c(5,5))
R2(so.pot, ncomp = c(5,5), individual = TRUE)
# Training
R2(so.pot, 'train', ncomp = c(5,5))
# Test data
R2(so.pot20, newdata = testset, ncomp = c(2,1,2))
```
```{r}
# Cross-validation
RMSEP(so.pot, ncomp = c(5,5))
RMSEP(so.pot, ncomp = c(5,5), individual = TRUE)
# Training
RMSEP(so.pot, 'train', ncomp = c(5,5))
# Test data
RMSEP(so.pot20, newdata = testset, ncomp = c(2,1,2))
```
## Principal Components of Predictions
A PCA is computed from the cross-validated predictions to get an overview of the SO-PLS model across all involved blocks. The blocks are projected onto the scores to form block-loadings to see how these relate to the solution.
```{r}
# PCP from so.pot object
PCP <- pcp(so.pot, c(3,2))
summary(PCP)
scoreplot(PCP)
corrplot(PCP)
```
## CVANOVA
A CVANOVA model compares absolute or squared cross-validated residuals from two or more prediction models using ANOVA with _Model_ and _Object_ as effects. Tukey's pair-wise testing is automatically computed in this implementation.
```{r}
# CVANOVA
so.pot1 <- sopls(Sensory[,1] ~ Chemical + Compression + NIRraw, data=potato,
ncomp=c(10,10,10), max_comps=10, validation="CV", segments=10)
cva <- cvanova(so.pot1, "2,1,2")
summary(cva)
old.par <- par(mar = c(4,6,4,2))
plot(cva)
par(old.par)
```
# Parallel and Orthgonalised Partial Least Squares - PO-PLS
PO-PLS is presented briefly using the _potato_ data. It is a method for separating predictive information into common, local and distinct parts.
## Modelling
There are many choices
with regard to numbers of components and possible local and common components.
Using automatic selection, the user selects the highest number of blocks to
combine into local/common components, minimum explained variance and minimum
squared correlation to the response. Manual selection can be done by setting
the number of initial components from the blocks and maximum number of
local/common components.
```{r}
# Automatic analysis
pot.po.auto <- popls(potato[1:3], potato[['Sensory']][,1], commons = 2)
# Explained variance
pot.po.auto$explVar
```
```{r}
# Manual choice of up to 5 components for each block and 1, 0, and 2 blocks,
# respectively from the (1,2), (1,3) and (2,3) combinations of blocks.
pot.po.man <- popls(potato[1:3], potato[['Sensory']][,1], commons = 2,
auto=FALSE, manual.par = list(ncomp=c(5,5,5),
ncommon=c(1,0,2)))
# Explained variance
pot.po.man$explVar
```
## Scores and loadings
Scores and loadings are stored per block. Common scores/loadings are found
in each of the blocks' list of components.
```{r}
# Score plot for local (2,3) components
scoreplot(pot.po.man, block = 3, labels = "names")
# Corresponding loadings
loadingplot(pot.po.man, block = 3, labels="names", scatter = FALSE)
```
# Response Oriented Sequential Alternation - ROSA
The following example uses the _potato_ data to showcase some of the functions available for ROSA analyses.
## Modelling
A multi-response two-block ROSA model with up to 10 components in total is cross-validated with 10 random segments.
```{r}
# Model all eight potato blocks with ROSA
ros.pot <- rosa(Sensory ~ ., data = potato1, ncomp = 10, validation = "CV", segments = 5)
print(ros.pot)
summary(ros.pot)
```
## Loadings
Extract loadings (not used further) and plot two first vectors of loadings.
```{r}
loads <- loadings(ros.pot)
loadingplot(ros.pot, comps = 1:2, scatter = FALSE)
```
## Scores
Extract scores (not used further) and plot two first vectors of scores.
```{r}
sco <- scores(ros.pot)
scoreplot(ros.pot, comps = 1:2, labels = "names")
```
## Prediction
A three block model is fitted using a single response, 5 components and a subset of the data. The remaining data are used as test set for prediction.
```{r}
# Model 20 first objects of three potato blocks
rosT <- rosa(Sensory ~ ., data = potato1[c(1:3,9)], ncomp = 5, subset = 1:20)
testset <- potato1[-(1:20),]; # testset$Sensory <- NULL
predict(rosT, testset, comps=2)
```
## Validation
Compute validation statistics; explained variance - R$^2$ and Root Mean Squared Error - RMSE(P/CV).
```{r}
# Cross-validation
R2(ros.pot)
# Training
R2(ros.pot, 'train')
# Test data
R2(rosT, 'test', newdata = testset)
```
```{r}
# Cross-validation
RMSEP(ros.pot)
# Training
RMSEP(ros.pot, 'train')
# Test data
RMSEP(rosT, newdata = testset)
```
## Image plots
These are plots for evaluation of the block selection process in ROSA. Correlation plots show how the different candidate scores (one candidate for each block for each component) correlate to the winning block's scores. Residual response plots show how different choices of candidate scores would affect the RMSE of the residual response. One can for instance use these plots to decide on a different block selection order than the one proposed automatically by ROSA.
```{r}
# Correlation to winning scores
image(ros.pot)
# Residual response given candidate scores
image(ros.pot, "residual")
```
# Multiblock Redundancy Analysis - mbRDA
The following example uses the _potato_ data to showcase some of the functions available for mbRDA analyses.
## Modelling
This implementation uses a wrapper for the _mbpcaiv_ function in the _ade4_ package to perform mbRDA. A multi-response 5 component model is fitted.
```{r}
# Convert data.frame with AsIs objects to list of matrices
potatoList <- lapply(potato, unclass)
# Perform mbRDA with two blocks explaining sensory attributes
mbr <- mbrda(X = potatoList[c('Chemical','Compression')], Y = potatoList[['Sensory']], ncomp = 5)
print(mbr)
```
## Loadings and scores
The _mbpcaiv_ wrapper extracts key elements for inspection using the same format as the rest of this package. The full fitted _mbpcaiv_ object is also available, e.g. through _mbr$mbpcaivObject_.
```{r}
# Extract and view loadings
lo_mbr <- loadings(mbr)
print(head(lo_mbr))
# Plot scores
scoreplot(mbr, labels = "names")
```