In this section we attempt to show how the stride data in Baghdadi et al. (2018) and Baghdadi et al. (2019) can be used to do unconditional forecasting of Rated Perceived Exertion (RPE) using univariate and multivariate time series methods. RPE forecasting is an important concern because it can help ergonomists with fatigue-related ergonomic interventions. We have extracted the stride data for their 15 participants using the following GitHub repository. The reader should notice that the data was preprocessed by normalizing the length of the time series by percentile and then denoising by median filter. Since these are not a main goal of our research, we do not include the preprocessing part here, and we directly use the preprocessed data in Baghdadi et al. (2018) and Baghdadi et al. (2019). Alternatively, the interested reader is adviced to see the supplementary Matlab files in their GitHub Repository in conjunction with this work.
The code snippet below presents the R packages used in our analysis. Additionally, the reader should note that we have used R version 3.6.2 in our analysis. The analysis is performed on a 64-bit Windows Operating System, with Intel(R) Xeron(R) W-2145 CPU @ 3.70 GHz 3.70 GHz and 64.00 GB RAM.
rm(list = ls()) # clear global environment
graphics.off() # close all graphics
#p_load is equivalent to combining both install.packages() and library()
pacman::p_load(dplyr,forecast, knitr, R.matlab, tsDyn, ggplot2)
The stride data (stride length, stride height and stride duration) and RPE data are extracted from the mat files in Baghdadi et al. (2018) and Baghdadi et al. (2019). The extracted data are multivariate time series with four variables (stride length, stride height, stride duration and RPE), and the length of the time series for each participant is 99 (from 1 percentile to 99 percentile). Different participants have the same time length, because walking time is rescaled from nominal time spent during walking to 0-100% scale (see Baghdadi et al. (2019) for more details). Moreover, notice that RPE is collected in a lower frequency compared to stride variables (6th,12th,…,96th percentiles) and its general changing pattern is nondecreasing. We use low-frequency observations which leaves us with 16 observations per participant. Please see the comments within the code chunk for more details.
N_sub = 15 #number of subjects
N_var = 4 #number of variables
lag_order = 1
arima_dif = 1
T_start = 6 #time of first low-frequency observation
T_inc = 6 #time between two low-frequency observations
T_end = 96 #time of last low-frequency observation
LF_Range = 1:(T_end/T_inc) #time range for low-frequency data (1,2,...,16)
LF_T_start = 1 #time of first observation in low-frequency time domain
LF_F_start = 8 #the time at which the forecasting procedure starts (again in low-frequency time domain)
LF_T_end = T_end/T_inc #time of last observation in low-frequency time domain
Sub_range = 1:N_sub #range of subjects
n_ahead_range = 1:6 #range of k in k-ahead forecasting
comb = expand.grid(Sub_range,n_ahead_range) #different combinations of subject number and k (in k-ahead forecasting)
column_names = sprintf("Subject%s/%s_ahead",comb[,1],comb[,2])
AR_Results = ARIMA_Results = VAR_Results = VECM_Results = Naive_Results = matrix(data=NA,nrow=LF_T_end,ncol=dim(comb)[1])
sub_names = sprintf("Participant %s",Sub_range)
JoTest_res = matrix(data=NA,N_sub,2)
rownames(JoTest_res) = sub_names
colnames(JoTest_res) = c("Cointegration Rank","p-value")
HF_Subject_Data <- sprintf("../Case 1 Data/Data_%s.mat",seq(1:N_sub))%>% #High-frequency subject data
lapply(readMat) %>%
lapply(unlist) %>%
lapply(matrix, ncol = N_var, byrow = FALSE,dimnames = NULL) %>%
lapply(data.frame)
Subject_Data <- HF_Subject_Data%>% #low-frequency subject data
lapply(slice, seq(T_start,T_end,by=T_inc))
sInfo = sessionInfo()
sInfo
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.0 tsDyn_10-1.2 R.matlab_3.6.2 knitr_1.28 forecast_8.12
## [6] dplyr_0.8.5
##
## loaded via a namespace (and not attached):
## [1] deSolve_1.28 zoo_1.8-7 tidyselect_1.0.0
## [4] xfun_0.13 purrr_0.3.4 urca_1.3-0
## [7] splines_3.6.3 lattice_0.20-41 vars_1.5-3
## [10] colorspace_1.4-1 vctrs_0.2.4 htmltools_0.4.0
## [13] mgcv_1.8-31 yaml_2.2.1 rlang_0.4.5
## [16] R.oo_1.23.0 pillar_1.4.3 withr_2.2.0
## [19] glue_1.4.0 R.utils_2.9.2 TTR_0.23-6
## [22] foreach_1.5.0 lifecycle_0.2.0 quantmod_0.4.17
## [25] stringr_1.4.0 timeDate_3043.102 munsell_0.5.0
## [28] tseriesChaos_0.1-13.1 gtable_0.3.0 R.methodsS3_1.8.0
## [31] codetools_0.2-16 evaluate_0.14 tseries_0.10-47
## [34] strucchange_1.5-2 lmtest_0.9-37 parallel_3.6.3
## [37] curl_4.3 xts_0.12-0 Rcpp_1.0.4.6
## [40] scales_1.1.0 fracdiff_1.5-1 mnormt_1.5-6
## [43] digest_0.6.25 stringi_1.4.6 grid_3.6.3
## [46] quadprog_1.5-8 tools_3.6.3 sandwich_2.5-1
## [49] magrittr_1.5 tibble_3.0.1 pacman_0.5.1
## [52] crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0
## [55] MASS_7.3-51.5 Matrix_1.2-18 iterators_1.0.12
## [58] assertthat_0.2.1 rmarkdown_2.1 R6_2.4.1
## [61] nnet_7.3-13 nlme_3.1-147 compiler_3.6.3
In this section, stride data (high-frequency) and RPE data (low-frequency) are illustrated.
for (num_subject in Sub_range)
{
Raw_Data<-ts(data = HF_Subject_Data[[num_subject]], start = 1, end = dim(HF_Subject_Data[[num_subject]])[1], frequency = 1, names =c("Stride Length (ft.)","Stride Height (ft.)","Stride Duration (s)","RPE") )
cat(paste0("### Participant ",num_subject, " {-} \n"))
Plot = autoplot(Raw_Data,facets = T)+geom_point(shape=1)+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5))
print(Plot)
cat("\n \n")
}
In this section we do rolling forecasting for RPE. Suppose that we want to do a forecast at time \(t\) (in the code we call this current time). If we do k-step ahead forecasting, we will get the forecast of RPE at time \(t+k\). Implementing rolling forecast starts from \(t=8\) and we analyze different values of \(k\) \((k=1,..,6)\). In unconditional forecasting, the stride and RPE data are known for times \({1,2,..,t}\) and we want to forecast RPE at time \(t+k\). Autoregressive (AR) model, autoregressive integrated moving average (ARIMA), vector autoregression (VAR) model and vector error correction model (VECM) are used for forecasting. Since the RPE scale is 6-20, in cases that an RPE higher than 20 is forecasted we change it to 20. In addition, due to limited number of observations (16 observations), obtaining a VAR model with a lag order higher than 1 is not possible. Therefore, to be consistent,we use AR(1), ARIMA(1,1,0), VAR(1) and Vector Error Correction Model without lagged differences.
The equation for AR(1) model is as follows: \[y_t=\mu+\phi y_{t-1} +\epsilon_t,\] where \(y_t\) represents a univariate time series (in our case it is RPE at time \(t\)), \(\mu\) is an intercept, \(\phi\) is a scalar coefficient, and \(\epsilon_t\) is a white noise (\(\epsilon_t\) is assumed to be uncorrelated random variables in which \(E(\epsilon_t)=0\) and \(Var(\epsilon_t)=\sigma^2\)) and forecasting is done in a recursive procedure: \[\text{F}_t(y_{t+1})=\hat{\mu}+\hat{\phi} y_t\] \[\text{F}_t(y_{t+2})=\hat{\mu}+\hat{\phi} \text{F}_t(y_{t+1})\] \[\vdots\] \[\text{F}_t(y_{t+k})=\hat{\mu}+\hat{\phi} \text{F}_t(y_{t+k-1}).\] There are 16 observations and starting from \(t=8\) RPE is continiously forecasted.
AR_Forecast <- function(num_subject,n_ahead) #this function uses AR(1) to do k-ahead forecasting for a subject
{AR_RPE <- matrix(data = NA,nrow = LF_T_end,ncol = 1)
for (cur_t in LF_F_start:(LF_T_end-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_ar <- Arima(Data_in[,N_var],order = c(lag_order, 0, 0))
AR_RPE[cur_t+n_ahead,] <- matrix(data = predict(mod_ar,n.ahead = n_ahead)[["pred"]][n_ahead],ncol = 1, byrow = FALSE,dimnames = NULL)
}, error = function(e) {})}
return(AR_RPE)}
AR_Results = mapply(AR_Forecast,comb[,1],comb[,2])
AR_Results [AR_Results>20] <- 20
colnames(AR_Results) = column_names
We get RPEs forecasted by AR for \(t=9,10,...,16\).
Fitting an ARIMA(1,1,0) model is euqivalent to fitting an AR(1) model to the first difference of \(y_t\) (\(\Delta y_t = y_t - y_{t-1}\)). ARIMA(1,1,0) model is as follows: \[\Delta y_t=\phi \Delta y_{t-1}+\epsilon_t,\] then, the forecasting procedure should be done similar to AR.
ARIMA_Forecast <- function(num_subject,n_ahead)
{ARIMA_RPE <- matrix(data = NA,nrow = LF_T_end,ncol = 1)
for (cur_t in LF_F_start:(LF_T_end-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_arima <- Arima(Data_in[,N_var],order = c(lag_order, arima_dif, 0))
ARIMA_RPE[cur_t+n_ahead,] <- forecast(mod_arima, h = n_ahead)[["mean"]][n_ahead]
}, error = function(e) {})}
return(ARIMA_RPE)}
ARIMA_Results = mapply(ARIMA_Forecast,comb[,1],comb[,2])
ARIMA_Results [ARIMA_Results>20] <- 20
colnames(ARIMA_Results) = column_names
Now, we have RPEs forecasted by ARIMA for \(t=9,10,...,16\).
The equation for VAR model of order 1 is as follows: \[\mathbf{y}_t=\mathbf{\nu}+\mathbf{A} \mathbf{y}_{t-1}+\mathbf{u}_t,\]
Where \(\mathbf{y}_t=(y_{1t},...,y_{Nt})^t\) is a \(N\times 1\) random vector, the \(\mathbf{A}\) is an \(N\times N\) coefficient matrix and \(\mathbf{u}_t\) is a N-dimensional white noise. VAR(1) euqation can be used recursively to determine the k-step ahead forecasting (see Lütkepohl (2005)): \[F_t(\mathbf{y}_{t+1})=\mathbf{\hat{\nu}}+\mathbf{\hat{A}} \mathbf{y}_t\] \[F_t(\mathbf{y}_{t+2})=\mathbf{\hat{\nu}}+\mathbf{\hat{A}} F_t(\mathbf{y}_{t+1})\] \[\vdots\] \[F_t(\mathbf{y}_{t+k})=\mathbf{\hat{\nu}}+\mathbf{\hat{A}} F_t(\mathbf{y}_{t+k-1})\] using these recursions the k-step ahead forecasting can be performed.
In our case \(N=4\) (there are four variables). As mentioned earlier, there are 16 observations \((\mathbf{y}_t \:, \: t=1,...,16)\), and starting from \(t=8\), we continiously do forecasting for time horizon of \(t=9,10,...,16\). tsDyn package (see Di Narzo, Aznarte, and Stigler (2015)) is used to train VAR models.
VAR_Forecast <- function(num_subject,n_ahead) #this function uses VAR(1) to do k-ahead forecasting for a subject
{ VAR_RPE <- matrix(data = NA,nrow = LF_T_end,ncol = 1)
for (cur_t in LF_F_start:(LF_T_end-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,] #Data_in is used to extract model
mod_var <- lineVar(Data_in, lag=lag_order) #extracting VAR(1) model
VAR_RPE[cur_t+n_ahead,]<- predict(mod_var,n.ahead = n_ahead)[n_ahead,N_var] #rolling forecasting
}, error = function(e) {})}
return(VAR_RPE)
}
VAR_Results = mapply(VAR_Forecast,comb[,1],comb[,2])
VAR_Results [VAR_Results>20] <- 20
colnames(VAR_Results) = column_names
Now, we have RPEs forecasted by VAR for \(t=9,10,...,16\).
Vector Error Correction Model (VECM) without lagged differences is as follows: \[\Delta \mathbf{y}_t=\boldsymbol{\nu}+\boldsymbol{\Pi} \mathbf{y}_{t-1}+\mathbf{u}_t,\] Variables are defined as those in VAR model, \(\boldsymbol{\nu}\) is an intercept and \(\boldsymbol{\Pi}\) is a coefficient matrix. Continious forecasting is done in a recursive procedure as mentioned in the VAR model. tsDyn package (see Di Narzo, Aznarte, and Stigler (2015)) is used to train VECM. Before doing forecasting, we compute the cointegration rank of each participant by using Johansen Test.
for(num_subject in Sub_range)
{
mod_vecm <- VECM(Subject_Data[[num_subject]], lag = lag_order-1, estim="ML")
Rank <- rank.test(mod_vecm, type="trace")
JoTest_res[num_subject,1] = Rank[["r"]]
JoTest_res[num_subject,2] = Rank[["pval"]]
name <- paste("Jotest_Sub", num_subject, sep = "_")
assign(name, Rank)
}
print(JoTest_res)
## Cointegration Rank p-value
## Participant 1 1 0.665051061
## Participant 2 3 0.306670807
## Participant 3 0 0.163782632
## Participant 4 1 0.084062062
## Participant 5 1 0.236655798
## Participant 6 0 0.643731336
## Participant 7 0 0.141302131
## Participant 8 1 0.098627851
## Participant 9 1 0.234347783
## Participant 10 1 0.079146866
## Participant 11 3 0.545322380
## Participant 12 0 0.166863864
## Participant 13 2 0.080673884
## Participant 14 0 0.051590489
## Participant 15 4 0.008643088
VECM_Forecast <- function(num_subject,n_ahead) #this function uses VECM to do k-ahead forecasting for a subject
{VECM_RPE <- matrix(data = NA,nrow=LF_T_end,ncol = 1)
for (cur_t in LF_F_start:(LF_T_end-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_vecm <- VECM(Data_in, lag = lag_order-1)
VECM_RPE[cur_t+n_ahead,] <- predict(mod_vecm,n.ahead = n_ahead)[n_ahead,N_var]
}, error = function(e) {})}
return(VECM_RPE)
}
VECM_Results = mapply(VECM_Forecast,comb[,1],comb[,2])
VECM_Results [VECM_Results>20] <- 20
colnames(VECM_Results) = column_names
We get RPEs forecasted by VECM for \(t=9,10,...,16\).
Naive_Forecast <- function(num_subject,n_ahead)
{Naive_RPE <- matrix(data = NA,nrow = LF_T_end,ncol = 1)
for (cur_t in LF_F_start:(LF_T_end-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
Naive_RPE[cur_t+n_ahead,] = Data_in[cur_t,N_var]
}, error = function(e) {})}
return(Naive_RPE)}
Naive_Results = mapply(Naive_Forecast,comb[,1],comb[,2])
colnames(Naive_Results) = column_names
In this section, we plot the forecasts that we got using different methods in conjunction with true RPEs. These plots are presented for different participants and \(k\) values.
for (num_subject in Sub_range)
{
cat(paste0("### Participant ",num_subject, " {-} \n"))
F_Data = matrix(data = NA,nrow = LF_T_end*6*length(n_ahead_range),ncol = 4)
colnames(F_Data) = c("RPE","Time","k","Method")
AR = AR_Results[,seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
VAR = VAR_Results[,seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
VECM = VECM_Results[,seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
ARIMA = ARIMA_Results[,seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
Naive = Naive_Results[,seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
True = matrix(rep(Subject_Data[[num_subject]][,4],each=length(n_ahead_range)), ncol=length(n_ahead_range), byrow=TRUE)
F_Data[,1] = c(as.vector(AR),as.vector(VAR),as.vector(VECM),as.vector(ARIMA),as.vector(Naive),as.vector(True))
F_Data[,2] = rep(1:LF_T_end,(dim(F_Data)[1])/LF_T_end)
F_Data[,3] = rep(c(rep("1-step ahead",LF_T_end),rep("2-step ahead",LF_T_end),rep("3-step ahead",LF_T_end),rep("4-step ahead",LF_T_end),rep("5-step ahead",LF_T_end),rep("6-step ahead",LF_T_end)),6)
F_Data[,4] = c(rep("AR",length(n_ahead_range)*LF_T_end),rep("VAR",length(n_ahead_range)*LF_T_end),rep("VECM",length(n_ahead_range)*LF_T_end),rep("ARIMA",length(n_ahead_range)*LF_T_end),rep("Naive",length(n_ahead_range)*LF_T_end),rep("True RPE",length(n_ahead_range)*LF_T_end))
F_Data2 = as.data.frame(F_Data,colnames=T)
F_Data2$RPE = as.numeric(as.character(F_Data2$RPE))
F_Data2$Time = as.numeric(as.character(F_Data2$Time))
p = ggplot(data = F_Data2, mapping = aes(x = Time,y = RPE, color = Method))+scale_colour_manual(values = c("purple", "blue","orange", "darkgrey","darkorange4","red"))+geom_line(aes(linetype = Method),size = 0.5)+scale_linetype_manual(values=c("twodash","longdash","dotted", "solid","dashed","dotdash"))+facet_wrap(vars(k))+ylim(6,20)+geom_point(aes(shape = Method),size = 1.5)+scale_shape_manual(values=c(1,3,0,20,4,8))+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5),legend.position="top")
print(p)
cat("\n \n")
}
We compare the performance of different methods based on MAE box-plots and bar plots. We exclude Participant 2 and Participant 6 in the summary here, because in a few cases AR and VAR are not able to give prediction for these participants (one reason is that RPE of Participant 6 does not change from \(t=1\) to \(t=8\) and the reason for excluding Participant 2 is that the autoregressive coefficients could not be reliably estimated due to nonstationarity).
Comb_Subject_Data = do.call(cbind, Subject_Data)
True_RPE = Comb_Subject_Data [,seq(N_var,N_var*N_sub,N_var)]
colnames(True_RPE) = sprintf("Subject %s",1:N_sub)
Rep_True_RPE = do.call(cbind, replicate(length(n_ahead_range), True_RPE, simplify=FALSE))
AR_abs_error = abs(Rep_True_RPE-AR_Results)
AR_MAE = matrix(data = colMeans(AR_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
VAR_abs_error = abs(Rep_True_RPE-VAR_Results)
VAR_MAE = matrix(data = colMeans(VAR_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
VECM_abs_error = abs(Rep_True_RPE-VECM_Results)
VECM_MAE = matrix(data = colMeans(VECM_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
ARIMA_abs_error = abs(Rep_True_RPE-ARIMA_Results)
ARIMA_MAE = matrix(data = colMeans(ARIMA_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
Naive_abs_error = abs(Rep_True_RPE-Naive_Results)
Naive_MAE = matrix(data = colMeans(Naive_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
AR_MAE[,2] = VAR_MAE[,2] = VECM_MAE[,2] = ARIMA_MAE[,2] = Naive_MAE[,2] = NA #exclude subject 2 from results
AR_MAE[,6] = VAR_MAE[,6] = VECM_MAE[,6] = ARIMA_MAE[,6] = Naive_MAE[,6] = NA #exclude subject 6 from results
rownames(AR_MAE) = rownames(VAR_MAE) = rownames(VECM_MAE) = rownames(ARIMA_MAE) = sprintf("%s_ahead",n_ahead_range)
colnames(AR_MAE) = colnames(VAR_MAE) = colnames(VECM_MAE) = colnames(ARIMA_MAE) = sprintf("Subject %s",Sub_range)
for (n_ahead in n_ahead_range) {
cat(paste0("#### ",n_ahead,"-ahead", " {-} \n"))
MAE <- c(AR_MAE[n_ahead,],VAR_MAE[n_ahead,],VECM_MAE[n_ahead,],ARIMA_MAE[n_ahead,],Naive_MAE[n_ahead,])
Method <- c(rep("AR",N_sub),rep("VAR",N_sub),rep("VECM",N_sub),rep("ARIMA",N_sub),rep("Naive",N_sub))
df = data.frame(MAE,Method)
p <- ggplot(df, aes(x = Method, y = MAE))+geom_boxplot()+ggtitle(paste0(toString(n_ahead),"-step ahead"))+ theme(plot.title = element_text(hjust = 0.5))+theme_bw()+ theme(plot.title = element_text(hjust = 0.5))+ylim(0,5)
print(p)
cat("\n \n")
}
final_sub_range = c(1,3,4,5,7,8,9,10,11,12,13,14,15)
for (num_subject in final_sub_range) {
cat(paste0("#### Participant ",num_subject, " {-} \n"))
df = data.frame(MAE = c(AR_MAE[,num_subject],VAR_MAE[,num_subject],VECM_MAE[,num_subject],ARIMA_MAE[,num_subject],Naive_MAE[,num_subject]),k_step_ahead = rep(sprintf("%s_step ahead",n_ahead_range),5),Method = c(rep("AR",length(n_ahead_range)),rep("VAR",length(n_ahead_range)),rep("VECM",length(n_ahead_range)),rep("ARIMA",length(n_ahead_range)),rep("Naive",length(n_ahead_range))))
p = ggplot(df, aes(Method, MAE)) + geom_col() + facet_wrap(~k_step_ahead,scales = "free")+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5))
print(p)
cat("\n \n")
}
In this section, we get the median MAE for different participants.
AR_Overall = apply(AR_MAE[,c(-2,-6)],1,median)
ARIMA_Overall = apply(ARIMA_MAE[,c(-2,-6)],1,median)
VAR_Overall = apply(VAR_MAE[,c(-2,-6)],1,median)
VECM_Overall = apply(VECM_MAE[,c(-2,-6)],1,median)
Naive_Overall = apply(Naive_MAE[,c(-2,-6)],1,median)
Overall_MAE = rbind(AR_Overall,ARIMA_Overall,Naive_Overall,VAR_Overall,VECM_Overall)
print(Overall_MAE)
## 1_ahead 2_ahead 3_ahead 4_ahead 5_ahead 6_ahead
## AR_Overall 0.4946520 0.8359781 1.2198418 1.4962388 1.831497 2.274216
## ARIMA_Overall 0.4619919 0.6017053 0.8333333 1.1363935 1.276622 1.666667
## Naive_Overall 0.3750000 0.5714286 0.8333333 1.0000000 1.250000 1.666667
## VAR_Overall 0.5880567 0.7635780 0.9253968 0.8900874 1.198960 1.648872
## VECM_Overall 0.4841007 0.5852558 0.7269426 0.9578534 1.063142 1.238339
This code-chunk saves the Johansent Test, MAE and forecast results.
save(AR_MAE,VAR_MAE,VECM_MAE,ARIMA_MAE,Naive_MAE,file = "../Case 1 Results/MAE Results.Rdata")
save(Overall_MAE,file = "../Case 1 Results/Overall MAE.Rdata")
save(AR_Results,VAR_Results,VECM_Results,ARIMA_Results,file = "../Case 1 Results/Forecast Results.Rdata")
save(Jotest_Sub_1,Jotest_Sub_2,Jotest_Sub_3,Jotest_Sub_4,Jotest_Sub_5,Jotest_Sub_6,Jotest_Sub_7,Jotest_Sub_8,Jotest_Sub_9,Jotest_Sub_10,Jotest_Sub_11,Jotest_Sub_12,Jotest_Sub_13,Jotest_Sub_14,Jotest_Sub_15,file = "../Case 1 Results/Jotest_Results.Rdata")
rm(list = ls())
In this section, in order to address the broader issue of generalizability of the models for other datasets and applications, we have included the implementation of the method on the dataset collected as described in Karvekar (2019). The means of inducing fatigue was different for this new study, however the studies were similar in the use of a single leg-mounted sensor to capture gait parameters and the RPE to capture subjective fatigue. For Lab Study 2, there were 24 participants (larger than the 15 in the original dataset (Lab Study 1)), and the number of data points per participant ranged from 13 to 38 (median of 20.5, on average larger than the 16 for Lab Study 1). In Lab Study 2, the gait parameters used for modeling were mean acceleration, maximum acceleration, stride duration, and RPE (2 parameters that were the same as Lab Study 1).
In Lab Study 2, there was one RPE value reported for each period of squatting exercise. Therefore, we have chosen to extract the stride parameters values related to the last stride for each walking period.
N_sub = 24 #number of subjects
N_var = 4 #number of variables
Sub_range = 1:N_sub #range of subjects
lag_order = 1
n_ahead_range = 1:6 #range of k in k-ahead forecasting
arima_dif = 1
last_row_group = function(x) #this function gets the last row (features related to last stride) in each trial
{
temp = x%>%
group_by(X1) %>%
slice(rows = n()) %>%
ungroup()
temp = temp[complete.cases(temp),]
return(temp)
}
Total_Subject_Data <- sprintf("../Case 2 Data/ID%sGaitFeatures.mat",seq(1:N_sub))%>%
lapply(readMat) %>%
lapply(unlist) %>%
lapply(matrix, ncol = N_var+1, byrow = FALSE,dimnames = NULL) %>%
lapply(data.frame)
Subject_Data = Total_Subject_Data%>%
lapply(last_row_group)%>%
lapply(function(x) { x["X1"] <- NULL; x })%>%
lapply(function(x) { x = x[c(2,3,4,1)]; x })%>%
lapply(function(x) {colnames(x) = c("Mean Acc", "Max Acc", "Stride Duration", "RPE"); x})
In this section, stride data (mean acceleration, maximum acceleration and stride duration) and RPE data are illustrated.
for (num_subject in Sub_range)
{
Raw_Data<-ts(data = Subject_Data[[num_subject]], start = 1, end = dim(Subject_Data[[num_subject]])[1], frequency = 1, names =c("Mean Acceleation","Maximum Acceleation","Stride Duration","RPE") )
cat(paste0("### Participant ",num_subject, " {-} \n"))
Plot = autoplot(Raw_Data,facets = T)+geom_point(shape=1)+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5))
print(Plot)
cat("\n \n")
}
comb = expand.grid(Sub_range,n_ahead_range) #different combinations of subject number and k (in k-ahead forecasting)
AR_Results = ARIMA_Results = VAR_Results = VECM_Results = Naive_Results = matrix(data=NA,nrow=max(unlist(lapply(Subject_Data, function(x) {temp = dim(x)[1]; temp}))),ncol=dim(comb)[1])
column_names = sprintf("Subject%s/%s_ahead",comb[,1],comb[,2])
sub_names = sprintf("Participant %s",Sub_range)
JoTest_res = matrix(data=NA,N_sub,2)
rownames(JoTest_res) = sub_names
colnames(JoTest_res) = c("Cointegration Rank","p-value")
AR_Forecast <- function(num_subject,n_ahead) #this function uses AR(1) to do k-ahead forecasting for a subject
{AR_RPE <- matrix(data = NA,nrow = dim(Subject_Data[[num_subject]])[1],ncol = 1)
for (cur_t in ceiling(dim(Subject_Data[[num_subject]])[1]/2):(dim(Subject_Data[[num_subject]])[1]-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_ar <- arima(Data_in[,N_var],order = c(lag_order, 0, 0))
AR_RPE[cur_t+n_ahead,] = predict(mod_ar,n.ahead = n_ahead)[["pred"]][n_ahead]
}, error = function(e) {})}
return(AR_RPE)}
AR_Results_Temp = mapply(AR_Forecast,comb[,1],comb[,2])
for (i in 1:dim(AR_Results)[2])
{
AR_Results[1:dim(AR_Results_Temp[[i]])[1],i] = AR_Results_Temp[[i]]
}
AR_Results [AR_Results>20] <- 20
colnames(AR_Results) = column_names
ARIMA_Forecast <- function(num_subject,n_ahead) #this function uses ARIMA(1,1,0) to do k-ahead forecasting for a subject
{ARIMA_RPE <- matrix(data = NA,nrow = dim(Subject_Data[[num_subject]])[1],ncol = 1)
for (cur_t in ceiling(dim(Subject_Data[[num_subject]])[1]/2):(dim(Subject_Data[[num_subject]])[1]-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_arima <- arima(Data_in[,N_var],order = c(lag_order, arima_dif, 0))
ARIMA_RPE[cur_t+n_ahead,] = predict(mod_arima,n.ahead = n_ahead)[["pred"]][n_ahead]
}, error = function(e) {})}
return(ARIMA_RPE)}
ARIMA_Results_Temp = mapply(ARIMA_Forecast,comb[,1],comb[,2])
for (i in 1:dim(ARIMA_Results)[2])
{
ARIMA_Results[1:dim(ARIMA_Results_Temp[[i]])[1],i] = ARIMA_Results_Temp[[i]]
}
ARIMA_Results [ARIMA_Results>20] <- 20
colnames(ARIMA_Results) = column_names
VAR_Forecast <- function(num_subject,n_ahead) #this function uses VAR(1) to do k-ahead forecasting for a subject
{ VAR_RPE <- matrix(data = NA,nrow = dim(Subject_Data[[num_subject]])[1],ncol = 1)
for (cur_t in ceiling(dim(Subject_Data[[num_subject]])[1]/2):(dim(Subject_Data[[num_subject]])[1]-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,] #Data_in is used to extract model
mod_var <- lineVar(Data_in, lag=lag_order) #extracting VAR(1) model
VAR_RPE[cur_t+n_ahead,]<- predict(mod_var,n.ahead = n_ahead)[n_ahead,N_var] #rolling forecasting
}, error = function(e) {})}
return(VAR_RPE)
}
VAR_Results_Temp = mapply(VAR_Forecast,comb[,1],comb[,2])
for (i in 1:dim(VAR_Results)[2])
{
VAR_Results[1:dim(VAR_Results_Temp[[i]])[1],i] = VAR_Results_Temp[[i]]
}
VAR_Results [VAR_Results>20] <- 20
colnames(VAR_Results) = column_names
for(num_subject in Sub_range)
{
mod_vecm <- VECM(Subject_Data[[num_subject]], lag = lag_order-1, estim="ML")
Rank <- rank.test(mod_vecm, type="trace")
JoTest_res[num_subject,1] = Rank[["r"]]
JoTest_res[num_subject,2] = Rank[["pval"]]
name <- paste("Jotest_Sub", num_subject, sep = "_")
assign(name, Rank)
}
print(JoTest_res)
## Cointegration Rank p-value
## Participant 1 1 0.0905324136
## Participant 2 1 0.0574431029
## Participant 3 1 0.0729322267
## Participant 4 3 0.0778001914
## Participant 5 3 0.1761079773
## Participant 6 0 0.0987738753
## Participant 7 3 0.2290918119
## Participant 8 2 0.0656952595
## Participant 9 0 0.1410623653
## Participant 10 2 0.0726911525
## Participant 11 3 0.8665957557
## Participant 12 1 0.2837927971
## Participant 13 4 0.0072458175
## Participant 14 4 0.0407114292
## Participant 15 4 0.0032197348
## Participant 16 4 0.0078930766
## Participant 17 2 0.1312174201
## Participant 18 1 0.0697589130
## Participant 19 2 0.0816570884
## Participant 20 1 0.2404449069
## Participant 21 4 0.0099915981
## Participant 22 0 0.1780896462
## Participant 23 4 0.0042683360
## Participant 24 4 0.0003056992
VECM_Forecast <- function(num_subject,n_ahead) #this function uses VECM to do k-ahead forecasting for a subject
{VECM_RPE <- matrix(data = NA,nrow = dim(Subject_Data[[num_subject]])[1],ncol = 1)
for (cur_t in ceiling(dim(Subject_Data[[num_subject]])[1]/2):(dim(Subject_Data[[num_subject]])[1]-n_ahead)){
tryCatch({
DataX <- Subject_Data[[num_subject]]
Data_in <- DataX[1:cur_t,]
mod_vecm <- VECM(Data_in, lag = lag_order-1)
VECM_RPE[cur_t+n_ahead,] <- predict(mod_vecm,n.ahead = n_ahead)[n_ahead,N_var]
}, error = function(e) {})}
return(VECM_RPE)
}
VECM_Results_Temp = mapply(VECM_Forecast,comb[,1],comb[,2])
for (i in 1:dim(VECM_Results)[2])
{
VECM_Results[1:dim(VECM_Results_Temp[[i]])[1],i] = VECM_Results_Temp[[i]]
}
VECM_Results [VECM_Results>20] <- 20
colnames(VECM_Results) = column_names
Naive_Forecast <- function(num_subject,n_ahead)
{Naive_RPE <- matrix(data = NA,nrow = dim(Subject_Data[[num_subject]])[1],ncol = 1)
for (cur_t in ceiling(dim(Subject_Data[[num_subject]])[1]/2):(dim(Subject_Data[[num_subject]])[1]-n_ahead)){
DataX <- Subject_Data[[num_subject]]
Data_in <- as.matrix(DataX[1:cur_t,])
Naive_RPE[cur_t+n_ahead,] = Data_in[cur_t,N_var]
}
return(Naive_RPE)}
Naive_Results_Temp = mapply(Naive_Forecast,comb[,1],comb[,2])
for (i in 1:dim(Naive_Results)[2])
{
Naive_Results[1:dim(Naive_Results_Temp[[i]])[1],i] = Naive_Results_Temp[[i]]
}
colnames(Naive_Results) = column_names
In this section, we plot the forecasts that we got using different methods in conjunction with true RPEs. These plots are presented for different participants and \(k\) values.
for (num_subject in Sub_range)
{
cat(paste0("### Participant ",num_subject, " {-} \n"))
F_Data = matrix(data = NA,nrow = (dim(Subject_Data[[num_subject]])[1])*6*length(n_ahead_range),ncol = 4)
colnames(F_Data) = c("RPE","Time","k","Method")
AR = AR_Results[1:dim(Subject_Data[[num_subject]])[1],seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
VAR = VAR_Results[1:dim(Subject_Data[[num_subject]])[1],seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
VECM = VECM_Results[1:dim(Subject_Data[[num_subject]])[1],seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
ARIMA = ARIMA_Results[1:dim(Subject_Data[[num_subject]])[1],seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
Naive = Naive_Results[1:dim(Subject_Data[[num_subject]])[1],seq(num_subject,length(n_ahead_range)*N_sub,by=N_sub)]
True = matrix(unlist(rep(Subject_Data[[num_subject]][,4],each=length(n_ahead_range))), ncol=length(n_ahead_range), byrow=F)
#True = rbind(True, matrix(data = NA, ncol = dim(True)[2], nrow = dim(AR)[1]-dim(True)[1]))
F_Data[,1] = c(as.vector(AR),as.vector(VAR),as.vector(VECM),as.vector(ARIMA),as.vector(Naive),as.vector(True))
F_Data[,2] = rep(1:dim(AR)[1],(dim(F_Data)[1])/dim(AR)[1])
F_Data[,3] = rep(c(rep("1-step ahead",dim(AR)[1]),rep("2-step ahead",dim(AR)[1]),rep("3-step ahead",dim(AR)[1]),rep("4-step ahead",dim(AR)[1]),rep("5-step ahead",dim(AR)[1]),rep("6-step ahead",dim(AR)[1])),6)
F_Data[,4] = c(rep("AR",length(n_ahead_range)*dim(AR)[1]),rep("VAR",length(n_ahead_range)*dim(AR)[1]),rep("VECM",length(n_ahead_range)*dim(AR)[1]),rep("ARIMA",length(n_ahead_range)*dim(AR)[1]),rep("Naive",length(n_ahead_range)*dim(AR)[1]),rep("True RPE",length(n_ahead_range)*dim(AR)[1]))
F_Data2 = as.data.frame(F_Data,colnames=T)
F_Data2$RPE = as.numeric(as.character(F_Data2$RPE))
F_Data2$Time = as.numeric(as.character(F_Data2$Time))
p = ggplot(data = F_Data2, mapping = aes(x = Time,y = RPE, color = Method))+scale_colour_manual(values = c("purple", "blue","orange","darkgrey","darkorange4","red"))+geom_line(aes(linetype = Method),size = 0.5)+scale_linetype_manual(values=c("twodash","longdash","dotted", "solid","dashed","dotdash"))+facet_wrap(vars(k))+ylim(6,20)+geom_point(aes(shape = Method),size = 1.5)+scale_shape_manual(values=c(1,3,0,20,4,8))+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5),legend.position="top")
print(p)
cat("\n \n")
}
Subject_Data2 = vector(mode = "list", length = length(Subject_Data))
for (i in 1:length(Subject_Data))
{
Subject_Data2[[i]] = rbind(data.matrix(Subject_Data[[i]]), matrix(data = NA, ncol = dim(Subject_Data[[i]])[2], nrow = dim(AR_Results)[1]-dim(Subject_Data[[i]])[1]))
}
Comb_Subject_Data = do.call(cbind, Subject_Data2)
True_RPE = Comb_Subject_Data [,seq(N_var,N_var*N_sub,N_var)]
colnames(True_RPE) = sprintf("Subject %s",1:N_sub)
Rep_True_RPE = do.call(cbind, replicate(length(n_ahead_range), True_RPE, simplify=FALSE))
AR_abs_error = abs(Rep_True_RPE-AR_Results)
AR_MAE = matrix(data = colMeans(AR_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
VAR_abs_error = abs(Rep_True_RPE-VAR_Results)
VAR_MAE = matrix(data = colMeans(VAR_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
VECM_abs_error = abs(Rep_True_RPE-VECM_Results)
VECM_MAE = matrix(data = colMeans(VECM_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
ARIMA_abs_error = abs(Rep_True_RPE-ARIMA_Results)
ARIMA_MAE = matrix(data = colMeans(ARIMA_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
Naive_abs_error = abs(Rep_True_RPE-Naive_Results)
Naive_MAE = matrix(data = colMeans(Naive_abs_error,na.rm=TRUE),nrow = length(n_ahead_range),byrow = T)
rownames(AR_MAE) = rownames(VAR_MAE) = rownames(VECM_MAE) = rownames(ARIMA_MAE) = rownames(Naive_MAE) = sprintf("%s_ahead",n_ahead_range)
colnames(AR_MAE) = colnames(VAR_MAE) = colnames(VECM_MAE) = colnames(ARIMA_MAE) = colnames(Naive_MAE) = sprintf("Subject %s",Sub_range)
AR_MAE[,5] = VAR_MAE[,5] = VECM_MAE[,5] = ARIMA_MAE[,5] = Naive_MAE[,5] = NA #exclude subject 5 from results
for (n_ahead in n_ahead_range) {
cat(paste0("#### ",n_ahead,"-ahead", " {-} \n"))
MAE <- c(AR_MAE[n_ahead,],VAR_MAE[n_ahead,],VECM_MAE[n_ahead,],ARIMA_MAE[n_ahead,],Naive_MAE[n_ahead,])
Method <- c(rep("AR",N_sub),rep("VAR",N_sub),rep("VECM",N_sub),rep("ARIMA",N_sub),rep("Naive",N_sub))
df = data.frame(MAE,Method)
p <- ggplot(df, aes(x = Method, y = MAE))+geom_boxplot()+ggtitle(paste0(toString(n_ahead),"-step ahead"))+ theme(plot.title = element_text(hjust = 0.5))+theme_bw()+ theme(plot.title = element_text(hjust = 0.5))+ylim(0,7.5)
print(p)
cat("\n \n")
}
final_sub_range = Sub_range
for (num_subject in final_sub_range) {
cat(paste0("#### Participant ",num_subject, " {-} \n"))
df = data.frame(MAE = c(AR_MAE[,num_subject],VAR_MAE[,num_subject],VECM_MAE[,num_subject],ARIMA_MAE[,num_subject],Naive_MAE[,num_subject]),k_step_ahead = rep(sprintf("%s_step ahead",n_ahead_range),5),Method = c(rep("AR",length(n_ahead_range)),rep("VAR",length(n_ahead_range)),rep("VECM",length(n_ahead_range)),rep("ARIMA",length(n_ahead_range)),rep("Naive",length(n_ahead_range))))
p = ggplot(df, aes(Method, MAE)) + geom_col() + facet_wrap(~k_step_ahead,scales = "free")+theme_bw()+ ggtitle(paste0("Participant ",toString(num_subject)))+ theme(plot.title = element_text(hjust = 0.5))
print(p)
cat("\n \n")
}
In this section, we get the median MAE for different participants.
AR_Overall = apply(AR_MAE[,c(-5)],1,median)
ARIMA_Overall = apply(ARIMA_MAE[,c(-5)],1,median)
VAR_Overall = apply(VAR_MAE[,c(-5)],1,median)
VECM_Overall = apply(VECM_MAE[,c(-5)],1,median)
Naive_Overall = apply(Naive_MAE[,c(-5)],1,median)
Overall_MAE = rbind(AR_Overall,ARIMA_Overall,VAR_Overall,VECM_Overall,Naive_Overall)
print(Overall_MAE)
## 1_ahead 2_ahead 3_ahead 4_ahead 5_ahead 6_ahead
## AR_Overall 0.4722264 0.9880662 1.4750729 1.9470114 2.477296 2.966448
## ARIMA_Overall 0.3936752 0.6121799 0.9218704 1.3125396 1.553370 1.963500
## VAR_Overall 0.5813540 0.7603938 1.0291440 1.2084988 1.706998 1.909327
## VECM_Overall 0.4306467 0.5713606 0.7524496 0.9117993 1.135911 1.215929
## Naive_Overall 0.3888889 0.7500000 1.1428571 1.5636364 1.990000 2.400000
This code-chunk saves the Johansent Test, MAE and forecast results.
save(AR_MAE,VAR_MAE,VECM_MAE,ARIMA_MAE,Naive_MAE,file = "../Case 2 Results/MAE Results.Rdata")
save(Overall_MAE,file = "../Case 2 Results/Overall MAE.Rdata")
save(AR_Results,VAR_Results,VECM_Results,ARIMA_Results,Naive_Results,file = "../Case 2 Results/Forecast Results.Rdata")
save(Jotest_Sub_1,Jotest_Sub_2,Jotest_Sub_3,Jotest_Sub_4,Jotest_Sub_5,Jotest_Sub_6,Jotest_Sub_7,Jotest_Sub_8,Jotest_Sub_9,Jotest_Sub_10,Jotest_Sub_11,Jotest_Sub_12,Jotest_Sub_13,Jotest_Sub_14,Jotest_Sub_15,Jotest_Sub_16,Jotest_Sub_17,Jotest_Sub_18,Jotest_Sub_19,Jotest_Sub_20,Jotest_Sub_21,Jotest_Sub_22,Jotest_Sub_23,Jotest_Sub_24,file = "../Case 2 Results/Jotest_Results.Rdata")
rm(list = ls())
Baghdadi, Amir, Lora A Cavuoto, Allison Jones-Farmer, Steven E Rigdon, Ehsan T Esfahani, and Fadel M Megahed. 2019. “Monitoring Worker Fatigue Using Wearable Devices: A Case Study to Detect Changes in Gait Parameters.” Journal of Quality Technology.
Baghdadi, Amir, Fadel M Megahed, Ehsan T Esfahani, and Lora A Cavuoto. 2018. “A Machine Learning Approach to Detect Changes in Gait Parameters Following a Fatiguing Occupational Task.” Ergonomics 61 (8): 1116–29.
Di Narzo, Antonio Fabio, Jose Luis Aznarte, and Matthieu Stigler. 2015. “Package ’tsDyn’.”
Karvekar, Swapnali Babasaheb. 2019. “Smartphone-Based Human Fatigue Detection in an Industrial Environment Using Gait Analysis.”
Lütkepohl, Helmut. 2005. New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.
Email: sahandha@buffalo.edu↩
Email: hongyues@buffalo.edu | Phone: +1-716-645-4715 | Website: University at Buffalo Official ↩
Email: megahefm@miamioh.edu | Phone: +1-513-529-4185 | Website: Miami University Official ↩
Email: farmerl2@miamioh.edu | Phone: +1-513-529-4823 | Website: Miami University Official ↩
Email: exreie@rit.edu | Phone: +1-585-475-7260 | Website: Rochester Institute of Technology Official ↩
Email: loracavu@buffalo.edu | Phone: +1-716-645-4696 | Website: University at Buffalo Official ↩