Title: | Influence Measures and Diagnostic Plots for Multivariate Linear Models |
---|---|
Description: | Computes regression deletion diagnostics for multivariate linear models and provides some associated diagnostic plots. The diagnostic measures include hat-values (leverages), generalized Cook's distance, and generalized squared 'studentized' residuals. Several types of plots to detect influential observations are provided. |
Authors: | Michael Friendly [aut, cre] |
Maintainer: | Michael Friendly <[email protected]> |
License: | GPL-2 |
Version: | 0.9.1 |
Built: | 2024-11-10 05:31:09 UTC |
Source: | https://github.com/friendly/mvinfluence |
This function is used internally in the package to convert the result of mlm.influence() to a data frame. It is not normally called by the user.
## S3 method for class 'inflmlm'
as.data.frame(x, ..., FUN = det, funnames = TRUE)
x |
An |
... |
ignored |
FUN |
in the case where the subset size, |
funnames |
logical. Should the |
A data frame containing the influence statistics
# none
The functions cooks.distance.mlm and hatvalues.mlm are designed as extractor functions for regression deletion diagnostics for multivariate linear models following Barrett & Ling (1992). These are close analogs of methods for univariate and generalized linear models handled by influence.measures in the stats package.
## S3 method for class 'mlm'
cooks.distance(model, infl = mlm.influence(model, do.coef = FALSE), ...)
model |
A |
infl |
A |
... |
Ignored |
In addition, the functions provide diagnostics for deletion of subsets of observations of size m>1.
A vector of Cook's distances
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2) <- 1:nrow(Rohwer2)
Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)

hatvalues(Rohwer.mod)
cooks.distance(Rohwer.mod)
A small data set on the use of fertilizer (x) in relation to the amount of grain (y1) and straw (y2) produced.
A data frame with 8 observations on the following 3 variables.
grain |
amount of grain produced |
straw |
amount of straw produced |
fertilizer |
amount of fertilizer applied |
The first observation is an obvious outlier and influential observation.
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, New York: Wiley, p. 369.
Hossain, A. and Naik, D. N. (1989). Detection of influential observations in multivariate regression. Journal of Applied Statistics, 16 (1), 25-37.
data(Fertilizer)

# simple plots
plot(Fertilizer, col=c('red', rep("blue",7)),
     cex=c(2,rep(1.2,7)),
     pch=as.character(1:8))

# A biplot shows the data in 2D. It gives another view of how case 1 stands out in data space
biplot(prcomp(Fertilizer))

# fit the mlm
mod <- lm(cbind(grain, straw) ~ fertilizer, data=Fertilizer)
Anova(mod)

# influence plots (m=1)
influencePlot(mod)
influencePlot(mod, type='LR')
influencePlot(mod, type='stres')
The functions cooks.distance.mlm and hatvalues.mlm are designed as extractor functions for regression deletion diagnostics for multivariate linear models following Barrett & Ling (1992). These are close analogs of methods for univariate and generalized linear models handled by influence.measures in the stats package.
## S3 method for class 'mlm'
hatvalues(model, m = 1, infl, ...)
model |
An object of class |
m |
The size of subsets to be considered |
infl |
An |
... |
Other arguments, for compatibility with the generic; ignored. |
Hat values are a component of influence diagnostics, measuring the leverage or outlyingness of observations in the space of the predictor variables. The usual case considers observations one at a time (m=1), where the hat value is, up to an additive constant, proportional to the squared Mahalanobis distance of each observation from the centroid of all observations. This function extends that definition to calculate a comparable quantity for subsets of size m>1.
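The m=1 relationship described above can be checked in base R. The following is a minimal sketch, not the package's code: it assumes a model with an intercept and uses the built-in mtcars data with an arbitrary choice of two predictors.

```r
# Sketch: for m = 1, leverage is a linear function of squared Mahalanobis distance.
X  <- as.matrix(mtcars[, c("wt", "hp")])   # predictors, without the intercept
n  <- nrow(X)
Xd <- cbind(1, X)                          # model matrix with an intercept column

# Hat values: diagonal of H = X (X'X)^{-1} X'
h  <- diag(Xd %*% solve(crossprod(Xd)) %*% t(Xd))

# Squared Mahalanobis distance of each case from the predictor centroid
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Identity for a model with an intercept: h_i = 1/n + d2_i / (n - 1)
all.equal(unname(h), unname(1 / n + d2 / (n - 1)))
```

The additive constant 1/n comes from the intercept column, so cases at the centroid have the minimum possible leverage.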
A vector of hatvalues
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2) <- 1:nrow(Rohwer2)
Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)

options(digits=3)
hatvalues(Rohwer.mod)
cooks.distance(Rohwer.mod)
Provides index plots of some diagnostic measures for a multivariate linear model: Cook's distance, a generalized (squared) studentized residual, hat-values (leverages), and Mahalanobis squared distances of the residuals.
## S3 method for class 'mlm'
infIndexPlot(
  model,
  infl = mlm.influence(model, do.coef = FALSE),
  FUN = det,
  vars = c("Cook", "Studentized", "hat", "DSQ"),
  main = paste("Diagnostic Plots for", deparse(substitute(model))),
  pch = 19,
  labels,
  id.method = "y",
  id.n = if (id.method[1] == "identify") Inf else 0,
  id.cex = 1,
  id.col = palette()[1],
  id.location = "lr",
  grid = TRUE,
  ...
)
model |
A multivariate linear model object of class |
infl |
influence measure structure as returned by
|
FUN |
For |
vars |
All the quantities listed in this argument are plotted. Use
|
main |
main title for graph |
pch |
Plotting character for points |
id.method , labels , id.n , id.cex , id.col , id.location
|
Arguments for the
labeling of points. The default is |
grid |
If TRUE, the default, a light-gray background grid is put on the graph |
... |
Arguments passed to |
This function produces index plots of the various influence measures calculated by influence.mlm, and in addition, the measure based on the Mahalanobis squared distances of the residuals from the origin.
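As a rough illustration of the last measure, the squared Mahalanobis distances of the residuals from the origin can be computed in base R. This sketch uses the built-in mtcars data and assumes the residual covariance is estimated with divisor n - p; the covariance estimate used inside infIndexPlot may differ.

```r
# Sketch: squared Mahalanobis distances of mlm residuals from the origin.
fit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)
E   <- residuals(fit)            # n x r matrix of residuals
n   <- nrow(E)
p   <- nrow(coef(fit))           # coefficients per response, including intercept
S   <- crossprod(E) / (n - p)    # residual covariance (assumed divisor n - p)

# Distance of each residual vector from the origin (center = 0)
dsq <- mahalanobis(E, center = rep(0, ncol(E)), cov = S)

# Cases with the largest distances
head(sort(dsq, decreasing = TRUE), 3)
```

An index plot of these values can then be drawn with, for example, plot(dsq, type = "h").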
None. Used for its side effect of producing a graph.
Michael Friendly; borrows code from car::infIndexPlot
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
Barrett, B. E. (2003). Understanding Influence in Multivariate Regression. Communications in Statistics - Theory and Methods, 32, 667-680.
influencePlot.mlm, Mahalanobis, infIndexPlot
# iris data
data(iris)
iris.mod <- lm(as.matrix(iris[,1:4]) ~ Species, data=iris)
infIndexPlot(iris.mod, col=iris$Species, id.n=3)

# Sake data
data(Sake, package="heplots")
Sake.mod <- lm(cbind(taste,smell) ~ ., data=Sake)
infIndexPlot(Sake.mod, id.n=3)

# Rohwer data
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2) <- 1:nrow(Rohwer2)
rohwer.mlm <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
infIndexPlot(rohwer.mlm, id.n=3)
This collection of functions is designed to compute regression deletion diagnostics for multivariate linear models following Barrett & Ling (1992) that are close analogs of methods for univariate and generalized linear models handled by influence.measures in the stats package.
## S3 method for class 'mlm'
influence(model, do.coef = TRUE, m = 1, ...)
model |
An |
do.coef |
logical. Should the coefficients be returned in the
|
m |
Size of the subsets for deletion diagnostics |
... |
Other arguments passed to methods |
In addition, the functions provide diagnostics for deletion of subsets of observations of size m>1.
influence.mlm is a simple wrapper for the computational function, mlm.influence, designed to provide an S3 method for class "mlm" objects.
There are still infelicities in the methods for the m>1 case in the current implementation. In particular, for m>1, you must call influence.mlm directly, rather than using the S3 generic influence().
influence.mlm returns an S3 object of class inflmlm, a list with the following components:
m |
Deletion subset size |
H |
Hat values, |
Q |
Residuals, |
CookD |
Cook's distance values |
L |
Leverage components |
R |
Residual components |
subsets |
Indices of the observations in the subsets of size |
labels |
Observation labels |
call |
Model call for the |
Beta |
Deletion regression coefficients– included if |
Michael Friendly
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
influencePlot.mlm, mlm.influence
# Rohwer data
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2) <- 1:nrow(Rohwer2)
Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)

# m=1 diagnostics
influence(Rohwer.mod) |> head()

# try an m=2 case
## res2 <- influence.mlm(Rohwer.mod, m=2, do.coef=FALSE)
## res2.df <- as.data.frame(res2)
## head(res2.df)
## scatterplotMatrix(log(res2.df))

influencePlot(Rohwer.mod, id.n=4, type="cookd")

# Sake data
data(Sake, package="heplots")
Sake.mod <- lm(cbind(taste,smell) ~ ., data=Sake)
influence(Sake.mod)
influencePlot(Sake.mod, id.n=3, type="cookd")
This function creates various types of “bubble” plots of influence measures with the areas of the circles representing the observations proportional to generalized Cook's distances.
## S3 method for class 'mlm'
influencePlot(
  model,
  scale = 12,
  type = c("stres", "LR", "cookd"),
  infl = mlm.influence(model, do.coef = FALSE),
  FUN = det,
  fill = TRUE,
  fill.col = "red",
  fill.alpha.max = 0.5,
  labels,
  id.method = "noteworthy",
  id.n = if (id.method[1] == "identify") Inf else 0,
  id.cex = 1,
  id.col = palette()[1],
  ref.col = "gray",
  ref.lty = 2,
  ref.lab = TRUE,
  ...
)
model |
An |
scale |
a factor to adjust the radii of the circles, in relation to
|
type |
Type of plot: one of |
infl |
influence measure structure as returned by
|
FUN |
For |
fill , fill.col , fill.alpha.max
|
|
labels , id.method , id.n , id.cex , id.col
|
settings for labeling points;
see |
ref.col , ref.lty , ref.lab
|
arguments for reference lines. Incompletely implemented in this version |
... |
other arguments passed down |
type="stres" plots squared (internally) Studentized residuals against hat values;
type="cookd" plots Cook's distance against hat values;
type="LR" plots residual components against leverage components, with the attractive property that contours of constant Cook's distance fall on diagonal lines with slope = -1. Adjacent reference lines represent multiples of influence.
The id.method="noteworthy" setting also requires setting id.n>0 to have any effect. Using id.method="noteworthy" and id.n>0, the points labeled are the union of those with the largest id.n values on each of L, R, and CookD.
If points are identified, returns a data frame with the hat values, Studentized residuals and Cook's distance of the identified points. If no points are identified, nothing is returned. This function is primarily used for its side-effect of drawing a plot.
Michael Friendly
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
Barrett, B. E. (2003). Understanding Influence in Multivariate Regression. Communications in Statistics - Theory and Methods, 32, 667-680.
McCulloch, C. E. & Meeter, D. (1983). Discussion of "Outliers..." by R. J. Beckman and R. D. Cook. Technometrics, 25, 152-155.
influencePlot in the car package
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)

influencePlot(Rohwer.mod, id.n=4, type="stres")
influencePlot(Rohwer.mod, id.n=4, type="LR")
influencePlot(Rohwer.mod, id.n=4, type="cookd")

# Sake data
data(Sake, package="heplots")
Sake.mod <- lm(cbind(taste,smell) ~ ., data=Sake)
influencePlot(Sake.mod, id.n=3, type="stres")
influencePlot(Sake.mod, id.n=3, type="LR")
influencePlot(Sake.mod, id.n=3, type="cookd")

# Adopted data
data(Adopted, package="heplots")
Adopted.mod <- lm(cbind(Age2IQ, Age4IQ, Age8IQ, Age13IQ) ~ AMED + BMIQ, data=Adopted)
influencePlot(Adopted.mod, id.n=3)
influencePlot(Adopted.mod, id.n=3, type="LR", ylim=c(-4,-1.5))
These functions implement the general classes of influence measures for multivariate regression models defined in Barrett and Ling (1992), Eqn 2.3, 2.4, as shown in their Table 1.
Jtr(H, Q, a, b, f)

Jdet(H, Q, a, b, f)

COOKD(H, Q, n, p, r, m)

DFFITS(H, Q, n, p, r, m)

COVRATIO(H, Q, n, p, r, m)
H |
a scalar or |
Q |
a scalar or |
a |
the |
b |
the |
f |
scaling factor for the |
n |
sample size |
p |
number of predictor variables |
r |
number of response variables |
m |
deletion subset size |
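There are two classes of functions, denoted $J^{\det}$ and $J^{\mathrm{tr}}$, with parameters $n$, $p$, $r$ of the data, $m$ of the subset size, and $a$ and $b$, which define powers of terms in the formulas, typically in the set -2, -1, 0.
They are defined in terms of the submatrices for a deleted index subset $I$,
$$\mathbf{H}_I = \mathbf{X}_I (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}_I^T, \qquad \mathbf{Q}_I = \mathbf{E}_I (\mathbf{E}^T \mathbf{E})^{-1} \mathbf{E}_I^T,$$
corresponding to the hat and residual matrices in univariate models.
For subset size $m = 1$ these evaluate to scalar equivalents of hat values and studentized residuals.
For subset size $m > 1$ these are $m \times m$ matrices, and functions in the $J^{\det}$ class use $|\mathbf{H}_I|$ and $|\mathbf{Q}_I|$, while those in the $J^{\mathrm{tr}}$ class use $\mathrm{tr}(\mathbf{H}_I)$ and $\mathrm{tr}(\mathbf{Q}_I)$.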
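The submatrices for a deleted subset can be sketched in base R as follows. This is an illustration only: it assumes the usual definitions of the hat matrix and its residual analog, takes an arbitrary index subset `I` of size m = 2 from the built-in mtcars data, and does not call the package's Jdet or Jtr functions.

```r
# Sketch: hat and residual submatrices for an index subset I, with det and trace summaries.
fit  <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)
X    <- model.matrix(fit)        # n x p model matrix
E    <- residuals(fit)           # n x r residual matrix
I    <- c(1, 2)                  # hypothetical subset of cases, m = 2

XtXi <- solve(crossprod(X))      # (X'X)^{-1}
EtEi <- solve(crossprod(E))      # (E'E)^{-1}

X_I  <- X[I, , drop = FALSE]
E_I  <- E[I, , drop = FALSE]
H_I  <- X_I %*% XtXi %*% t(X_I)  # m x m block of the hat matrix
Q_I  <- E_I %*% EtEi %*% t(E_I)  # m x m block of the residual analog

# Scalar summaries of the kind used by the "det" and "tr" classes
c(det_H = det(H_I), tr_H = sum(diag(H_I)),
  det_Q = det(Q_I), tr_Q = sum(diag(Q_I)))
```

With a single index in `I`, both matrices are 1 x 1, so the determinant and trace coincide with the scalar values.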
The functions COOKD, COVRATIO, and DFFITS implement some of the standard influence measures in these terms for the general cases of multivariate linear models and deletion of subsets of size m>1, but they have not yet been incorporated into our main functions mlm.influence and influence.mlm.
The scalar result of the computation.
Michael Friendly
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
This function creates a “bubble” plot of functions, R = log(Studentized residuals^2) by L = log(H/p*(1-H)) of the hat values, with the areas of the circles representing the observations proportional to Cook's distances.
lrPlot(model, ...)

## S3 method for class 'lm'
lrPlot(
  model,
  scale = 12,
  xlab = "log Leverage factor [log H/p*(1-H)]",
  ylab = "log (Studentized Residual^2)",
  xlim = NULL,
  ylim,
  labels,
  id.method = "noteworthy",
  id.n = if (id.method[1] == "identify") Inf else 0,
  id.cex = 1,
  id.col = palette()[1],
  ref = c("h", "v", "d", "c"),
  ref.col = "gray",
  ref.lty = 2,
  ref.lab = TRUE,
  ...
)
model |
a model object fit by |
... |
arguments to pass to the |
scale |
a factor to adjust the radii of the circles, in relation to |
xlab , ylab
|
axis labels. |
xlim , ylim
|
Limits for x and y axes. In the space of (L, R) very small
residuals typically extend the y axis enough to swamp the large residuals,
so the default for |
labels , id.method , id.n , id.cex , id.col
|
settings for labeling points; see
|
ref |
Options to draw reference lines, any one or more of |
ref.col , ref.lty
|
Color and line type for reference lines. Reference
lines for |
ref.lab |
A logical, indicating whether the reference lines should be labeled. |
This plot, suggested by McCulloch & Meeter (1983) has the attractive property that contours of equal Cook's distance are diagonal lines with slope = -1. Various reference lines are drawn on the plot corresponding to twice and three times the average hat value, a “large” squared studentized residual and contours of Cook's distance.
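The slope = -1 property can be verified numerically for an ordinary lm. The sketch below makes two assumptions that are not confirmed by this page: the residuals are internally studentized (rstandard), and the leverage factor is read as h / (p (1 - h)). Under those assumptions log Cook's distance is exactly L + R, so points with equal Cook's distance lie on lines R = const - L.

```r
# Sketch: on the (L, R) scale, log Cook's distance equals L + R.
fit <- lm(mpg ~ wt + hp, data = mtcars)
h   <- hatvalues(fit)
p   <- length(coef(fit))

L <- log(h / (p * (1 - h)))      # log leverage factor (assumed form)
R <- log(rstandard(fit)^2)       # log squared internally studentized residual

# Contours of equal Cook's distance are therefore lines with slope -1
all.equal(log(cooks.distance(fit)), L + R)
```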
The id.method="noteworthy" setting also requires setting id.n>0 to have any effect. Using id.method="noteworthy" and id.n>0, the points labeled are the union of those with the largest id.n values on each of L, R, and CookD.
If points are identified, returns a data frame with the hat values, Studentized residuals and Cook's distance of the identified points. If no points are identified, nothing is returned. This function is primarily used for its side-effect of drawing a plot.
Michael Friendly
Lawrence, A. J. (1995). Deletion Influence and Masking in Regression. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 181-189.
McCulloch, C. E. & Meeter, D. (1983). Discussion of "Outliers..." by R. J. Beckman and R. D. Cook. Technometrics, 25, 152-155.
influencePlot.mlm; influencePlot in the car package for other methods
library(car)   # provides influencePlot() and the Duncan data

# artificial example from Lawrence (1995)
x <- c( 0, 0, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 18, 18 )
y <- c( 0, 6, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 7, 18 )
DF <- data.frame(x,y, row.names=LETTERS[1:length(x)])
DF

with(DF, {
  plot(x,y, pch=16, cex=1.3)
  abline(lm(y~x), col="red", lwd=2)
  NB <- c(1,2,13,14)
  text(x[NB],y[NB], LETTERS[NB], pos=c(4,4,2,2))
})

mod <- lm(y~x, data=DF)
# standard influence plot from car
influencePlot(mod, id.n=4)

# lrPlot version
lrPlot(mod, id.n=4)

dmod <- lm(prestige ~ income + education, data = Duncan)
influencePlot(dmod, id.n=3)
lrPlot(dmod, id.n=3)
mlm.influence is the main computational function in this package. It is usually not called directly, but rather via its alias, influence.mlm, the S3 method for an mlm object.
mlm.influence(model, do.coef = TRUE, m = 1, ...)
model |
An |
do.coef |
logical. Should the coefficients be returned in the
|
m |
Size of the subsets for deletion diagnostics |
... |
Further arguments passed to other methods |
The computations and methods for the m=1 case are straightforward, as are the computations for the m>1 case. Associated methods for m>1 are still under development.
mlm.influence returns an S3 object of class inflmlm, a list with the following components:
m |
Deletion subset size |
H |
Hat values, |
Q |
Residuals, |
CookD |
Cook's distance values |
L |
Leverage components |
R |
Residual components |
subsets |
Indices of the subsets |
labels |
Observation labels |
call |
Model call for the |
Beta |
Deletion regression coefficients– included if |
Michael Friendly
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
Barrett, B. E. (2003). Understanding Influence in Multivariate Regression. Communications in Statistics – Theory and Methods, 32, 3, 667-680.
data(Rohwer, package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2) <- 1:nrow(Rohwer2)
Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)
Rohwer.mod
influence(Rohwer.mod)

# extract the most influential cases
influence(Rohwer.mod) |>
  as.data.frame() |>
  dplyr::arrange(dplyr::desc(CookD)) |>
  head()

# Sake data
data(Sake, package="heplots")
Sake.mod <- lm(cbind(taste,smell) ~ ., data=Sake)
influence(Sake.mod) |>
  as.data.frame() |>
  dplyr::arrange(dplyr::desc(CookD)) |>
  head()
Calculates the n-th power of a square matrix, where n can be a positive or negative integer or a fractional power.
mpower(A, n)

A %^% n
A |
A square matrix. Must also be symmetric for non-integer powers. |
n |
matrix power |
If n<0, the method is applied to $A^{-1}$. When n is an integer, the function uses the Russian peasant method, or repeated squaring, for efficiency. Otherwise, it uses the spectral decomposition of A, requiring a symmetric matrix.
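The two strategies described above can be sketched in base R. The function names `mat_pow_int` and `mat_pow_sym` are hypothetical and introduced only for illustration; this is not the package's implementation of mpower.

```r
# Sketch 1: integer powers by repeated squaring (exponentiation by squaring).
mat_pow_int <- function(A, n) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A), n == round(n))
  if (n < 0) {                   # negative power: work with the inverse
    A <- solve(A)
    n <- -n
  }
  result <- diag(nrow(A))        # identity matrix, i.e. A^0
  while (n > 0) {
    if (n %% 2 == 1) result <- result %*% A   # use the current binary digit
    A <- A %*% A                               # square for the next digit
    n <- n %/% 2
  }
  result
}

# Sketch 2: fractional powers of a symmetric matrix via its eigen decomposition.
mat_pow_sym <- function(A, n) {
  stopifnot(isSymmetric(A))
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(e$values^n, nrow = length(e$values)) %*% t(e$vectors)
}

A    <- matrix(c(2, 1, 1, 3), 2, 2)   # symmetric, positive definite
half <- mat_pow_sym(A, 1/2)
all.equal(half %*% half, A)
all.equal(mat_pow_int(A, 3), A %*% A %*% A)
```

Repeated squaring needs on the order of log2(n) matrix products rather than n - 1, which is the efficiency gain the text refers to. The eigenvalue route needs non-negative eigenvalues for fractional powers to stay real.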
Returns the matrix $A^n$
Michael Friendly
https://en.wikipedia.org/wiki/Exponentiation_by_squaring
Packages corpcor and expm define similar functions.
M <- matrix(sample(1:9), 3,3)
mpower(M,2)
mpower(M,4)

# make a symmetric matrix
MM <- crossprod(M)
mpower(MM, -1)
Mhalf <- mpower(MM, 1/2)
all.equal(MM, Mhalf %*% Mhalf)
Functions in this package compute regression deletion diagnostics for multivariate linear models following methods proposed by Barrett & Ling (1992) and provide some associated diagnostic plots.
The design goal for this package is that, as an extension of standard methods for univariate linear models, you should be able to fit a linear model with a multivariate response,
mymlm <- lm( cbind(y1, y2, y3) ~ x1 + x2 + x3, data=mydata)
and then get useful diagnostics and plots with
influence(mymlm)
hatvalues(mymlm)
influencePlot(mymlm, ...)
The diagnostic measures include hat-values (leverages), generalized Cook's distance and generalized squared 'studentized' residuals. Several types of plots to detect influential observations are provided.
In addition, the functions provide diagnostics for deletion of subsets of observations of size m>1. This case is theoretically interesting because pairs (m=2) of influential observations can sometimes mask each other, can sometimes have joint influence far exceeding their individual effects, and can show other interesting phenomena described by Lawrence (1995). Associated methods for the case m>1 are still under development in this package.
The main function in the package is the S3 method, influence.mlm, a simple wrapper for mlm.influence, which does the actual computations. This design was dictated by that used in the stats package, which provides the generic method influence and methods influence.lm and influence.glm. The car package extends this to include influence.lme for models fit by lme.
The following sections describe the notation and measures used in the calculations.
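Let $\mathbf{X}$ be the model matrix in the multivariate linear model, $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{E}$. The usual least squares estimate of $\boldsymbol{\beta}$ is given by $\mathbf{B} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$.
Then let
$\mathbf{X}_I$ be the submatrix of $\mathbf{X}$ whose $m$ rows are indexed by $I$,
$\mathbf{X}_{(I)}$ be the complement, the submatrix of $\mathbf{X}$ with the $m$ rows in $I$ deleted.
Matrices $\mathbf{Y}_I$, $\mathbf{Y}_{(I)}$ are defined similarly.
In the calculation of regression coefficients, $\mathbf{B}_{(I)} = (\mathbf{X}_{(I)}^T \mathbf{X}_{(I)})^{-1} \mathbf{X}_{(I)}^T \mathbf{Y}_{(I)}$ are the estimated coefficients when the cases indexed by $I$ have been removed. The corresponding residuals are $\mathbf{E}_{(I)} = \mathbf{Y}_{(I)} - \mathbf{X}_{(I)} \mathbf{B}_{(I)}$.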
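The deletion quantities in this notation can be computed directly with base R matrix operations. The sketch below uses the built-in mtcars data and an arbitrary index subset `I`; the variable names are illustrative and do not correspond to objects in the package.

```r
# Sketch: full-data and case-deleted coefficient estimates for an index subset I.
fit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)                        # model matrix
Y   <- as.matrix(mtcars[, c("mpg", "qsec")])    # response matrix
I   <- c(1, 2)                                  # hypothetical subset, m = 2

# Full-data least squares estimate: solves (X'X) B = X'Y
B <- solve(crossprod(X), crossprod(X, Y))

# Delete the rows in I, then re-estimate
X_del <- X[-I, , drop = FALSE]
Y_del <- Y[-I, , drop = FALSE]
B_del <- solve(crossprod(X_del), crossprod(X_del, Y_del))

# Residuals of the retained cases from the deleted-case fit
E_del <- Y_del - X_del %*% B_del

B - B_del    # change in coefficients when the cases in I are removed
```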
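The influence measures defined by Barrett & Ling (1992) are functions of two matrices $\mathbf{H}_I$ and $\mathbf{Q}_I$ defined as follows:
For the full data set, the “hat matrix”, $\mathbf{H}$, is given by $\mathbf{H} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$,
$\mathbf{H}_I$ is the $m \times m$ submatrix of $\mathbf{H}$ corresponding to the index set $I$, $\mathbf{H}_I = \mathbf{X}_I (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}_I^T$,
$\mathbf{Q}$ is the analog of $\mathbf{H}$ defined for the residual matrix $\mathbf{E}$, that is, $\mathbf{Q} = \mathbf{E} (\mathbf{E}^T \mathbf{E})^{-1} \mathbf{E}^T$, with corresponding submatrix $\mathbf{Q}_I = \mathbf{E}_I (\mathbf{E}^T \mathbf{E})^{-1} \mathbf{E}_I^T$.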
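In these terms, Cook's distance is defined for a univariate response by
$$D_I = (\mathbf{b} - \mathbf{b}_{(I)})^T (\mathbf{X}^T \mathbf{X}) (\mathbf{b} - \mathbf{b}_{(I)}) / (p s^2),$$
a measure of the squared distance between the coefficients $\mathbf{b}$ for the full data set and those $\mathbf{b}_{(I)}$ obtained when the cases in $I$ are deleted.
In the multivariate case, Cook's distance is obtained by replacing the vector of coefficients $\mathbf{b}$ by $\mathrm{vec}(\mathbf{B})$, the result of stringing out the coefficients for all responses in a single $pr$-length vector, for $p$ coefficients on each of $r$ responses:
$$D_I = \frac{1}{p} [\mathrm{vec}(\mathbf{B} - \mathbf{B}_{(I)})]^T (\mathbf{S}^{-1} \otimes \mathbf{X}^T \mathbf{X}) \, \mathrm{vec}(\mathbf{B} - \mathbf{B}_{(I)}),$$
where $\otimes$ is the Kronecker (direct) product and $\mathbf{S} = \mathbf{E}^T \mathbf{E} / (n-p)$ is the covariance matrix of the residuals.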
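A direct, brute-force evaluation of the multivariate definition can be sketched in base R using kronecker(). The helper `cook_dist` is hypothetical, and the scaling assumed here (division by p, with the residual covariance using divisor n - p) may not match the CookD values reported by the package.

```r
# Sketch: generalized Cook's distance for deleting the cases in I, via vec() and Kronecker product.
cook_dist <- function(X, Y, I) {
  n  <- nrow(X)
  p  <- ncol(X)
  B  <- solve(crossprod(X), crossprod(X, Y))        # full-data coefficients
  S  <- crossprod(Y - X %*% B) / (n - p)            # residual covariance (assumed divisor)
  Xd <- X[-I, , drop = FALSE]
  Yd <- Y[-I, , drop = FALSE]
  Bd <- solve(crossprod(Xd), crossprod(Xd, Yd))     # coefficients with cases I deleted
  d  <- as.vector(B - Bd)                           # vec(): stack the columns
  W  <- kronecker(solve(S), crossprod(X))           # S^{-1} (x) X'X
  drop(crossprod(d, W %*% d)) / p
}

X <- model.matrix(~ wt + hp, data = mtcars)
Y <- as.matrix(mtcars[, c("mpg", "qsec")])

D_single <- sapply(seq_len(nrow(X)), function(i) cook_dist(X, Y, i))  # m = 1
D_pair   <- cook_dist(X, Y, c(1, 2))                                  # m = 2
head(sort(D_single, decreasing = TRUE), 3)
```

With a single response column the same function reproduces stats::cooks.distance() for the corresponding univariate lm, which gives a simple check on the formula.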
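For a univariate response, and when m = 1, Cook's distance can be re-written as a product of leverage and residual components as
$$D_i = \left(\frac{n-p}{p}\right) \frac{h_{ii} \, q_{ii}}{(1 - h_{ii})^2}.$$
Then we can define a leverage component $L_i$ and residual component $R_i$ as
$$L_i = \frac{h_{ii}}{1 - h_{ii}}, \qquad R_i = \frac{q_{ii}}{1 - h_{ii}}.$$
$R_i$ is proportional to the squared (internally) studentized residual, and $D_i \propto L_i \times R_i$.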
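The univariate decomposition can be checked against stats::cooks.distance(). This sketch assumes the standard definitions, with h the diagonal of the hat matrix and q the diagonal of the residual analog, which for a single response is e_i^2 divided by the residual sum of squares.

```r
# Sketch: Cook's distance as a product of leverage and residual components (m = 1).
fit <- lm(mpg ~ wt + hp, data = mtcars)
e   <- residuals(fit)
h   <- hatvalues(fit)
n   <- length(e)
p   <- length(coef(fit))

q <- e^2 / sum(e^2)      # diagonal of Q = e (e'e)^{-1} e'
L <- h / (1 - h)         # leverage component
R <- q / (1 - h)         # residual component

D <- ((n - p) / p) * L * R
all.equal(D, cooks.distance(fit))
```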
In the general, multivariate case there are analogous matrix expressions for $\mathbf{L}$ and $\mathbf{R}$. When m > 1, the quantities $\mathbf{H}_I$, $\mathbf{Q}_I$, $\mathbf{L}_I$, and $\mathbf{R}_I$ are $m \times m$ matrices. Where scalar quantities are needed, the package functions apply a function, FUN, either det() or tr(), to calculate a measure of “size”, as in
H <- sapply(x$H, FUN)
Q <- sapply(x$Q, FUN)
L <- sapply(x$L, FUN)
R <- sapply(x$R, FUN)
The stats package provides a collection of other leave-one-out deletion diagnostics that work with multivariate response models.
rstandard
Standardized residuals, re-scaling the residuals to have unit variance
rstudent
Studentized residuals, re-scaling the residuals to have leave-one-out variance
dffits
a scaled measure of the change in the predicted value for the ith observation
covratio
the change in the determinant of the covariance matrix of the estimates by deleting the ith observation
Barrett, B. E. and Ling, R. F. (1992). General Classes of Influence Measures for Multivariate Regression. Journal of the American Statistical Association, 87(417), 184-191.
Barrett, B. E. (2003). Understanding Influence in Multivariate Regression. Communications in Statistics – Theory and Methods, 32, 3, 667-680.
Lawrence, A. J. (1995). Deletion Influence and Masking in Regression. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 181-189.
Print an inflmlm object
## S3 method for class 'inflmlm'
print(x, digits = max(3, getOption("digits") - 4), FUN = det, ...)
x |
An |
digits |
Number of digits to print |
FUN |
Function to combine diagnostics when |
... |
passed to |
Invisibly returns the object
# none
Calculates the trace of a matrix
tr(M)
M |
a matrix |
For square, symmetric matrices, such as covariance matrices, the trace is sometimes used as a measure of size, e.g., in Pillai's trace criterion for a MLM.
returns the sum of the diagonal elements of the matrix
Michael Friendly
M <- matrix(sample(1:9), 3,3)
tr(M)