Title: | Sparse and Regularized Discriminant Analysis |
---|---|
Description: | A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier from Ramey et al. (2017) <arXiv:1602.01182>. Other classifiers include those from Dudoit et al. (2002) <doi:10.1198/016214502753479248>, Pang et al. (2009) <doi:10.1111/j.1541-0420.2009.01200.x>, and Tong et al. (2012) <doi:10.1093/bioinformatics/btr690>. |
Authors: | John A. Ramey |
Maintainer: | Max Kuhn |
License: | MIT + file LICENSE |
Version: | 0.2.5.9000 |
Built: | 2025-01-07 02:58:31 UTC |
Source: | https://github.com/topepo/sparsediscrim |
Centers the observations in a matrix by their respective class sample means
center_data(x, y)
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y | Vector of class labels for each training observation. |
Matrix with each observation centered by its corresponding class sample mean.
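The centering step can be illustrated with a minimal base R sketch. The helper name `center_by_class` is hypothetical and is not part of the package; it only mirrors the description above.

```r
# Hypothetical helper (not the package's implementation): subtract each
# class's column means from the rows belonging to that class.
center_by_class <- function(x, y) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  sums <- rowsum(x, y)                                   # per-class column sums
  class_means <- sums / as.vector(table(y)[rownames(sums)])
  x - class_means[as.character(y), , drop = FALSE]
}

centered <- center_by_class(iris[, -5], iris$Species)
# Within each class, every centered column sums to (numerically) zero.
round(rowsum(centered, iris$Species), 10)
```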
Generates a $p \times p$ autocorrelated covariance matrix
This function generates a $p \times p$ autocorrelated covariance matrix with autocorrelation parameter `rho`. The variance `sigma2` is constant for each feature and defaults to 1.
cov_autocorrelation(p, rho, sigma2 = 1)
p | the size of the covariance matrix |
rho | the autocorrelation parameter. Must be less than 1 in absolute value. |
sigma2 | the variance of each feature |
The $(i, j)$th entry of the autocorrelated covariance matrix is defined as $\rho^{|i - j|}$.
The value of `rho` must satisfy $|\rho| < 1$ to ensure that the covariance matrix is positive definite.
autocorrelated covariance matrix
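As a rough illustration of this definition, the matrix can be built in base R. The helper `autocorr_cov_sketch` is hypothetical, and scaling every entry by `sigma2` is an assumption based on the statement that the variance is constant for each feature.

```r
# Hypothetical helper: entry (i, j) is sigma2 * rho^|i - j|.
# The sigma2 scaling is an assumption, not confirmed by the package source.
autocorr_cov_sketch <- function(p, rho, sigma2 = 1) {
  stopifnot(abs(rho) < 1)
  idx <- seq_len(p)
  sigma2 * rho^abs(outer(idx, idx, "-"))
}

sigma <- autocorr_cov_sketch(p = 4, rho = 0.5)
sigma[1, ]
# All eigenvalues are positive when |rho| < 1.
min(eigen(sigma, symmetric = TRUE)$values) > 0
```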
Generates a $p \times p$ block-diagonal covariance matrix with autocorrelated blocks
This function generates a $p \times p$ covariance matrix with autocorrelated blocks. The autocorrelation parameter is `rho`. There are `num_blocks` blocks, each of size `block_size`. The variance, `sigma2`, is constant for each feature and defaults to 1.
cov_block_autocorrelation(num_blocks, block_size, rho, sigma2 = 1)
num_blocks | the number of blocks in the covariance matrix |
block_size | the size of each square block within the covariance matrix |
rho | the autocorrelation parameter. Must be less than 1 in absolute value. |
sigma2 | the variance of each feature |
The autocorrelated covariance matrix is defined as
$$\Sigma = \Sigma^{(\rho)} \oplus \Sigma^{(\rho)} \oplus \cdots \oplus \Sigma^{(\rho)},$$
where $\oplus$ denotes the direct sum and the $(i, j)$th entry of $\Sigma^{(\rho)}$ is
$$\Sigma_{ij}^{(\rho)} = \rho^{|i - j|}.$$
The matrix $\Sigma^{(\rho)}$ is the autocorrelated block discussed above.
The value of `rho` must satisfy $|\rho| < 1$ to ensure that the covariance matrix is positive definite.
The size of the resulting matrix is $p \times p$, where `p = num_blocks * block_size`.
autocorrelated covariance matrix
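A block-diagonal matrix of this kind can be sketched in base R with `kronecker()`. The helper name is hypothetical, and the sketch assumes every block uses the same `rho` and `sigma2`.

```r
# Hypothetical helper: repeat one autocorrelated block along the diagonal.
block_autocorr_cov_sketch <- function(num_blocks, block_size, rho, sigma2 = 1) {
  stopifnot(abs(rho) < 1)
  idx <- seq_len(block_size)
  block <- sigma2 * rho^abs(outer(idx, idx, "-"))
  # The Kronecker product with an identity matrix places one copy of
  # `block` in each diagonal position and zeros elsewhere.
  kronecker(diag(num_blocks), block)
}

sigma <- block_autocorr_cov_sketch(num_blocks = 3, block_size = 2, rho = 0.9)
dim(sigma)
```

Entries that fall outside the diagonal blocks are zero, so features in different blocks are uncorrelated.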
For the classes given in the vector `y`, we compute the eigenvalue (spectral) decomposition of the class sample covariance matrices (MLEs) using the data matrix `x`.
cov_eigen(x, y, pool = FALSE, fast = FALSE, tol = 1e-06)
x | data matrix with `n` observations (rows) and `p` features (columns) |
y | class labels for observations (rows) in `x` |
pool | logical. Should the sample covariance matrices be pooled? |
fast | logical. Should the Fast SVD be used? See details. |
tol | tolerance value below which the singular values of `x` are considered zero |
If `fast = TRUE`, we use the so-called Fast Singular Value Decomposition (SVD) to quickly compute the eigenvalue decomposition. To compute the Fast SVD, we use the `corpcor::fast.svd()` function, which employs a well-known trick for tall data (large `n`, small `p`) and wide data (large `p`, small `n`) to compute the SVD corresponding to the nonzero singular values. For more information about the Fast SVD, see `corpcor::fast.svd()`.
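The link between the SVD of the centered data and the eigendecomposition of the covariance MLE can be sketched with base R's `svd()`. This is only an illustration under stated assumptions: it does not call `corpcor::fast.svd()`, and the variable names are hypothetical.

```r
set.seed(42)
x <- matrix(rnorm(5 * 20), nrow = 5)          # wide data: n = 5, p = 20
xc <- scale(x, center = TRUE, scale = FALSE)  # center each column

sv <- svd(xc)
keep <- sv$d > 1e-6                           # drop numerically zero singular values
eig_vals <- sv$d[keep]^2 / nrow(x)            # nonzero eigenvalues of the MLE
eig_vecs <- sv$v[, keep, drop = FALSE]        # corresponding eigenvectors

# The retained pairs reproduce the p x p covariance MLE.
sigma_hat <- crossprod(xc) / nrow(x)
rebuilt <- eig_vecs %*% diag(eig_vals, nrow = length(eig_vals)) %*% t(eig_vecs)
all.equal(rebuilt, sigma_hat, check.attributes = FALSE)
```

Only the decomposition of a 5-row matrix is needed here, rather than a direct eigendecomposition of the 20 x 20 covariance matrix.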
A list containing the eigendecomposition for each class. If `pool = TRUE`, then a single list is returned.
cov_eigen(x = iris[, -5], y = iris[, 5])
cov_eigen(x = iris[, -5], y = iris[, 5], pool = TRUE)
cov_eigen(x = iris[, -5], y = iris[, 5], pool = TRUE, fast = TRUE)

# Generates a data set having fewer observations than features.
# We apply the Fast SVD to compute the eigendecomposition corresponding to the
# nonzero eigenvalues of the covariance matrices.
set.seed(42)
n <- 5
p <- 20
num_classes <- 3
x <- lapply(seq_len(num_classes), function(k) {
  replicate(p, rnorm(n, mean = k))
})
x <- do.call(rbind, x)
colnames(x) <- paste0("x", 1:ncol(x))
y <- gl(num_classes, n)
cov_eigen(x = x, y = y, fast = TRUE)
cov_eigen(x = x, y = y, pool = TRUE, fast = TRUE)
Generates a $p \times p$ intraclass covariance matrix
This function generates a $p \times p$ intraclass covariance matrix with correlation `rho`. The variance `sigma2` is constant for each feature and defaults to 1.
cov_intraclass(p, rho, sigma2 = 1)
p | the size of the covariance matrix |
rho | the value of the off-diagonal elements |
sigma2 | the variance of each feature |
The intraclass covariance matrix is defined as
$$\sigma^2 \left( \rho J_p + (1 - \rho) I_p \right),$$
where $J_p$ is the $p \times p$ matrix of ones and $I_p$ is the $p \times p$ identity matrix.
By default, with `sigma2 = 1`, the diagonal elements of the intraclass covariance matrix are all 1, while the off-diagonal elements of the matrix are all `rho`.
The value of `rho` must be between $1 / (1 - p)$ and 1, exclusively, to ensure that the covariance matrix is positive definite.
intraclass covariance matrix
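The definition translates almost directly into base R. The helper below is a hypothetical sketch, not the package's code.

```r
# Hypothetical helper: sigma2 * (rho * J_p + (1 - rho) * I_p).
intraclass_cov_sketch <- function(p, rho, sigma2 = 1) {
  stopifnot(p >= 2, rho > 1 / (1 - p), rho < 1)
  sigma2 * (rho * matrix(1, p, p) + (1 - rho) * diag(p))
}

sigma <- intraclass_cov_sketch(p = 3, rho = 0.25)
sigma
# Positive definite: the smallest eigenvalue is strictly positive.
min(eigen(sigma, symmetric = TRUE)$values) > 0
```

For `p = 3`, the lower bound on `rho` is $1 / (1 - 3) = -0.5$, so a slightly negative correlation such as `rho = -0.4` is still admissible.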
For a sample matrix, `x`, we compute the MLE for the covariance matrix for each class given in the vector, `y`.
cov_list(x, y)
x | data matrix with `n` observations (rows) and `p` features (columns) |
y | class labels for observations (rows) in `x` |
List of the sample covariance matrices of size $p \times p$ for each class given in `y`.
For a sample matrix, `x`, we compute the sample covariance matrix of the data as the maximum likelihood estimator (MLE) of the population covariance matrix.
cov_mle(x, diag = FALSE)
x | data matrix with `n` observations (rows) and `p` features (columns) |
diag | logical value. If TRUE, assumes the population covariance matrix is diagonal. By default, we assume that `diag` is `FALSE`. |
If the `diag` option is set to `TRUE`, then we assume the population covariance matrix is diagonal, and the MLE is computed under this assumption. In this case, we return a vector of length `p` instead.
Sample covariance matrix of size $p \times p$. If `diag` is `TRUE`, then a vector of length `p` is returned instead.
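The MLE differs from the usual unbiased estimator only in its divisor (`n` instead of `n - 1`). A minimal base R sketch, with a hypothetical helper name, could look like this:

```r
# Hypothetical helper: rescale the unbiased estimate to obtain the MLE.
cov_mle_sketch <- function(x, diag = FALSE) {
  x <- as.matrix(x)
  n <- nrow(x)
  if (diag) {
    # Diagonal case: only the p per-feature variances are returned.
    return(apply(x, 2, var) * (n - 1) / n)
  }
  cov(x) * (n - 1) / n
}

full <- cov_mle_sketch(iris[, -5])
variances <- cov_mle_sketch(iris[, -5], diag = TRUE)
# The diagonal of the full MLE matches the vector from the diagonal case.
all.equal(unname(diag(full)), unname(variances))
```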
For the matrix `x`, we compute the MLE for the population covariance matrix under the assumption that the data are sampled from multivariate normal populations having equal covariance matrices.
cov_pool(x, y)
x | data matrix with `n` observations (rows) and `p` features (columns) |
y | class labels for observations (rows) in `x` |
Pooled sample covariance matrix of size $p \times p$.
cov_pool(iris[, -5], iris$Species)
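For intuition, the pooled MLE can be sketched in base R as the within-class sum of squares and cross-products divided by the total sample size. The helper name and the divisor `N` are assumptions drawn from the description above, not from the package source.

```r
# Hypothetical helper: center each class at its own mean, then pool.
cov_pool_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  means <- rowsum(x, y) / as.vector(table(y))
  centered <- x - means[as.character(y), , drop = FALSE]
  crossprod(centered) / nrow(x)   # divisor N, as for an MLE (assumption)
}

pooled <- cov_pool_sketch(iris[, -5], iris$Species)
pooled
```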
For a sample matrix, `x`, we compute the sample covariance matrix as the maximum likelihood estimator (MLE) of the population covariance matrix and shrink it towards its diagonal.
cov_shrink_diag(x, gamma = 1)
x | data matrix with `n` observations (rows) and `p` features (columns) |
gamma | the shrinkage parameter. Must be between 0 and 1, inclusively. By default, the shrinkage parameter is 1, which simply yields the MLE. |
Let $\widehat{\Sigma}$ be the MLE of the covariance matrix $\Sigma$. Then, we shrink the MLE towards its diagonal by computing
$$\widehat{\Sigma}(\gamma) = \gamma \widehat{\Sigma} + (1 - \gamma) \, \widehat{\Sigma} \circ I_p,$$
where $\circ$ denotes the Hadamard product and $\gamma \in [0, 1]$.
For $\gamma < 1$, the resulting shrunken covariance matrix estimator is positive definite, and for $\gamma = 1$, we simply have the MLE, which can potentially be positive semidefinite (singular).
The estimator given here is based on Section 18.3.1 of the Hastie et al. (2008) text.
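The shrinkage formula can be sketched in a few lines of base R. The helper name is hypothetical; the element-wise product with the identity matrix plays the role of the Hadamard product.

```r
# Hypothetical helper: gamma * S + (1 - gamma) * (S o I_p), with S the MLE.
cov_shrink_diag_sketch <- function(x, gamma = 1) {
  stopifnot(gamma >= 0, gamma <= 1)
  x <- as.matrix(x)
  n <- nrow(x)
  sigma_hat <- cov(x) * (n - 1) / n
  # `*` is element-wise in R, so sigma_hat * diag(p) keeps only the diagonal.
  gamma * sigma_hat + (1 - gamma) * sigma_hat * diag(ncol(x))
}

set.seed(1)
wide <- matrix(rnorm(3 * 5), nrow = 3)   # n = 3 < p = 5, so the MLE is singular
min(eigen(cov_shrink_diag_sketch(wide, gamma = 1), symmetric = TRUE)$values)
min(eigen(cov_shrink_diag_sketch(wide, gamma = 0.5), symmetric = TRUE)$values)
```

With `gamma = 1` the smallest eigenvalue is numerically zero, while any `gamma < 1` leaves the diagonal unchanged, scales the off-diagonal entries by `gamma`, and yields a strictly positive smallest eigenvalue.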
Shrunken sample covariance matrix of size $p \times p$.
Hastie, T., Tibshirani, R., and Friedman, J. (2008), "The Elements of Statistical Learning: Data Mining, Inference, and Prediction," 2nd edition. http://web.stanford.edu/~hastie/ElemStatLearn/
For a vector of training labels, we return a list of cross-validation folds, where each fold has the indices of the observations to leave out in the fold. In terms of classification error rate estimation, one can think of a fold as the observations to hold out as a test sample set. Either the `hold_out` size or the number of folds, `num_folds`, can be specified. The number of folds defaults to 10, but if the `hold_out` size is specified, then `num_folds` is ignored.
cv_partition(y, num_folds = 10, hold_out = NULL, seed = NULL)
y | a vector of class labels |
num_folds | the number of cross-validation folds. Ignored if `hold_out` is not `NULL`. See Details. |
hold_out | the hold-out size for cross-validation. See Details. |
seed | optional random number seed for splitting the data for cross-validation |
We partition the vector `y` based on its length, which we treat as the sample size, `n`. If an object other than a vector is used in `y`, its length can yield unexpected results. For example, the output of `length(diag(3))` is 9.
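A minimal sketch of how such folds could be formed, and of the length caveat, follows. The helper `make_folds` is hypothetical, and deriving the number of folds as `ceiling(n / hold_out)` is an assumption; the package's actual partitioning may differ.

```r
# Hypothetical helper: shuffle the indices 1..n and deal them into folds.
make_folds <- function(y, num_folds = 10, hold_out = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(y)
  # Assumption: a hold-out size implies ceiling(n / hold_out) folds.
  if (!is.null(hold_out)) num_folds <- ceiling(n / hold_out)
  fold_id <- sample(rep(seq_len(num_folds), length.out = n))
  unname(split(seq_len(n), fold_id))
}

folds <- make_folds(iris$Species, num_folds = 10, seed = 42)
lengths(folds)

# The caveat above: a matrix reports its number of cells, not its rows.
length(diag(3))
nrow(diag(3))
```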
List containing the indices of the training and test observations for each fold.
# The following three calls to `cv_partition` yield the same partitions.
set.seed(42)
cv_partition(iris$Species)
cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, hold_out = 15, seed = 42)
Computes the maximum likelihood estimators (MLEs) for each class under the assumption of multivariate normality for each class. Also, computes ancillary information necessary for classifier summary, such as sample size, the number of features, etc.
diag_estimates(x, y, prior = NULL, pool = FALSE, est_mean = c("mle", "tong"))
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
pool | logical value. If TRUE, calculates the pooled sample variances for each class. |
est_mean | the estimator for the class means. By default, we use the maximum likelihood estimator (MLE). To improve the estimation, we provide the option to use a shrunken mean estimator proposed by Tong et al. (2012). |
This function computes the common estimates and ancillary information used in all of the diagonal classifiers in the `sparsediscrim` package.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations. If other data have zero variances, these will be removed with a warning.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
named list with estimators for each class and necessary ancillary information
Tong, T., Chen, L., and Zhao, H. (2012), "Improved Mean Estimation and Its Application to Diagonal Discriminant Analysis," Bioinformatics, 28, 4, 531-537. http://bioinformatics.oxfordjournals.org/content/28/4/531.long
Alternative to mvtnorm::dmvnorm
dmvnorm_diag(x, mean, sigma)
x | matrix |
mean | vector of means |
sigma | vector containing diagonal covariance matrix |
multivariate normal density
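With a diagonal covariance matrix, the multivariate normal density factors into a product of univariate normal densities. The sketch below uses a hypothetical helper name and assumes that `sigma` holds the per-feature variances (not standard deviations).

```r
# Hypothetical helper: density of each row of x under N(mean, diag(sigma)).
dmvnorm_diag_sketch <- function(x, mean, sigma) {
  x <- as.matrix(x)
  z <- sweep(x, 2, mean)   # subtract the mean from every row
  # Sum the per-feature log densities, then exponentiate.
  log_dens <- -0.5 * (rowSums(sweep(z^2, 2, sigma, "/")) + sum(log(2 * pi * sigma)))
  exp(log_dens)
}

x <- rbind(c(0, 0), c(1, 2))
dens <- dmvnorm_diag_sketch(x, mean = c(0, 1), sigma = c(1, 4))
# Same values as the product of univariate normal densities.
all.equal(dens[2], dnorm(1, 0, 1) * dnorm(2, 1, 2))
```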
Generates data from `K` multivariate normal data populations, where each population (class) has a covariance matrix consisting of block-diagonal autocorrelation matrices.
This function generates `K` multivariate normal data sets, where each class is generated with a constant mean vector and a covariance matrix consisting of block-diagonal autocorrelation matrices. The data are returned as a single matrix `x` along with a vector of class labels `y` that indicates class membership.
generate_blockdiag(n, mu, num_blocks, block_size, rho, sigma2 = rep(1, K))
n | vector of the sample sizes of each class. The length of `n` determines the number of classes `K`. |
mu | matrix containing the mean vectors for each class. Expected to have `p` rows and `K` columns. |
num_blocks | the number of block matrices. See details. |
block_size | the dimensions of the square block matrix. See details. |
rho | vector of the values of the autocorrelation parameter for each class covariance matrix. Its length must equal the length of `n` (i.e., `K`). |
sigma2 | vector of the variance coefficients for each class covariance matrix. Its length must equal the length of `n` (i.e., `K`). |
For simplicity, we assume that a class mean vector is constant for each feature. That is, we assume that the mean vector of the $k$th class is $c_k j_p$, where $j_p$ is a $p \times 1$ vector of ones and $c_k$ is a real scalar.
The $k$th class covariance matrix is defined as
$$\Sigma_k = \Sigma^{(\rho_k)} \oplus \Sigma^{(\rho_k)} \oplus \cdots \oplus \Sigma^{(\rho_k)},$$
where $\oplus$ denotes the direct sum and the $(i, j)$th entry of $\Sigma^{(\rho)}$ is
$$\Sigma_{ij}^{(\rho)} = \rho^{|i - j|}.$$
The matrix $\Sigma^{(\rho)}$ is referred to as a block. Its dimensions are provided in the `block_size` argument, and the number of blocks is specified in the `num_blocks` argument.
Each matrix $\Sigma_k$ is generated by the `cov_block_autocorrelation()` function.
The number of classes `K` is determined with lazy evaluation as the length of `n`.
The number of features `p` is computed as `block_size * num_blocks`.
Named list with elements:
x: matrix of observations with `sum(n)` rows and `p` columns
y: vector of class labels that indicates class membership for each observation (row) in `x`.
# Generates data from K = 3 classes.
means <- matrix(rep(1:3, each = 9), ncol = 3)
data <- generate_blockdiag(n = c(15, 15, 15), block_size = 3, num_blocks = 3,
                           rho = seq(.1, .9, length = 3), mu = means)
data$x
data$y

# Generates data from K = 4 classes with unequal sample sizes.
means <- matrix(rep(1:4, each = 9), ncol = 4)
data <- generate_blockdiag(n = c(15, 15, 15, 20), block_size = 3, num_blocks = 3,
                           rho = seq(.1, .9, length = 4), mu = means)
data$x
data$y
Generates data from `K` multivariate normal data populations, where each population (class) has an intraclass covariance matrix.
This function generates `K` multivariate normal data sets, where each class is generated with a constant mean vector and an intraclass covariance matrix. The data are returned as a single matrix `x` along with a vector of class labels `y` that indicates class membership.
generate_intraclass(n, p, rho, mu, sigma2 = rep(1, K))
n | vector of the sample sizes of each class. The length of `n` determines the number of classes `K`. |
p | the number of features (variables) in the data |
rho | vector of the values of the off-diagonal elements for each intraclass covariance matrix. Its length must equal the length of `n` (i.e., `K`). |
mu | vector containing the mean for each class. Its length must equal the length of `n` (i.e., `K`). |
sigma2 | vector of variances for each class. Its length must equal the length of `n` (i.e., `K`). |
For simplicity, we assume that a class mean vector is constant for each feature. That is, we assume that the mean vector of the $k$th class is $c_k j_p$, where $j_p$ is a $p \times 1$ vector of ones and $c_k$ is a real scalar.
The intraclass covariance matrix for the $k$th class is defined as
$$\Sigma_k = \sigma_k^2 \left( \rho_k J_p + (1 - \rho_k) I_p \right),$$
where $J_p$ is the $p \times p$ matrix of ones and $I_p$ is the $p \times p$ identity matrix.
By default, with $\sigma_k^2 = 1$, the diagonal elements of the intraclass covariance matrix are all 1, while the off-diagonal elements of the matrix are all `rho`.
The values of `rho` must be between $1 / (1 - p)$ and 1, exclusively, to ensure that the covariance matrix is positive definite.
The number of classes `K` is determined with lazy evaluation as the length of `n`.
Named list with elements:
x: matrix of observations with `sum(n)` rows and `p` columns
y: vector of class labels that indicates class membership for each observation (row) in `x`.
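The generation scheme, including how a default such as `sigma2 = rep(1, K)` can rely on lazy evaluation, can be sketched in base R. Everything here is hypothetical: the helper name, the use of a Cholesky factor for sampling, and the integer class labels are assumptions rather than the package's implementation.

```r
# Hypothetical helper: draw each class from N(mu_k * 1_p, Sigma_k).
generate_intraclass_sketch <- function(n, p, rho, mu, sigma2 = rep(1, K)) {
  # K is defined here, before the default for sigma2 is first evaluated.
  K <- length(n)
  x <- lapply(seq_len(K), function(k) {
    sigma_k <- sigma2[k] * (rho[k] * matrix(1, p, p) + (1 - rho[k]) * diag(p))
    z <- matrix(rnorm(n[k] * p), nrow = n[k])
    # Rows of z %*% chol(sigma_k) have covariance sigma_k; then shift by mu_k.
    z %*% chol(sigma_k) + mu[k]
  })
  list(x = do.call(rbind, x), y = factor(rep(seq_len(K), times = n)))
}

set.seed(1)
sim <- generate_intraclass_sketch(n = 3:5, p = 5,
                                  rho = seq(.1, .9, length = 3),
                                  mu = c(0, 3, -2))
dim(sim$x)
table(sim$y)
```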
# Generates data from K = 3 classes.
data <- generate_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length = 3),
                            mu = c(0, 3, -2))
data$x
data$y

# Generates data from K = 4 classes. Notice that we specify a variance.
data <- generate_intraclass(n = 3:6, p = 4, rho = seq(0, .9, length = 4),
                            mu = c(0, 3, -2, 6), sigma2 = 1:4)
data$x
data$y
This function computes the function $h_{\nu, p}(t)$ on page 1023 of Pang et al. (2009).
h(nu, p, t = -1)
nu | a specified constant (nu = N - K) |
p | the feature space dimension. |
t | a constant specified by the user that indicates the exponent to use with the variance estimator. By default, t = -1 as in Pang et al. See the paper for more details. |
the bias correction value
Pang, H., Tong, T., & Zhao, H. (2009). "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, 65, 4, 1021-1029. http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2009.01200.x/abstract
Given a set of training data, this function builds the Diagonal Linear Discriminant Analysis (DLDA) classifier, which is often attributed to Dudoit et al. (2002). The DLDA classifier belongs to the family of Naive Bayes classifiers, where the distributions of each class are assumed to be multivariate normal and to share a common covariance matrix.
The DLDA classifier is a modification to LDA, where the off-diagonal elements of the pooled sample covariance matrix are set to zero.
lda_diag(x, ...)

## Default S3 method:
lda_diag(x, y, prior = NULL, ...)

## S3 method for class 'formula'
lda_diag(formula, data, prior = NULL, ...)

## S3 method for class 'lda_diag'
predict(object, newdata, type = c("class", "prob", "score"), ...)
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form `groups ~ x1 + x2 + ...`, where the response is the grouping factor and the right-hand side specifies the features. |
data | data frame from which variables specified in `formula` are taken |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either '"class"', '"prob"', or '"score"'. |
The DLDA classifier is a modification to the well-known LDA classifier, where the off-diagonal elements of the pooled sample covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
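To make the diagonal rule concrete, here is a minimal base R sketch of a DLDA-style classifier. The function names are hypothetical, and the details (pooled variances with divisor `N`, priors taken as sample proportions, scores defined as variance-scaled squared distances minus twice the log prior) are assumptions; the package's `lda_diag()` may differ.

```r
# Hypothetical fit: class means, pooled per-feature variances, and priors.
dlda_fit_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  n_k <- as.vector(table(y))
  means <- rowsum(x, y) / n_k
  centered <- x - means[as.character(y), , drop = FALSE]
  list(means = means,
       var = colSums(centered^2) / nrow(x),   # off-diagonal terms are ignored
       prior = n_k / nrow(x),
       levels = levels(y))
}

# Hypothetical predict: assign each row to the class with the smallest score.
dlda_predict_sketch <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  scores <- matrix(0, nrow(newdata), length(fit$levels))
  for (k in seq_along(fit$levels)) {
    z <- sweep(newdata, 2, fit$means[k, ])
    scores[, k] <- rowSums(sweep(z^2, 2, fit$var, "/")) - 2 * log(fit$prior[k])
  }
  factor(fit$levels[apply(scores, 1, which.min)], levels = fit$levels)
}

fit <- dlda_fit_sketch(iris[, -5], iris$Species)
pred <- dlda_predict_sketch(fit, iris[, -5])
mean(pred == iris$Species)   # resubstitution accuracy
```

Because only `p` variances are estimated instead of a full `p x p` covariance matrix, the fit stays well defined even when `p` exceeds the sample size.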
The model fitting function returns the fitted classifier. The 'predict()' method returns either a vector ('type = "class"') or a data frame (all other 'type' values).
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
dlda_out <- lda_diag(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(dlda_out, penguins[pred_rows, -1], type = "class")

dlda_out2 <- lda_diag(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(dlda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the MDMP classifier from Srivastava and Kubokawa (2007). The MDMP classifier is an adaptation of the linear discriminant analysis (LDA) classifier that is designed for small-sample, high-dimensional data. Srivastava and Kubokawa (2007) have proposed a modification of the standard maximum likelihood estimator of the pooled covariance matrix, where only the largest 95% of the eigenvalues and their corresponding eigenvectors are kept. The value of 95% is the default and can be changed via the `eigen_pct` argument.
lda_eigen(x, ...)

## Default S3 method:
lda_eigen(x, y, prior = NULL, eigen_pct = 0.95, ...)

## S3 method for class 'formula'
lda_eigen(formula, data, prior = NULL, ...)

## S3 method for class 'lda_eigen'
predict(object, newdata, type = c("class", "prob", "score"), ...)
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
eigen_pct | the percentage of eigenvalues kept |
formula | A formula of the form `groups ~ x1 + x2 + ...`, where the response is the grouping factor and the right-hand side specifies the features. |
data | data frame from which variables specified in `formula` are taken |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either '"class"', '"prob"', or '"score"'. |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
`lda_eigen` object that contains the trained MDMP classifier
Srivastava, M. and Kubokawa, T. (2007). "Comparison of Discrimination Methods for High Dimensional Data," Journal of the Japanese Statistical Association, 37, 1, 123-134.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
mdmp_out <- lda_eigen(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(mdmp_out, penguins[pred_rows, -1], type = "class")

mdmp_out2 <- lda_eigen(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(mdmp_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the MDEB classifier from Srivastava and Kubokawa (2007). The MDEB classifier is an adaptation of the linear discriminant analysis (LDA) classifier that is designed for small-sample, high-dimensional data. Rather than using the standard maximum likelihood estimator of the pooled covariance matrix, Srivastava and Kubokawa (2007) have proposed an Empirical Bayes estimator where the eigenvalues of the pooled sample covariance matrix are shrunken towards the identity matrix. The shrinkage constant has a closed form and is quick to calculate.
lda_emp_bayes(x, ...)

## Default S3 method:
lda_emp_bayes(x, y, prior = NULL, ...)

## S3 method for class 'formula'
lda_emp_bayes(formula, data, prior = NULL, ...)

## S3 method for class 'lda_emp_bayes'
predict(object, newdata, type = c("class", "prob", "score"), ...)
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form `groups ~ x1 + x2 + ...`, where the response is the grouping factor and the right-hand side specifies the features. |
data | data frame from which variables specified in `formula` are taken |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either '"class"', '"prob"', or '"score"'. |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
`lda_emp_bayes` object that contains the trained MDEB classifier
Srivastava, M. and Kubokawa, T. (2007). "Comparison of Discrimination Methods for High Dimensional Data," Journal of the Japanese Statistical Association, 37, 1, 123-134.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
mdeb_out <- lda_emp_bayes(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(mdeb_out, penguins[pred_rows, -1], type = "class")

mdeb_out2 <- lda_emp_bayes(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(mdeb_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the MDMEB classifier from Srivastava and Kubokawa (2007). The MDMEB classifier is an adaptation of the linear discriminant analysis (LDA) classifier that is designed for small-sample, high-dimensional data. Srivastava and Kubokawa (2007) have proposed a modification of the standard maximum likelihood estimator of the pooled covariance matrix, where only the largest 95% of the eigenvalues and their corresponding eigenvectors are kept. The resulting covariance matrix is then shrunken towards a scaled identity matrix. The value of 95% is the default and can be changed via the `eigen_pct` argument.
lda_emp_bayes_eigen(x, ...)

## Default S3 method:
lda_emp_bayes_eigen(x, y, prior = NULL, eigen_pct = 0.95, ...)

## S3 method for class 'formula'
lda_emp_bayes_eigen(formula, data, prior = NULL, ...)

## S3 method for class 'lda_emp_bayes_eigen'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
eigen_pct | the percentage of eigenvalues kept |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_emp_bayes_eigen` object that contains the trained MDMEB classifier
Srivastava, M. and Kubokawa, T. (2007). "Comparison of Discrimination Methods for High Dimensional Data," Journal of the Japan Statistical Society, 37, 1, 123-134.
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
mdmeb_out <- lda_emp_bayes_eigen(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(mdmeb_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
mdmeb_out2 <- lda_emp_bayes_eigen(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(mdmeb_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Linear Discriminant Analysis (LDA) classifier, where the distributions of each class are assumed to be multivariate normal and share a common covariance matrix. When the pooled sample covariance matrix is singular, the linear discriminant function is incalculable. A common method to overcome this issue is to replace the inverse of the pooled sample covariance matrix with the Moore-Penrose pseudo-inverse, which is unique and always exists. Note that when the pooled sample covariance matrix is nonsingular, its inverse is equal to the pseudo-inverse.
The Linear Discriminant Analysis (LDA) classifier assumes that the distributions of each class are multivariate normal and share a common covariance matrix. When the pooled sample covariance matrix is singular, the linear discriminant function is incalculable. A common method to overcome this issue is to replace the inverse of the pooled sample covariance matrix with the Moore-Penrose pseudo-inverse, which is unique and always exists. Note that when the pooled sample covariance matrix is nonsingular, its inverse is equal to the pseudo-inverse.
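As a rough illustration of the pseudo-inverse idea, the sketch below computes the Moore-Penrose pseudo-inverse of a symmetric covariance matrix from its eigendecomposition, discarding eigenvalues at or below a tolerance. The helper name `pseudo_inverse` is hypothetical and the code is not taken from the package; the `tol` default only mirrors the documented argument.

```r
# Hypothetical helper: Moore-Penrose pseudo-inverse of a symmetric matrix.
pseudo_inverse <- function(S, tol = 1e-8) {
  eig <- eigen(S, symmetric = TRUE)
  keep <- eig$values > tol                  # eigenvalues treated as nonzero
  V <- eig$vectors[, keep, drop = FALSE]
  V %*% (t(V) / eig$values[keep])           # V diag(1 / lambda) V'
}

# Nonsingular case: the pseudo-inverse agrees with the ordinary inverse
S_full <- cov(iris[, 1:4])
all.equal(pseudo_inverse(S_full), solve(S_full), check.attributes = FALSE)

# Singular case: a duplicated feature makes the 5 x 5 covariance rank 4
X <- cbind(iris[, 1:4], dup = iris[, 1])
S_sing <- cov(X)
S_plus <- pseudo_inverse(S_sing)
all.equal(S_sing %*% S_plus %*% S_sing, S_sing, check.attributes = FALSE)
```

In the singular case `solve(S_sing)` would fail, while the pseudo-inverse still satisfies the defining identity checked on the last line.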
lda_pseudo(x, ...)

## Default S3 method:
lda_pseudo(x, y, prior = NULL, tol = 1e-08, ...)

## S3 method for class 'formula'
lda_pseudo(formula, data, prior = NULL, tol = 1e-08, ...)

## S3 method for class 'lda_pseudo'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
tol | tolerance value below which eigenvalues are considered numerically equal to 0 |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_pseudo` object that contains the trained `lda_pseudo` classifier
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
lda_pseudo_out <- lda_pseudo(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(lda_pseudo_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
lda_pseudo_out2 <- lda_pseudo(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(lda_pseudo_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Linear Discriminant
Analysis (LDA) classifier, where the distributions of each class are assumed
to be multivariate normal and share a common covariance matrix. When the
pooled sample covariance matrix is singular, the linear discriminant function
is incalculable. This function replaces the inverse of the pooled sample
covariance matrix with an estimator proposed by Schafer and Strimmer
(2005). The estimator is calculated via `corpcor::invcov.shrink()`.
The Linear Discriminant Analysis (LDA) classifier assumes that the distributions of each class are multivariate normal and share a common covariance matrix. When the pooled sample covariance matrix is singular, the linear discriminant function is incalculable. Here, the inverse of the pooled sample covariance matrix is replaced with an estimator from Schafer and Strimmer (2005).
lda_schafer(x, ...)

## Default S3 method:
lda_schafer(x, y, prior = NULL, ...)

## S3 method for class 'formula'
lda_schafer(formula, data, prior = NULL, ...)

## S3 method for class 'lda_schafer'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | Options passed to |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_schafer` object that contains the trained classifier
Schafer, J., and Strimmer, K. (2005). "A shrinkage approach to large-scale covariance estimation and implications for functional genomics," Statist. Appl. Genet. Mol. Biol. 4, 32.
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
lda_schafer_out <- lda_schafer(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(lda_schafer_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
lda_schafer_out2 <- lda_schafer(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(lda_schafer_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Shrinkage-based Diagonal Linear Discriminant Analysis (SDLDA) classifier, which is based on the DLDA classifier, often attributed to Dudoit et al. (2002). The DLDA classifier belongs to the family of Naive Bayes classifiers, where the distributions of each class are assumed to be multivariate normal and to share a common covariance matrix. To improve the estimation of the pooled variances, Pang et al. (2009) proposed the SDLDA classifier, which uses a shrinkage-based estimator of the pooled covariance matrix.
The SDLDA classifier is a modification to LDA, where the off-diagonal elements of the pooled sample covariance matrix are set to zero. To improve the estimation of the pooled variances, we use a shrinkage method from Pang et al. (2009).
lda_shrink_cov(x, ...)

## Default S3 method:
lda_shrink_cov(x, y, prior = NULL, num_alphas = 101, ...)

## S3 method for class 'formula'
lda_shrink_cov(formula, data, prior = NULL, num_alphas = 101, ...)

## S3 method for class 'lda_shrink_cov'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
num_alphas | the number of values used to find the optimal amount of shrinkage |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The DLDA classifier is a modification to the well-known LDA classifier, where the off-diagonal elements of the pooled covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_shrink_cov` object that contains the trained SDLDA classifier
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
Pang, H., Tong, T., & Zhao, H. (2009). "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, 65, 4, 1021-1029.
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
sdlda_out <- lda_shrink_cov(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(sdlda_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
sdlda_out2 <- lda_shrink_cov(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(sdlda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Shrinkage-mean-based
Diagonal Linear Discriminant Analysis (SmDLDA) classifier from Tong, Chen,
and Zhao (2012). The SmDLDA classifier incorporates a Lindley-type shrunken
mean estimator into the DLDA classifier from Dudoit et al. (2002). For more
about the DLDA classifier, see `lda_diag()`.
The SmDLDA classifier is a modification to LDA, where the off-diagonal elements of the pooled sample covariance matrix are set to zero.
lda_shrink_mean(x, ...)

## Default S3 method:
lda_shrink_mean(x, y, prior = NULL, ...)

## S3 method for class 'formula'
lda_shrink_mean(formula, data, prior = NULL, ...)

## S3 method for class 'lda_shrink_mean'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The DLDA classifier belongs to the family of Naive Bayes classifiers, where the distributions of each class are assumed to be multivariate normal and to share a common covariance matrix.
The DLDA classifier is a modification to the well-known LDA classifier, where the off-diagonal elements of the pooled sample covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
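To make the diagonal-covariance idea concrete, here is a minimal base-R sketch of plain DLDA (without the shrunken mean). The names `dlda_fit` and `dlda_predict` are hypothetical, and the pooled variances use the divisor n - K as an assumption; the package's estimator may differ. Only one variance per feature is estimated, rather than a full covariance matrix.

```r
# Hypothetical DLDA sketch in base R; not the package's implementation.
dlda_fit <- function(x, y) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  counts <- as.vector(table(y))
  means <- rowsum(x, y) / counts                       # class means (K x p)
  resid <- x - means[as.character(y), , drop = FALSE]  # center by class mean
  list(
    means = means,
    var_pool = colSums(resid^2) / (nrow(x) - nlevels(y)),  # pooled variances
    prior = counts / length(y)
  )
}

dlda_predict <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  # Score for class k: sum_j (x_j - mean_kj)^2 / s_j^2 - 2 * log(prior_k)
  scores <- sapply(seq_len(nrow(fit$means)), function(k) {
    centered <- sweep(newdata, 2, fit$means[k, ])
    drop(centered^2 %*% (1 / fit$var_pool)) - 2 * log(fit$prior[k])
  })
  scores <- matrix(scores, nrow = nrow(newdata))
  # Assign each observation to the class with the smallest score
  rownames(fit$means)[max.col(-scores, ties.method = "first")]
}

fit <- dlda_fit(iris[, 1:4], iris$Species)
pred <- dlda_predict(fit, iris[, 1:4])
mean(pred == iris$Species)   # training accuracy
```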
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_shrink_mean` object that contains the trained SmDLDA classifier
Tong, T., Chen, L., and Zhao, H. (2012), "Improved Mean Estimation and Its Application to Diagonal Discriminant Analysis," Bioinformatics, 28, 4, 531-537. http://bioinformatics.oxfordjournals.org/content/28/4/531.long
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
smdlda_out <- lda_shrink_mean(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(smdlda_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
smdlda_out2 <- lda_shrink_mean(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(smdlda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Linear Discriminant Analysis (LDA) classifier, where the distributions of each class are assumed to be multivariate normal and share a common covariance matrix. When the pooled sample covariance matrix is singular, the linear discriminant function is incalculable. This function replaces the pooled sample covariance matrix with a regularized estimator from Thomaz et al. (2006), where the smallest eigenvalues are replaced with the average eigenvalue. Here, an eigenvalue counts as small if it is less than the average eigenvalue.
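The eigenvalue adjustment described above can be sketched as follows. This is an illustrative base-R version under the description given here (eigenvalues below the average are raised to the average); `regularize_eigenvalues` is a hypothetical name, and for brevity the example uses a single-sample covariance matrix rather than a pooled one.

```r
# Hypothetical sketch of the eigenvalue adjustment; not the package's code.
regularize_eigenvalues <- function(S) {
  eig <- eigen(S, symmetric = TRUE)
  avg <- mean(eig$values)                       # average eigenvalue = trace / p
  adjusted <- pmax(eig$values, avg)             # raise small eigenvalues to the average
  eig$vectors %*% (adjusted * t(eig$vectors))   # V diag(adjusted) V'
}

# With 5 observations and 10 features the sample covariance is singular
set.seed(1)
X <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)
S <- cov(X)
S_reg <- regularize_eigenvalues(S)

range(eigen(S, symmetric = TRUE)$values)      # smallest eigenvalues are numerically zero
range(eigen(S_reg, symmetric = TRUE)$values)  # smallest eigenvalue is now the average
```

Because every adjusted eigenvalue is at least the (positive) average, the regularized matrix is positive definite and can be inverted directly.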
lda_thomaz(x, ...)

## Default S3 method:
lda_thomaz(x, y, prior = NULL, ...)

## S3 method for class 'formula'
lda_thomaz(formula, data, prior = NULL, ...)

## S3 method for class 'lda_thomaz'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`lda_thomaz` object that contains the trained classifier
Thomaz, C. E., Kitani, E. C., and Gillies, D. F. (2006). "A maximum uncertainty LDA-based approach for limited sample size problems with application to face recognition," J. Braz. Comp. Soc., 12, 2, 7-18.
library(sparsediscrim)
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Fit with the formula interface
lda_thomaz_out <- lda_thomaz(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(lda_thomaz_out, penguins[pred_rows, -1], type = "class")

# Fit with the x/y interface; the predictions should match
lda_thomaz_out2 <- lda_thomaz(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(lda_thomaz_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Computes the log determinant of a matrix.
log_determinant(x)
Argument | Description |
---|---|
x | matrix |
log determinant of x
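One way to compute a log determinant in base R is `determinant()` with `logarithm = TRUE`, which avoids forming the determinant itself. The sketch below is an assumption about how such a helper could look (`log_det` is a hypothetical name), not the package's implementation.

```r
# Hypothetical stand-in for illustration; the package's code may differ.
log_det <- function(x) {
  d <- determinant(as.matrix(x), logarithm = TRUE)
  as.numeric(d$modulus)          # log(|det(x)|); d$sign holds the sign
}

S <- cov(iris[, 1:4])
log_det(S)
log(det(S))                      # same value for this small matrix

big <- diag(1e-3, 500)           # det(big) underflows to 0 in double precision
log(det(big))                    # -Inf
log_det(big)                     # finite: 500 * log(1e-3)
```

Working on the log scale matters in high dimensions, where the determinant of a covariance matrix easily underflows or overflows.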
Often, we prefer not to have an intercept term in a model, but a user-specified formula might include one. In this case, we update the formula to exclude the intercept term. This matters in many classification models, where including an intercept can cause errors.
no_intercept(formula, data)
Argument | Description |
---|---|
formula | a model formula to remove its intercept term |
data | data frame |
formula with no intercept term
iris_formula <- formula(Species ~ .)
sparsediscrim:::no_intercept(iris_formula, data = iris)
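For readers without the package installed, the same effect can be approximated in base R. This is a sketch under the assumption that expanding the `.` against `data` and then subtracting the intercept is acceptable; `drop_intercept` is a hypothetical name and not the package's internal function.

```r
# Hypothetical helper, for illustration only.
drop_intercept <- function(formula, data) {
  expanded <- formula(terms(formula, data = data))  # expand "." using data
  update(expanded, . ~ . - 1)                       # remove the intercept term
}

f <- drop_intercept(Species ~ ., data = iris)
f
attr(terms(f), "intercept")   # 0 indicates that no intercept is fitted
```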
Uses `ggplot2::ggplot()` to plot a heatmap of the training error grid.
## S3 method for class 'rda_high_dim_cv'
plot(x, ...)
Argument | Description |
---|---|
x | object to plot |
... | unused |
Computes posterior probabilities via Bayes' theorem under normality
posterior_probs(x, means, covs, priors)
Argument | Description |
---|---|
x | matrix of observations |
means | list of means for each class |
covs | list of covariance matrices for each class |
priors | list of prior probabilities for each class |
matrix of posterior probabilities for each observation
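The calculation can be sketched in base R as follows. The function name `posterior_sketch` is hypothetical and the code is an illustration of Bayes' theorem with multivariate normal class densities, not the package's implementation; its arguments mirror the ones documented above. The densities are handled on the log scale and normalized with a log-sum-exp step to limit underflow.

```r
# Hypothetical sketch, for illustration; not the package's implementation.
posterior_sketch <- function(x, means, covs, priors) {
  x <- as.matrix(x)
  # Log of prior_k * N(x; mean_k, cov_k), dropping the shared (2 * pi) constant
  log_joint <- mapply(function(m, S, p) {
    log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    log(p) - 0.5 * (log_det + mahalanobis(x, center = m, cov = S))
  }, means, covs, priors)
  log_joint <- matrix(log_joint, nrow = nrow(x))
  # Subtract each row's maximum before exponentiating (log-sum-exp)
  log_joint <- log_joint - apply(log_joint, 1, max)
  post <- exp(log_joint) / rowSums(exp(log_joint))
  colnames(post) <- names(means)
  post
}

groups <- split(iris[, 1:4], iris$Species)
means  <- lapply(groups, colMeans)
covs   <- lapply(groups, cov)
priors <- lapply(groups, function(g) nrow(g) / nrow(iris))

post <- posterior_sketch(iris[, 1:4], means, covs, priors)
head(round(post, 3))
```

Each row of the result sums to one, and the column with the largest value gives the predicted class.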
Given a set of training data, this function builds the Diagonal Quadratic Discriminant Analysis (DQDA) classifier, which is often attributed to Dudoit et al. (2002). The DQDA classifier belongs to the family of Naive Bayes classifiers, where the distributions of each class are assumed to be multivariate normal. Note that the DLDA classifier is a special case of the DQDA classifier.
The DQDA classifier is a modification to QDA, where the off-diagonal elements of each class sample covariance matrix are set to zero.
qda_diag(x, ...)

## Default S3 method:
qda_diag(x, y, prior = NULL, ...)

## S3 method for class 'formula'
qda_diag(formula, data, prior = NULL, ...)

## S3 method for class 'qda_diag'
predict(object, newdata, type = c("class", "prob", "score"), ...)
Argument | Description |
---|---|
x | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... | additional arguments (not currently used). |
y | Vector of class labels for each training observation. Only complete data are retained. |
prior | Vector with prior probabilities for each class. If NULL (default), then equal probabilities are used. See details. |
formula | A formula of the form |
data | data frame from which variables specified in |
object | Fitted model object |
newdata | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type | Prediction type: either "class", "prob", or "score". |
The DQDA classifier is a modification to the well-known QDA classifier, where the off-diagonal elements of each class covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
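Under these assumptions the discriminant score takes a simple form. The notation below is introduced here for illustration and does not come from the package documentation: $x_j$ is the $j$th feature of a new observation, $\bar{x}_{kj}$ and $s_{kj}^2$ are the sample mean and variance of feature $j$ in class $k$, $\pi_k$ is the prior probability of class $k$, and $p$ is the number of features.

```latex
\hat{d}_k(x) = \sum_{j=1}^{p} \frac{(x_j - \bar{x}_{kj})^2}{s_{kj}^2}
             + \sum_{j=1}^{p} \log s_{kj}^2
             - 2 \log \pi_k ,
\qquad
\hat{y}(x) = \arg\min_{k} \hat{d}_k(x)
```

If the variances are shared across classes, $s_{kj}^2 = s_j^2$, the second sum is the same for every class and drops out of the comparison, which gives the DLDA rule as a special case.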
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations, because the variance of each feature within a class cannot be estimated from fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`qda_diag` object that contains the trained DQDA classifier
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
dqda_out <- qda_diag(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(dqda_out, penguins[pred_rows, -1], type = "class")
dqda_out2 <- qda_diag(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(dqda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Shrinkage-based Diagonal Quadratic Discriminant Analysis (SDQDA) classifier, which is based on the DQDA classifier, often attributed to Dudoit et al. (2002). The DQDA classifier belongs to the family of Naive Bayes classifiers, where the distributions of each class are assumed to be multivariate normal. To improve the estimation of the class variances, Pang et al. (2009) proposed the SDQDA classifier, which uses shrinkage-based estimators of each class covariance matrix.
The SDQDA classifier is a modification to QDA, where the off-diagonal elements of each class sample covariance matrix are set to zero. To improve the estimation of the class variances, we use a shrinkage method from Pang et al. (2009).
qda_shrink_cov(x, ...)

## Default S3 method:
qda_shrink_cov(x, y, prior = NULL, num_alphas = 101, ...)

## S3 method for class 'formula'
qda_shrink_cov(formula, data, prior = NULL, num_alphas = 101, ...)

## S3 method for class 'qda_shrink_cov'
predict(object, newdata, type = c("class", "prob", "score"), ...)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... |
additional arguments (not currently used). |
y |
Vector of class labels for each training observation. Only complete data are retained. |
prior |
Vector with prior probabilities for each class. If NULL (default), then the sample proportions are used. See details. |
num_alphas |
the number of values used to find the optimal amount of shrinkage |
formula |
A formula of the form |
data |
data frame from which variables specified in |
object |
Fitted model object |
newdata |
Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type |
Prediction type: either '"class"', '"prob"', or '"score"'. |
The DQDA classifier is a modification to the well-known QDA classifier, where the off-diagonal elements of each class covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`qda_shrink_cov` object that contains the trained SDQDA classifier
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
Pang, H., Tong, T., & Zhao, H. (2009). "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, 65, 4, 1021-1029.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
set.seed(42)
sdqda_out <- qda_shrink_cov(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(sdqda_out, penguins[pred_rows, -1], type = "class")
sdqda_out2 <- qda_shrink_cov(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(sdqda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
Given a set of training data, this function builds the Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis (SmDQDA) classifier from Tong, Chen, and Zhao (2012). The SmDQDA classifier incorporates a Lindley-type shrunken mean estimator into the DQDA classifier from Dudoit et al. (2002). For more about the DQDA classifier, see `qda_diag()`.
The SmDQDA classifier is a modification to QDA, where the off-diagonal elements of each class sample covariance matrix are set to zero.
qda_shrink_mean(x, ...)

## Default S3 method:
qda_shrink_mean(x, y, prior = NULL, ...)

## S3 method for class 'formula'
qda_shrink_mean(formula, data, prior = NULL, ...)

## S3 method for class 'qda_shrink_mean'
predict(object, newdata, type = c("class", "prob", "score"), ...)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... |
additional arguments (not currently used). |
y |
Vector of class labels for each training observation. Only complete data are retained. |
prior |
Vector with prior probabilities for each class. If NULL (default), then the sample proportions are used. See details. |
formula |
A formula of the form |
data |
data frame from which variables specified in |
object |
Fitted model object |
newdata |
Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
type |
Prediction type: either '"class"', '"prob"', or '"score"'. |
The DQDA classifier is a modification to the well-known QDA classifier, where the off-diagonal elements of each class covariance matrix are assumed to be zero; that is, the features are assumed to be uncorrelated. Under multivariate normality, the assumption of uncorrelated features is equivalent to the assumption of independent features. The feature-independence assumption is a notable attribute of the Naive Bayes classifier family. The benefit of these classifiers is that they are fast and have far fewer parameters to estimate, especially when the number of features is quite large.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
`qda_shrink_mean` object that contains the trained SmDQDA classifier
Tong, T., Chen, L., and Zhao, H. (2012), "Improved Mean Estimation and Its Application to Diagonal Discriminant Analysis," Bioinformatics, 28, 4, 531-537. http://bioinformatics.oxfordjournals.org/content/28/4/531.long
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, 97, 457, 77-87.
library(modeldata)
data(penguins)
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]
smdqda_out <- qda_shrink_mean(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(smdqda_out, penguins[pred_rows, -1], type = "class")
smdqda_out2 <- qda_shrink_mean(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(smdqda_out2, penguins[pred_rows, -1], type = "class")
all.equal(predicted, predicted2)
We compute the quadratic form of a vector and a matrix in an efficient manner. Let `x` be a real vector of length `p`, and let `A` be a p x p real matrix. Then, we compute the quadratic form $x^\top A x$.
quadform(A, x)
A |
matrix of dimension p x p |
x |
vector of length p |
A naive way to compute the quadratic form is to explicitly write `t(x) %*% A %*% x`, but for large `p`, this operation is inefficient. We provide a more efficient method below.
Note that we have adapted the code from: http://tolstoy.newcastle.edu.au/R/help/05/11/14989.html
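As an illustration of the idea, one common way to evaluate the quadratic form without transposing `x` explicitly uses `crossprod()`. This is only a sketch under that assumption: `quadform_sketch()` is a hypothetical name, and the package's actual code may differ.

```r
# Hypothetical helper illustrating the idea; not the package's source.
quadform_sketch <- function(A, x) {
  # crossprod(x, y) computes t(x) %*% y without forming the transpose.
  drop(crossprod(x, A %*% x))
}

set.seed(1)
p <- 5
A <- crossprod(matrix(rnorm(p * p), p, p))  # symmetric p x p matrix
x <- rnorm(p)

# Compare against the naive expression.
naive <- drop(t(x) %*% A %*% x)
all.equal(quadform_sketch(A, x), naive)
```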
scalar value
We compute the quadratic form of a vector and the inverse of a matrix in an efficient manner. Let `x` be a real vector of length `p`, and let `A` be a p x p nonsingular matrix. Then, we compute the quadratic form $x^\top A^{-1} x$.
quadform_inv(A, x)
A |
matrix that is p x p and nonsingular |
x |
vector of length p |
A naive way to compute the quadratic form is to explicitly write `t(x) %*% solve(A) %*% x`, but for large `p`, this operation is inefficient. We provide a more efficient method below.
Note that we have adapted the code from: http://tolstoy.newcastle.edu.au/R/help/05/11/14989.html
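A sketch of how the explicit inverse can be avoided: solving the linear system `A z = x` and then taking the inner product with `x` gives the same value as the naive expression. The name `quadform_inv_sketch()` is hypothetical, and this is not necessarily how the package computes it.

```r
# Hypothetical helper illustrating the idea; not the package's source.
quadform_inv_sketch <- function(A, x) {
  # solve(A, x) solves A z = x directly, so solve(A) is never formed.
  drop(crossprod(x, solve(A, x)))
}

set.seed(1)
p <- 5
A <- crossprod(matrix(rnorm(p * p), p, p)) + diag(p)  # nonsingular by construction
x <- rnorm(p)

# Compare against the naive expression.
naive <- drop(t(x) %*% solve(A) %*% x)
all.equal(quadform_inv_sketch(A, x), naive)
```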
scalar value
For the classes given in the vector `y`, this function calculates the class covariance-matrix estimators employed in the HDRDA classifier, implemented in `rda_high_dim()`.
rda_cov(x, y, lambda = 1)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y |
vector of class labels for each training observation |
lambda |
the RDA pooling parameter. Must be between 0 and 1, inclusive. |
list containing the RDA covariance-matrix estimators for each class given in `y`
Ramey, J. A., Stein, C. K., and Young, D. M. (2017), "High-Dimensional Regularized Discriminant Analysis." https://arxiv.org/abs/1602.01182.
Given a set of training data, this function builds the HDRDA classifier from Ramey, Stein, and Young (2017). Specially designed for small-sample, high-dimensional data, the HDRDA classifier incorporates dimension reduction and covariance-matrix shrinkage to enable a computationally efficient classifier.
For a given `rda_high_dim` object, we predict the class of each observation (row) of the matrix given in `newdata`.
rda_high_dim(x, ...)

## Default S3 method:
rda_high_dim(
  x,
  y,
  lambda = 1,
  gamma = 0,
  shrinkage_type = c("ridge", "convex"),
  prior = NULL,
  tol = 1e-06,
  ...
)

## S3 method for class 'formula'
rda_high_dim(formula, data, ...)

## S3 method for class 'rda_high_dim'
predict(
  object,
  newdata,
  projected = FALSE,
  type = c("class", "prob", "score"),
  ...
)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
... |
additional arguments (not currently used). |
y |
vector of class labels for each training observation |
lambda |
the HDRDA pooling parameter. Must be between 0 and 1, inclusive. |
gamma |
a numeric value used for the shrinkage parameter. |
shrinkage_type |
the type of covariance-matrix shrinkage to apply. By
default, a ridge-like shrinkage is applied. If |
prior |
vector with prior probabilities for each class. If |
tol |
a threshold for determining nonzero eigenvalues. |
formula |
A formula of the form |
data |
data frame from which variables specified in |
object |
Object of type |
newdata |
Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
projected |
logical indicating whether |
type |
Prediction type: either '"class"', '"prob"', or '"score"'. |
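The HDRDA classifier utilizes a covariance-matrix estimator that is a convex combination of the covariance-matrix estimators used in the Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) classifiers. For each of the `K` classes given in `y`, $k = 1, \ldots, K$, we first define this convex combination as
$$\hat{\Sigma}_k(\lambda) = (1 - \lambda) \hat{\Sigma}_k + \lambda \hat{\Sigma},$$
where $\lambda \in [0, 1]$ is the pooling parameter, $\hat{\Sigma}_k$ is the sample covariance matrix of class $k$, and $\hat{\Sigma}$ is the pooled sample covariance matrix. We then calculate the covariance-matrix estimator
$$\tilde{\Sigma}_k = \alpha_k \hat{\Sigma}_k(\lambda) + \gamma I_p,$$
where $\alpha_k$ and $\gamma$ are the shrinkage constants and $I_p$ is the $p \times p$ identity matrix. The matrix $\tilde{\Sigma}_k$ is substituted into the HDRDA classifier. See Ramey et al. (2017) for more details.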
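The two-step construction described above, pooling each class covariance with the common covariance and then shrinking toward the identity, can be sketched in base R as follows. Several things here are assumptions rather than facts about the package: the helper name `hdrda_cov_sketch()` is hypothetical, maximum likelihood divisors are used for the covariance matrices, and only the ridge-like form is shown, with the multiplier on the pooled term fixed at 1.

```r
# Sketch of the pooled and shrunken covariance estimators (assumptions inline).
hdrda_cov_sketch <- function(x, y, lambda = 1, gamma = 0) {
  x <- as.matrix(x)
  y <- factor(y)
  n <- nrow(x)
  # Assumption: maximum likelihood estimators (divisor n_k rather than n_k - 1).
  cov_k <- lapply(levels(y), function(k) {
    xk <- x[y == k, , drop = FALSE]
    cov(xk) * (nrow(xk) - 1) / nrow(xk)
  })
  n_k <- as.vector(table(y))
  # Pooled estimator: class covariances weighted by class size.
  cov_pool <- Reduce(`+`, Map(function(S, m) m * S, cov_k, n_k)) / n
  # Convex combination of each class estimator with the pooled estimator,
  # followed by ridge-like shrinkage toward the identity (multiplier of 1 assumed).
  out <- lapply(cov_k, function(S) {
    (1 - lambda) * S + lambda * cov_pool + gamma * diag(ncol(x))
  })
  names(out) <- levels(y)
  out
}

est <- hdrda_cov_sketch(iris[, 1:4], iris$Species, lambda = 0.5, gamma = 0.1)
est$setosa
```

In this sketch, `lambda = 1` makes every class share the pooled matrix (the LDA case), while `lambda = 0` keeps each class's own matrix (the QDA case).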
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation. The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is `NULL` (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one. The order of the prior probabilities is assumed to match the levels of `factor(y)`.
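As a small illustration of the ordering requirement on the prior probabilities, the sketch below computes sample-proportion priors and checks a user-supplied vector against the stated constraints. The label vector and the variable names are made up for the example.

```r
y <- c("b", "a", "b", "c", "b", "a")  # hypothetical class labels

# Sample proportions per class, in the order of levels(factor(y)).
prior_default <- as.vector(table(factor(y)) / length(y))
names(prior_default) <- levels(factor(y))
prior_default

# A user-supplied prior must follow the same order: "a", "b", "c".
prior_user <- c(a = 0.2, b = 0.5, c = 0.3)
stopifnot(
  length(prior_user) == nlevels(factor(y)),
  all(prior_user >= 0),
  isTRUE(all.equal(sum(prior_user), 1))
)
```

Note that `factor()` sorts the levels alphabetically by default, so the order in which labels first appear in `y` does not matter.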
`rda_high_dim` object that contains the trained HDRDA classifier
list with predicted class and discriminant scores for each of the K classes
Ramey, J. A., Stein, C. K., and Young, D. M. (2017), "High-Dimensional Regularized Discriminant Analysis." https://arxiv.org/abs/1602.01182.
Friedman, J. H. (1989), "Regularized Discriminant Analysis," Journal of the American Statistical Association, 84, 405, 165-175. http://www.jstor.org/pss/2289860 (Requires full-text access).
For a given data set, we apply cross-validation (cv) to select the optimal HDRDA tuning parameters.
rda_high_dim_cv(
  x,
  y,
  num_folds = 10,
  num_lambda = 21,
  num_gamma = 8,
  shrinkage_type = c("ridge", "convex"),
  verbose = FALSE,
  ...
)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y |
vector of class labels for each training observation |
num_folds |
the number of cross-validation folds. |
num_lambda |
The number of values of |
num_gamma |
The number of values of |
shrinkage_type |
the type of covariance-matrix shrinkage to apply. By
default, a ridge-like shrinkage is applied. If |
verbose |
If set to |
... |
Options passed to |
The number of cross-validation folds is given in `num_folds`.
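For readers unfamiliar with how folds are typically formed, here is a generic base R sketch of assigning observations to `num_folds` folds. It is an assumption-laden illustration only: the package may partition the data differently (for example, with stratification by class), and `n` is a made-up sample size.

```r
set.seed(42)
n <- 25          # hypothetical number of training observations
num_folds <- 10

# Shuffle fold labels so each observation lands in exactly one fold.
fold_id <- sample(rep(seq_len(num_folds), length.out = n))
table(fold_id)

# Training and held-out row indices for the first fold.
test_rows  <- which(fold_id == 1)
train_rows <- setdiff(seq_len(n), test_rows)
```

Each candidate pair of tuning parameters would then be fit on `train_rows` and evaluated on `test_rows`, repeating over all folds.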
list containing the HDRDA model that minimizes the cross-validation error as well as a `data.frame` that summarizes the cross-validation results.
This function calculates the weight for each observation in the data matrix `x` in order to calculate the covariance matrices employed in the HDRDA classifier, implemented in `rda_high_dim()`.
rda_weights(x, y, lambda = 1)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y |
vector of class labels for each training observation |
lambda |
the RDA pooling parameter. Must be between 0 and 1, inclusive. |
list containing the observation weights for each class given in `y`
Ramey, J. A., Stein, C. K., and Young, D. M. (2017), "High-Dimensional Regularized Discriminant Analysis." https://arxiv.org/abs/1602.01182.
Computes the maximum likelihood estimators (MLEs) for each class under the assumption of multivariate normality for each class. Also, computes ancillary information necessary for classifier summary, such as sample size, the number of features, etc.
regdiscrim_estimates(x, y, cov = TRUE, prior = NULL)
x |
Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
y |
vector of class labels for each training observation |
cov |
logical. Should the sample covariance matrices be computed? (Default: yes) |
prior |
vector with prior probabilities for each class. If NULL (default), then the sample proportions are used. See details. |
This function computes the common estimates and ancillary information used in all of the regularized discriminant classifiers in the `sparsediscrim` package.
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.
The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one.
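The per-class quantities described here (sample size, mean vector, covariance MLE, and prior) can be sketched in base R as follows. The helper `class_mle_sketch()` and the element names in its result are hypothetical and need not match the structure that the package actually returns.

```r
# Hypothetical helper; the returned element names are illustrative only.
class_mle_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  lapply(split(seq_len(nrow(x)), y), function(rows) {
    xk <- x[rows, , drop = FALSE]
    n_k <- nrow(xk)
    list(
      n = n_k,
      xbar = colMeans(xk),
      # The MLE of the covariance matrix divides by n_k, not n_k - 1.
      cov = cov(xk) * (n_k - 1) / n_k,
      # Sample proportion used as the prior when none is supplied.
      prior = n_k / nrow(x)
    )
  })
}

est <- class_mle_sketch(iris[, 1:4], iris$Species)
est$versicolor$xbar
```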
named list with estimators for each class and necessary ancillary information
This function finds the value of $\alpha \in [0, 1]$ that empirically minimizes the average risk under a Stein loss function, which is given on page 1023 of Pang et al. (2009).
risk_stein(N, K, var_feature, num_alphas = 101, t = -1)
N |
the sample size. |
K |
the number of classes. |
var_feature |
a vector of the sample variances for each dimension. |
num_alphas |
The number of values used to find the optimal amount of shrinkage. |
t |
a constant specified by the user that indicates the exponent to use with the variance estimator. By default, t = -1 as in Pang et al. See the paper for more details. |
list with
`alpha`: the alpha that minimizes the average risk under a Stein loss function. If the minimum is not unique, we randomly select an `alpha` from the minimizers.
`risk`: the minimum average risk attained.
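The grid search with random tie-breaking described above can be sketched as follows. The risk curve here is a stand-in chosen purely for illustration, not the Stein risk from Pang et al. (2009); the grid on [0, 1] and the helper name `pick_min_sketch()` are likewise assumptions.

```r
# Hypothetical helper: choose a minimizer, breaking ties at random.
pick_min_sketch <- function(alphas, risk) {
  minimizers <- which(risk == min(risk))
  # Index with sample.int() so a single minimizer is never misread as a range,
  # which is what sample(minimizers, 1) would do for a length-one vector.
  chosen <- minimizers[sample.int(length(minimizers), 1)]
  list(alpha = alphas[chosen], risk = risk[chosen])
}

num_alphas <- 101
alphas <- seq(0, 1, length.out = num_alphas)  # assumed grid on [0, 1]
risk <- (alphas - 0.3)^2                      # stand-in risk curve, illustration only

res <- pick_min_sketch(alphas, risk)
res
```

Because ties are resolved randomly, calling `set.seed()` beforehand makes the selection reproducible.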
Pang, H., Tong, T., & Zhao, H. (2009). "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, 65, 4, 1021-1029. http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2009.01200.x/abstract
This is often faster than `solve()` for larger matrices. See, for example: http://blog.phytools.org/2012/12/faster-inversion-of-square-symmetric.html and http://stats.stackexchange.com/questions/14951/efficient-calculation-of-matrix-inverse-in-r.
solve_chol(x)
x |
symmetric, positive-definite matrix |
the inverse of x
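A minimal sketch of Cholesky-based inversion in base R, for a symmetric, positive-definite input. The name `solve_chol_sketch()` is hypothetical, and this illustrates the general technique rather than quoting the package's source.

```r
# Hypothetical helper illustrating Cholesky-based inversion.
solve_chol_sketch <- function(x) {
  # chol() returns the upper-triangular factor R with x = t(R) %*% R;
  # chol2inv() then computes the inverse of x from R.
  chol2inv(chol(x))
}

set.seed(1)
p <- 4
x <- crossprod(matrix(rnorm(p * p), p, p)) + diag(p)  # symmetric positive definite

all.equal(solve_chol_sketch(x), solve(x))
```

`chol()` signals an error when its input is not positive definite, so this approach also acts as a check on the input.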
An implementation of the Lindley-type shrunken mean estimator utilized in shrinkage-mean-based diagonal linear discriminant analysis (SmDLDA).
tong_mean_shrinkage(x, r_opt = NULL)
x |
a matrix with |
r_opt |
the shrinkage coefficient. If |
vector of length `p` with the shrunken mean estimator
Tong, T., Chen, L., and Zhao, H. (2012), "Improved Mean Estimation and Its Application to Diagonal Discriminant Analysis," Bioinformatics, 28, 4, 531-537. http://bioinformatics.oxfordjournals.org/content/28/4/531.long
Example bivariate classification data from caret
These data were generated by invoking the `twoClassSim()` function in the `caret` package.
two_class_sim_data |
a tibble |
data(two_class_sim_data)
This function updates some of the quantities in the HDRDA classifier based on updated values of `lambda` and `gamma`. The update can greatly expedite cross-validation to examine a large grid of values for `lambda` and `gamma`.
update_rda_high_dim(obj, lambda = 1, gamma = 0)
obj |
a |
lambda |
a numeric value between 0 and 1, inclusive |
gamma |
a numeric value (nonnegative) |
a `rda_high_dim` object with updated estimates
This function computes the shrinkage-based estimator of variance of each feature (variable) from Pang et al. (2009) for the SDLDA classifier.
var_shrinkage(N, K, var_feature, num_alphas = 101, t = -1)
N |
the sample size. |
K |
the number of classes. |
var_feature |
a vector of the sample variances for each feature. |
num_alphas |
The number of values used to find the optimal amount of shrinkage. |
t |
a constant specified by the user that indicates the exponent to use with the variance estimator. By default, t = -1 as in Pang et al. See the paper for more details. |
a vector of the shrunken variances for each feature.
Pang, H., Tong, T., & Zhao, H. (2009). "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, 65, 4, 1021-1029. http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2009.01200.x/abstract