Title: | Addressing Detection Limits by Cumulative Probability Models (CPMs) |
---|---|
Description: | Build CPMs (cumulative probability models, also known as cumulative link models) to account for detection limits (both single and multiple detection limits) in response variables. Conditional quantiles and conditional CDFs can be calculated based on fitted models. The package implements methods described in Tian, Y., Li, C., Tu, S., James, N. T., Harrell, F. E., & Shepherd, B. E. (2022). "Addressing Detection Limits with Semiparametric Cumulative Probability Models". <arXiv:2207.02815>. |
Authors: | Yuqi Tian [aut, cre], Chun Li [aut], Shengxin Tu [aut], Nathan James [aut], Frank Harrell [aut], Bryan Shepherd [aut] |
Maintainer: | Yuqi Tian <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2025-01-24 04:27:37 UTC |
Source: | https://github.com/yuqitian35/multipledl |
The package allows fitting regression models on continuous/ordinal response data subject to detection limits (DLs) based on cumulative probability models (CPMs). Both single and multiple DLs can be handled. Conditional quantiles and CDFs (cumulative distribution functions) can obtained from fitted models.
The 'multipleDL' package.
Stan Development Team (2020). RSroxygen2::roxygenize()tan: the R interface to Stan. R package version 2.19.3. https://mc-stan.org Harrell, F. (2020). rms: Regression modeling strategies. R package version 6.1.0. https://CRAN.R-project.org/package=rms Tian et al. "Addressing detection limits by semiparametric cumulative probability models." (2022) (to be submitted)
This functions calculates the conditional CDFs based on the fitted model and new data.
cdf_dl(mod, new.data, at.y = 0, se = TRUE)
cdf_dl(mod, new.data, at.y = 0, se = TRUE)
mod |
the model |
new.data |
the new data |
at.y |
a numeric vector of cut-off points P(y <= at.y | new.data) |
se |
if confidence intervals needed (default = TRUE) |
A list containing the following components:
est |
a vector of estimated condtional CDFs |
se |
a vector of estimated standard errors |
lb |
a vector of estimated lower bounds of 95% confidence intervals |
ub |
a vector of estimated upper bounds of 95% confidence intervals |
#' @examples ## Multiple DLs ## generate a small example data: 3 sites with different lower and upper DLs ## lower DLs: site 1: - 0.2; site 2: 0.3; site 3: no lower DL ## upper DLs: site 1: no upper DL; site 2: 4; site 3: 3.5 ## each site includes 100 subjects n <- 100 x <- rnorm(n * 3) e <- rnorm(n * 3) y <- exp(x + e) no_dl <- 1e6 data <- data.frame(y = y, x = x, subset = rep(c(1, 2, 3), each=n)) data$dl_l <- ifelse(data$subset == 1, 0.2, ifelse(data$subset == 2, 0.3, -no_dl)) data$dl_u <- ifelse(data$subset == 1, no_dl, ifelse(data$subset == 2, 4, 3.5)) data$delta_l <- ifelse(data$y >= data$dl_l, 1, 0) data$delta_u <- ifelse(data$y <= data$dl_u, 1, 0) data$z <- ifelse(data$delta_l == 0, data$dl_l, ifelse(data$delta_u == 0, data$dl_u, data$y)) # model mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, delta_upper = data$delta_u, link='probit') # new data new.data <- data.frame(x = c(0, 1)) conditional_median <- quantile_dl(mod, new.data, probs = 0.5) conditional_cdf <- cdf_dl(mod, new.data, at.y = 1.5) # P(y <= 1.5 | new.data)
#' @examples ## Multiple DLs ## generate a small example data: 3 sites with different lower and upper DLs ## lower DLs: site 1: - 0.2; site 2: 0.3; site 3: no lower DL ## upper DLs: site 1: no upper DL; site 2: 4; site 3: 3.5 ## each site includes 100 subjects n <- 100 x <- rnorm(n * 3) e <- rnorm(n * 3) y <- exp(x + e) no_dl <- 1e6 data <- data.frame(y = y, x = x, subset = rep(c(1, 2, 3), each=n)) data$dl_l <- ifelse(data$subset == 1, 0.2, ifelse(data$subset == 2, 0.3, -no_dl)) data$dl_u <- ifelse(data$subset == 1, no_dl, ifelse(data$subset == 2, 4, 3.5)) data$delta_l <- ifelse(data$y >= data$dl_l, 1, 0) data$delta_u <- ifelse(data$y <= data$dl_u, 1, 0) data$z <- ifelse(data$delta_l == 0, data$dl_l, ifelse(data$delta_u == 0, data$dl_u, data$y)) # model mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, delta_upper = data$delta_u, link='probit') # new data new.data <- data.frame(x = c(0, 1)) conditional_median <- quantile_dl(mod, new.data, probs = 0.5) conditional_cdf <- cdf_dl(mod, new.data, at.y = 1.5) # P(y <= 1.5 | new.data)
This function includes necessary functions related to each link function
func_link(link)
func_link(link)
link |
the link function |
A list of functions subject to a link function
This function faciliates the stan code (used as an internal function)
func_link_num(link)
func_link_num(link)
link |
the link function |
An integer representing corresponding link function
This functions calculates the covariance matrix based on the point estimates
func_V(coef, n, x, y, delta, k, p, fam)
func_V(coef, n, x, y, delta, k, p, fam)
coef |
coefficients (alpha, beta) |
n |
number of subjects |
x |
original covariate matrix |
y |
ranks of code values |
delta |
censoring indicators |
k |
the number of unique code values |
p |
the number of covariates |
fam |
a list of functions subject to the link function |
A covariance matrix of coefficients
This function build the CPM for multiple detection limits (DLs).
multipleDL(formula, data, delta_lower = NULL, delta_upper = NULL, link)
multipleDL(formula, data, delta_lower = NULL, delta_upper = NULL, link)
formula |
an R formula object |
data |
a data frame including response data and covariates |
delta_lower |
(optional) indicators of lower DLs censoring (1: observed; 0:censored). If not specified, treat as observed. |
delta_upper |
(optional) indicators of upper DLs censoring(1: observed; 0:censored). If not specified, treat as observed. |
link |
the link function (probit, logit, loglog, cloglog) |
When there are multiple DLs, we appropriately modify the CPM likelihood.
If a value is below a lower DL, set the censored value as the lower DL and set the
lower DL indicator delta_lower
to be 0. Similarly, if a value is above an upper DL,
set the censored value as the upper DL and set the upper DL indicator delta_upper
to be 0.
This function also works when there is only a single lower and/or upper DL.
Conditional quantiles and CDFs and corresponding 95% confidence intervals can be calculated from the model fit.
A list containing the following components:
coef |
a numeric vector of estimated coeffiencts |
var |
covariance matrix of estimated coeffiencts |
yunique |
a numeric vector of unique response values |
kint |
number of alphas (intercept terms) |
p |
number of betas (regression coeffiencts) |
fam |
a list of functions associated with the specified link function |
x |
the design matrix |
log_likelihood |
the log-likelihood |
Tian, Y., Li, C., Tu, S., James, N. T., Harrell, F. E., & Shepherd, B. E. (2022). Addressing Detection Limits with Semiparametric Cumulative Probability Models. arXiv preprint arXiv:2207.02815.
Stan Development Team (2020). RSroxygen2::roxygenize()tan: the R interface to Stan. R package version 2.19.3. https://mc-stan.org
Harrell, F. (2020). rms: Regression modeling strategies. R package version 6.1.0. https://CRAN.R-project.org/package=rms
## Multiple DLs ## generate a small example data: 3 sites with different lower and upper DLs ## lower DLs: site 1: - 0.2; site 2: 0.3; site 3: no lower DL ## upper DLs: site 1: no upper DL; site 2: 4; site 3: 3.5 ## each site includes 100 subjects n <- 100 x <- rnorm(n * 3) e <- rnorm(n * 3) y <- exp(x + e) no_dl <- 1e6 data <- data.frame(y = y, x = x, subset = rep(c(1, 2, 3), each=n)) data$dl_l <- ifelse(data$subset == 1, 0.2, ifelse(data$subset == 2, 0.3, -no_dl)) data$dl_u <- ifelse(data$subset == 1, no_dl, ifelse(data$subset == 2, 4, 3.5)) data$delta_l <- ifelse(data$y >= data$dl_l, 1, 0) data$delta_u <- ifelse(data$y <= data$dl_u, 1, 0) data$z <- ifelse(data$delta_l == 0, data$dl_l, ifelse(data$delta_u == 0, data$dl_u, data$y)) # model mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, delta_upper = data$delta_u, link='probit') # new data new.data <- data.frame(x = c(0, 1)) conditional_median <- quantile_dl(mod, new.data, probs = 0.5) conditional_cdf <- cdf_dl(mod, new.data, at.y = 1.5) # P(y <= 1.5 | new.data) ## Single DL: lower DL at 0.5 n <- 100 x <- rnorm(n) e <- rnorm(n) y <- exp(x + e) lower_dl <- 0.5 data <- data.frame(y = y, x = x) data$delta_lower <- ifelse(data$y >= lower_dl, 1, 0) data$z <- ifelse(data$delta_lower == 0, lower_dl, data$y) mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, link='probit')
## Multiple DLs ## generate a small example data: 3 sites with different lower and upper DLs ## lower DLs: site 1: - 0.2; site 2: 0.3; site 3: no lower DL ## upper DLs: site 1: no upper DL; site 2: 4; site 3: 3.5 ## each site includes 100 subjects n <- 100 x <- rnorm(n * 3) e <- rnorm(n * 3) y <- exp(x + e) no_dl <- 1e6 data <- data.frame(y = y, x = x, subset = rep(c(1, 2, 3), each=n)) data$dl_l <- ifelse(data$subset == 1, 0.2, ifelse(data$subset == 2, 0.3, -no_dl)) data$dl_u <- ifelse(data$subset == 1, no_dl, ifelse(data$subset == 2, 4, 3.5)) data$delta_l <- ifelse(data$y >= data$dl_l, 1, 0) data$delta_u <- ifelse(data$y <= data$dl_u, 1, 0) data$z <- ifelse(data$delta_l == 0, data$dl_l, ifelse(data$delta_u == 0, data$dl_u, data$y)) # model mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, delta_upper = data$delta_u, link='probit') # new data new.data <- data.frame(x = c(0, 1)) conditional_median <- quantile_dl(mod, new.data, probs = 0.5) conditional_cdf <- cdf_dl(mod, new.data, at.y = 1.5) # P(y <= 1.5 | new.data) ## Single DL: lower DL at 0.5 n <- 100 x <- rnorm(n) e <- rnorm(n) y <- exp(x + e) lower_dl <- 0.5 data <- data.frame(y = y, x = x) data$delta_lower <- ifelse(data$y >= lower_dl, 1, 0) data$z <- ifelse(data$delta_lower == 0, lower_dl, data$y) mod <- multipleDL(formula = z ~ x, data = data, delta_lower = data$delta_l, link='probit')
This functions calculates the conditional weighted quantiles based on the fitted model and new data.
quantile_dl(mod, new.data, probs = 0.5, se = TRUE)
quantile_dl(mod, new.data, probs = 0.5, se = TRUE)
mod |
the model |
new.data |
the new data |
probs |
a numeric vector of pth quantiles |
se |
if confidence intervals needed (default = TRUE) |
A list containing the following components:
est |
a vector of estimated condtional quantiles |
lb |
a vector of estimated lower bounds of 95% confidence intervals |
ub |
a vector of estimated upper bounds of 95% confidence intervals |
Runs a matrix through the QR decomposition and returns the transformed matrix and the forward and inverse transforming matrices
R, Rinv
. If columns of the input matrix X
are centered the QR transformed matrix will be orthogonal.
This is helpful in understanding the transformation and in scaling prior distributions on the transformed scale.
not
can be specified to keep selected columns as-is.
cornerQr
leaves the last column of X
alone (possibly after centering).
When not
is specified, the square transforming matrices have appropriate identity submatrices inserted
so that recreation of original X
is automatic.
selectedQr(X, not = NULL, corner = FALSE, center = TRUE)
selectedQr(X, not = NULL, corner = FALSE, center = TRUE)
X |
a numeric matrix |
not |
an integer vector specifying which columns of |
corner |
set to |
center |
set to |
list with elements X, R, Rinv, xbar
where xbar
is the vector of means (vector of zeros if center=FALSE
)
@export