PDA

View Full Version : False Discovery Rate methods in JMSL

diraviam
05-10-2006, 08:17 PM
Hi,

Are there any False Discovery Rate methods implemented in the JMSL library.
I am looking for qvalue implementation.

Lazar Arulnayagam

ed
05-11-2006, 08:08 AM
There are many statistical functions in IMSL that will return p-values used as input into FDR; however, there are not currently any False Discovery Rate functions in IMSL. I will add this functionality to our internal enhancement request list.

diraviam
05-11-2006, 12:52 PM
What is your estimated time for it to be available.
We have a JMSL in our enviorment and have to make decision on whether to implement our own or use yours. ( The second is our preferred option).

Thanks

Lazar Arulnayagam

ed
05-11-2006, 01:13 PM

Please have a look at this code and see if it returns the expected results. My disclaimer is that I'm an engineer and not a statistician, so I can't really verify the algorithm or the results. It could also use some better error checking.

FDR class:

/**
* FDR class converted from FDR.m v1.3 by Tim Nichols 02/01/18
*/
public class FDR {
double t, n;

/**
* Create a new FDR object.
*
* @param p a double array, the vector of p-values
* @param q a double, the False Discovery Rate level
*/
public FDR(double[] p, double q) {
com.imsl.stat.Sort.ascending(p);
int v = p.length;
double cvn = 0;
for (int i=0; i<v; i++) {
cvn += 1.0/(i+1);
}
double cvid = 1;

double[] test1 = new double[v];
double[] test2 = new double[v];
for (int i=0; i<v; i++) {
test1[i] = (i+1)/(double)v*q/cvid;
test2[i] = (i+1)/(double)v*q/cvn;
}

int ixt = findlemax(p, test1);
if (ixt != -1) t = p[ixt];
int ixn = findlemax(p, test2);
if (ixn != -1) n = p[ixn];
}

/**
* @return p-value threshold based on independence or positive dependence
*/
public double getThreshold() {
return t;
}

/**
* @return Nonparametric p-value threshold
*/
public double getNThreshold() {
return n;
}

int findlemax(double[] a, double[] b) {
final int len = a.length;
int ix = -1;
for (int i=0; i<len; i++) {
if (a[i] <= b[i]) ix=i;
}
return ix;
}
}

A test class:

public class TestFDR {

public TestFDR() {
double[] p = {0.01,0.1,0.015,0.02,0.025,0.08,0.075};
double q = 0.2;

FDR f = new FDR(p, q);
double pid = f.getThreshold();
double pn = f.getNThreshold();

System.out.println(pid);
System.out.println(pn);
}

public static void main(String[] arg) {
TestFDR t = new TestFDR();
}
}

diraviam
05-11-2006, 02:54 PM

I am a software engineer and not a statistician as well. I will check with the statistician and scientific lead to check if method will work for them.

We are porting some of the R code to a more robust JMSL libraries. The current FDR method that is used is the qvalue method in Bioconductor package.

The R code for this is:

=========Start of R Code==============================

function (p = NULL, lambda = seq(0, 0.9, 0.05), pi0.method = "smoother",
fdr.level = NULL, robust = FALSE, gui = FALSE, smooth.df = 3,
smooth.log.pi0 = FALSE)
{
if (is.null(p)) {
qvalue.gui()
return("Launching point-and-click...")
}
if (gui & !interactive())
gui = FALSE
if (min(p) < 0 || max(p) > 1) {
if (gui)
eval(expression(postMsg(paste("ERROR: p-values not in valid range.",
"\n"))), parent.frame())
else print("ERROR: p-values not in valid range.")
return(0)
}
if (length(lambda) > 1 && length(lambda) < 4) {
if (gui)
eval(expression(postMsg(paste("ERROR: If length of lambda greater than 1, you need at least 4 values.",
"\n"))), parent.frame())
else print("ERROR: If length of lambda greater than 1, you need at least 4 values.")
return(0)
}
if (length(lambda) > 1 && (min(lambda) < 0 || max(lambda) >=
1)) {
if (gui)
eval(expression(postMsg(paste("ERROR: Lambda must be within [0, 1).",
"\n"))), parent.frame())
else print("ERROR: Lambda must be within [0, 1).")
return(0)
}
m <- length(p)
if (length(lambda) == 1) {
if (lambda < 0 || lambda >= 1) {
if (gui)
eval(expression(postMsg(paste("ERROR: Lambda must be within [0, 1).",
"\n"))), parent.frame())
else print("ERROR: Lambda must be within [0, 1).")
return(0)
}
pi0 <- mean(p >= lambda)/(1 - lambda)
pi0 <- min(pi0, 1)
}
else {
pi0 <- rep(0, length(lambda))
for (i in 1:length(lambda)) {
pi0[i] <- mean(p >= lambda[i])/(1 - lambda[i])
}
if (pi0.method == "smoother") {
if (smooth.log.pi0)
pi0 <- log(pi0)
spi0 <- smooth.spline(lambda, pi0, df = smooth.df)
pi0 <- predict(spi0, x = max(lambda))\$y
if (smooth.log.pi0)
pi0 <- exp(pi0)
pi0 <- min(pi0, 1)
}
else if (pi0.method == "bootstrap") {
minpi0 <- min(pi0)
mse <- rep(0, length(lambda))
pi0.boot <- rep(0, length(lambda))
for (i in 1:100) {
p.boot <- sample(p, size = m, replace = TRUE)
for (i in 1:length(lambda)) {
pi0.boot[i] <- mean(p.boot > lambda[i])/(1 -
lambda[i])
}
mse <- mse + (pi0.boot - minpi0)^2
}
pi0 <- min(pi0[mse == min(mse)])
pi0 <- min(pi0, 1)
}
else {
print("ERROR: 'pi0.method' must be one of 'smoother' or 'bootstrap'.")
return(0)
}
}
if (pi0 <= 0) {
if (gui)
eval(expression(postMsg(paste("ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method.",
"\n"))), parent.frame())
else print("ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method.")
return(0)
}
if (!is.null(fdr.level) && (fdr.level <= 0 || fdr.level >
1)) {
if (gui)
eval(expression(postMsg(paste("ERROR: 'fdr.level' must be within (0, 1].",
"\n"))), parent.frame())
else print("ERROR: 'fdr.level' must be within (0, 1].")
return(0)
}
u <- order(p)
qvalue.rank <- function(x) {
idx <- sort.list(x)
fc <- factor(x)
nl <- length(levels(fc))
bin <- as.integer(fc)
tbl <- tabulate(bin)
cs <- cumsum(tbl)
tbl <- rep(cs, tbl)
tbl[idx] <- tbl
return(tbl)
}
v <- qvalue.rank(p)
qvalue <- pi0 * m * p/v
if (robust) {
qvalue <- pi0 * m * p/(v * (1 - (1 - p)^m))
}
qvalue[u[m]] <- min(qvalue[u[m]], 1)
for (i in (m - 1):1) {
qvalue[u[i]] <- min(qvalue[u[i]], qvalue[u[i + 1]], 1)
}
if (!is.null(fdr.level)) {
retval <- list(call = match.call(), pi0 = pi0, qvalues = qvalue,
pvalues = p, fdr.level = fdr.level, significant = (qvalue <=
fdr.level), lambda = lambda)
}
else {
retval <- list(call = match.call(), pi0 = pi0, qvalues = qvalue,
pvalues = p, lambda = lambda)
}
class(retval) <- "qvalue"
return(retval)
}

==================================================

diraviam
05-17-2006, 03:05 PM
After consulting with the scientific lead on the project, he suggest that we use the BH method to determine the corrected QValue as the FDR method.

The rcode for it is here: We are currently interested only in the BH method.

Thanks

function (rawp, proc = c("Bonferroni", "Holm", "Hochberg", "SidakSS",
"SidakSD", "BH", "BY"))
{
m <- length(rawp)
n <- length(proc)
index <- order(rawp)
spval <- rawp[index]
adjp <- matrix(0, m, n + 1)
if (is.element("Bonferroni", proc)) {
tmp <- m * spval
tmp[tmp > 1] <- 1
}
if (is.element("Holm", proc)) {
tmp <- spval
tmp <- min(m * spval, 1)
for (i in 2:m) tmp[i] <- max(tmp[i - 1], min((m - i +
1) * spval[i], 1))
}
if (is.element("Hochberg", proc)) {
tmp <- spval
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m - i + 1) * spval[i],
1, na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
}
if (is.element("SidakSS", proc))
adjp[, "SidakSS"] <- 1 - (1 - spval)^m
if (is.element("SidakSD", proc)) {
tmp <- spval
tmp <- 1 - (1 - spval)^m
for (i in 2:m) tmp[i] <- max(tmp[i - 1], 1 - (1 - spval[i])^(m -
i + 1))
}
if (is.element("BH", proc)) {
tmp <- spval
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m/i) * spval[i], 1,
na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
}
if (is.element("BY", proc)) {
tmp <- spval
a <- sum(1/(1:m))
tmp[m] <- min(a * spval[m], 1)
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m * a/i) * spval[i],
1, na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
}