PDA

View Full Version : False Discovery Rate methods in JMSL



diraviam
05-10-2006, 09:17 PM
Hi,

Are there any False Discovery Rate methods implemented in the JMSL library.
I am looking for qvalue implementation.

Thanks in advance
Lazar Arulnayagam

ed
05-11-2006, 09:08 AM
There are many statistical functions in IMSL that will return p-values used as input into FDR; however, there are not currently any False Discovery Rate functions in IMSL. I will add this functionality to our internal enhancement request list.

diraviam
05-11-2006, 01:52 PM
What is your estimated time for it to be available.
We have a JMSL in our enviorment and have to make decision on whether to implement our own or use yours. ( The second is our preferred option).

Thanks

Lazar Arulnayagam

ed
05-11-2006, 02:13 PM
I've actually been thinking about this some today. I Googled around and found some Matlab code (http://www.sph.umich.edu/~nichols/FDR/) and converted it to Java.

Please have a look at this code and see if it returns the expected results. My disclaimer is that I'm an engineer and not a statistician, so I can't really verify the algorithm or the results. It could also use some better error checking.

FDR class:

/**
* FDR class converted from FDR.m v1.3 by Tim Nichols 02/01/18
*/
public class FDR {
double t, n;

/**
* Create a new FDR object.
*
* @param p a double array, the vector of p-values
* @param q a double, the False Discovery Rate level
*/
public FDR(double[] p, double q) {
com.imsl.stat.Sort.ascending(p);
int v = p.length;
double cvn = 0;
for (int i=0; i<v; i++) {
cvn += 1.0/(i+1);
}
double cvid = 1;

double[] test1 = new double[v];
double[] test2 = new double[v];
for (int i=0; i<v; i++) {
test1[i] = (i+1)/(double)v*q/cvid;
test2[i] = (i+1)/(double)v*q/cvn;
}

int ixt = findlemax(p, test1);
if (ixt != -1) t = p[ixt];
int ixn = findlemax(p, test2);
if (ixn != -1) n = p[ixn];
}

/**
* @return p-value threshold based on independence or positive dependence
*/
public double getThreshold() {
return t;
}

/**
* @return Nonparametric p-value threshold
*/
public double getNThreshold() {
return n;
}

int findlemax(double[] a, double[] b) {
final int len = a.length;
int ix = -1;
for (int i=0; i<len; i++) {
if (a[i] <= b[i]) ix=i;
}
return ix;
}
}

A test class:

public class TestFDR {

public TestFDR() {
double[] p = {0.01,0.1,0.015,0.02,0.025,0.08,0.075};
double q = 0.2;

FDR f = new FDR(p, q);
double pid = f.getThreshold();
double pn = f.getNThreshold();

System.out.println(pid);
System.out.println(pn);
}

public static void main(String[] arg) {
TestFDR t = new TestFDR();
}
}

diraviam
05-11-2006, 03:54 PM
Thanks for the quick reply.

I am a software engineer and not a statistician as well. I will check with the statistician and scientific lead to check if method will work for them.

We are porting some of the R code to a more robust JMSL libraries. The current FDR method that is used is the qvalue method in Bioconductor package.

http://www.ugrad.stat.ubc.ca/R/library/qvalue/html/00Index.html

The R code for this is:


=========Start of R Code==============================

function (p = NULL, lambda = seq(0, 0.9, 0.05), pi0.method = "smoother",
fdr.level = NULL, robust = FALSE, gui = FALSE, smooth.df = 3,
smooth.log.pi0 = FALSE)
{
if (is.null(p)) {
qvalue.gui()
return("Launching point-and-click...")
}
if (gui & !interactive())
gui = FALSE
if (min(p) < 0 || max(p) > 1) {
if (gui)
eval(expression(postMsg(paste("ERROR: p-values not in valid range.",
"\n"))), parent.frame())
else print("ERROR: p-values not in valid range.")
return(0)
}
if (length(lambda) > 1 && length(lambda) < 4) {
if (gui)
eval(expression(postMsg(paste("ERROR: If length of lambda greater than 1, you need at least 4 values.",
"\n"))), parent.frame())
else print("ERROR: If length of lambda greater than 1, you need at least 4 values.")
return(0)
}
if (length(lambda) > 1 && (min(lambda) < 0 || max(lambda) >=
1)) {
if (gui)
eval(expression(postMsg(paste("ERROR: Lambda must be within [0, 1).",
"\n"))), parent.frame())
else print("ERROR: Lambda must be within [0, 1).")
return(0)
}
m <- length(p)
if (length(lambda) == 1) {
if (lambda < 0 || lambda >= 1) {
if (gui)
eval(expression(postMsg(paste("ERROR: Lambda must be within [0, 1).",
"\n"))), parent.frame())
else print("ERROR: Lambda must be within [0, 1).")
return(0)
}
pi0 <- mean(p >= lambda)/(1 - lambda)
pi0 <- min(pi0, 1)
}
else {
pi0 <- rep(0, length(lambda))
for (i in 1:length(lambda)) {
pi0[i] <- mean(p >= lambda[i])/(1 - lambda[i])
}
if (pi0.method == "smoother") {
if (smooth.log.pi0)
pi0 <- log(pi0)
spi0 <- smooth.spline(lambda, pi0, df = smooth.df)
pi0 <- predict(spi0, x = max(lambda))$y
if (smooth.log.pi0)
pi0 <- exp(pi0)
pi0 <- min(pi0, 1)
}
else if (pi0.method == "bootstrap") {
minpi0 <- min(pi0)
mse <- rep(0, length(lambda))
pi0.boot <- rep(0, length(lambda))
for (i in 1:100) {
p.boot <- sample(p, size = m, replace = TRUE)
for (i in 1:length(lambda)) {
pi0.boot[i] <- mean(p.boot > lambda[i])/(1 -
lambda[i])
}
mse <- mse + (pi0.boot - minpi0)^2
}
pi0 <- min(pi0[mse == min(mse)])
pi0 <- min(pi0, 1)
}
else {
print("ERROR: 'pi0.method' must be one of 'smoother' or 'bootstrap'.")
return(0)
}
}
if (pi0 <= 0) {
if (gui)
eval(expression(postMsg(paste("ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method.",
"\n"))), parent.frame())
else print("ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method.")
return(0)
}
if (!is.null(fdr.level) && (fdr.level <= 0 || fdr.level >
1)) {
if (gui)
eval(expression(postMsg(paste("ERROR: 'fdr.level' must be within (0, 1].",
"\n"))), parent.frame())
else print("ERROR: 'fdr.level' must be within (0, 1].")
return(0)
}
u <- order(p)
qvalue.rank <- function(x) {
idx <- sort.list(x)
fc <- factor(x)
nl <- length(levels(fc))
bin <- as.integer(fc)
tbl <- tabulate(bin)
cs <- cumsum(tbl)
tbl <- rep(cs, tbl)
tbl[idx] <- tbl
return(tbl)
}
v <- qvalue.rank(p)
qvalue <- pi0 * m * p/v
if (robust) {
qvalue <- pi0 * m * p/(v * (1 - (1 - p)^m))
}
qvalue[u[m]] <- min(qvalue[u[m]], 1)
for (i in (m - 1):1) {
qvalue[u[i]] <- min(qvalue[u[i]], qvalue[u[i + 1]], 1)
}
if (!is.null(fdr.level)) {
retval <- list(call = match.call(), pi0 = pi0, qvalues = qvalue,
pvalues = p, fdr.level = fdr.level, significant = (qvalue <=
fdr.level), lambda = lambda)
}
else {
retval <- list(call = match.call(), pi0 = pi0, qvalues = qvalue,
pvalues = p, lambda = lambda)
}
class(retval) <- "qvalue"
return(retval)
}



==================================================

diraviam
05-17-2006, 04:05 PM
After consulting with the scientific lead on the project, he suggest that we use the BH method to determine the corrected QValue as the FDR method.

The rcode for it is here: We are currently interested only in the BH method.

Thanks


function (rawp, proc = c("Bonferroni", "Holm", "Hochberg", "SidakSS",
"SidakSD", "BH", "BY"))
{
m <- length(rawp)
n <- length(proc)
index <- order(rawp)
spval <- rawp[index]
adjp <- matrix(0, m, n + 1)
dimnames(adjp) <- list(NULL, c("rawp", proc))
adjp[, 1] <- spval
if (is.element("Bonferroni", proc)) {
tmp <- m * spval
tmp[tmp > 1] <- 1
adjp[, "Bonferroni"] <- tmp
}
if (is.element("Holm", proc)) {
tmp <- spval
tmp[1] <- min(m * spval[1], 1)
for (i in 2:m) tmp[i] <- max(tmp[i - 1], min((m - i +
1) * spval[i], 1))
adjp[, "Holm"] <- tmp
}
if (is.element("Hochberg", proc)) {
tmp <- spval
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m - i + 1) * spval[i],
1, na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
adjp[, "Hochberg"] <- tmp
}
if (is.element("SidakSS", proc))
adjp[, "SidakSS"] <- 1 - (1 - spval)^m
if (is.element("SidakSD", proc)) {
tmp <- spval
tmp[1] <- 1 - (1 - spval[1])^m
for (i in 2:m) tmp[i] <- max(tmp[i - 1], 1 - (1 - spval[i])^(m -
i + 1))
adjp[, "SidakSD"] <- tmp
}
if (is.element("BH", proc)) {
tmp <- spval
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m/i) * spval[i], 1,
na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
adjp[, "BH"] <- tmp
}
if (is.element("BY", proc)) {
tmp <- spval
a <- sum(1/(1:m))
tmp[m] <- min(a * spval[m], 1)
for (i in (m - 1):1) {
tmp[i] <- min(tmp[i + 1], min((m * a/i) * spval[i],
1, na.rm = TRUE), na.rm = TRUE)
if (is.na(spval[i]))
tmp[i] <- NA
}
adjp[, "BY"] <- tmp
}
list(adjp = adjp, index = index)
}

totallyunimodular
05-18-2006, 11:49 AM
Diriviam,

I know very little about FDR, but I am just curious why you do not want to use the FDR algorithm in the R Bioconductor package you mentioned? It seems like the FDR routine "qvalue" is based upon Storey's 2002 paper, "A direct approach to false discovery rates", in which it is shown that his approach is more powerful than the traditional BH method. This paper can be found at

http://faculty.washington.edu/jstorey/papers/directfdr.pdf

Ultimately what you need is for an FDR algorithm in JMSL, but before folks go off and start trying to code something up, I was curious about the direction you are taking. For the BH R code you posted, is this code you wrote or did it come from another R package?

Cheers