confidence interval for standard deviation?

hcrisp
10-29-2009, 09:57 AM
The PV-WAVE: IMSL Statistics Reference states that I can use ANOVA1 to determine the confidence intervals on all pairwise differences of means (using one of six methods). How would I go about computing the confidence intervals for the pairwise differences of standard deviations?

totallyunimodular
10-30-2009, 10:15 AM
I do not believe there are any routines in PV-WAVE that can directly get these results. That being said, I don't think this functionality is available in most software either. I think you'll need to research and apply Levy's method for multiple comparisons on statistics other than the mean. For example, see this link (http://epm.sagepub.com/cgi/content/abstract/55/5/795) and this link (http://cat.inist.fr/?aModele=afficheN&cpsidt=3663210).

hcrisp
11-12-2009, 04:35 PM
Another question along the same lines:

SIMPLESTAT returns the following:
result(12): lower confidence limit for the variance (assuming normality)
result(13): upper confidence limit for the variance (assuming normality)

Are these the same as the lower/upper confidence limits for the standard deviation? I didn't know if they were since variance = (standard deviation)^2. If they are not, how can I go about getting them for standard deviation?

hcrisp
01-07-2010, 04:00 PM
In case anyone is interested, here is how you can get the confidence interval for standard deviation. I did conclude that the CI for the standard deviation is not the same as the CI for the variance.

npts = 30
x = RANDOMN(s, npts)

; get mean confidence interval
df = npts - 1
one_minus_alpha = 0.95 ; 95% Confidence Interval
alpha = 1. - one_minus_alpha
prob = 1. - alpha / 2.
x_sigma = STDEV(x, x_mean)
t_ahalf = TCDF(prob, df, /INVERSE, /DOUBLE)
error = t_ahalf * x_sigma / SQRT(npts)
mean_ci_low = x_mean - error
mean_ci_high = x_mean + error
info, mean_ci_low, mean_ci_high

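; get standard deviation confidence interval
; the larger chi-square quantile yields the lower limit, and vice versa
chi_sq_rt = CHISQCDF(prob, df, /INVERSE, /DOUBLE)   ; quantile at 1 - alpha/2
chi_sq_lt = CHISQCDF(1-prob, df, /INVERSE, /DOUBLE) ; quantile at alpha/2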
stdev_ci_low = SQRT((df * x_sigma^2) / chi_sq_rt)
stdev_ci_high = SQRT((df * x_sigma^2) / chi_sq_lt)
info, stdev_ci_low, stdev_ci_high

; compare to SIMPLESTAT
res = SIMPLESTAT(x)
info, res(10) ; mean_ci_low
info, res(11) ; mean_ci_high
info, res(12) ; var_ci_low
info, res(13) ; var_ci_high

totallyunimodular
01-08-2010, 10:27 AM
"I did conclude that the CI for the standard deviation is not the same as the CI for the variance."

It looks to me like the squares of the standard deviation limits you computed manually are the same as the variance limits returned by SIMPLESTAT...

WAVE> info, res(10) ; mean_ci_low
<Expression> FLOAT = -0.382960
WAVE> info, res(11) ; mean_ci_high
<Expression> FLOAT = 0.409303
WAVE> info, res(12) ; var_ci_low
<Expression> FLOAT = 0.713816
WAVE> info, res(13) ; var_ci_high
<Expression> FLOAT = 2.03385
WAVE> pm, stdev_ci_low^2
0.71381576
WAVE> pm, stdev_ci_high^2
2.0338474
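
The agreement in the session above is what the normal-theory formulas predict. As a sketch, assuming the sample comes from a normal distribution, and writing n = npts, s = x_sigma, \nu = df = n - 1, with quantile subscripts denoting lower-tail probabilities (this notation is introduced here and is not taken from the PV-WAVE documentation):

```latex
\begin{align*}
\frac{\nu s^2}{\chi^2_{1-\alpha/2,\,\nu}} \;&\le\; \sigma^2 \;\le\; \frac{\nu s^2}{\chi^2_{\alpha/2,\,\nu}} \\[4pt]
\sqrt{\frac{\nu s^2}{\chi^2_{1-\alpha/2,\,\nu}}} \;&\le\; \sigma \;\le\; \sqrt{\frac{\nu s^2}{\chi^2_{\alpha/2,\,\nu}}}
\end{align*}
```

The second line follows from the first because the square root is monotonic on non-negative values, so the standard deviation limits are simply the square roots of the variance limits, and squaring stdev_ci_low and stdev_ci_high recovers res(12) and res(13).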

I think an important caveat here comes from the "assuming normality" note on the SIMPLESTAT output: the scaled sample variance, (n - 1) * s^2 / sigma^2, follows a Chi-square distribution with n - 1 degrees of freedom only if the underlying sample is Normal in distribution. Referring to the original question about ANOVA though, the F-test in an ANOVA is generally robust to departures from Normality, although the test is no longer the "most powerful". But I am still not sure what the correct technique is for estimating confidence intervals for pairwise differences between standard deviations.

Thanks again for posting your code!

hcrisp
01-11-2010, 09:29 AM
Ah, thanks! I had forgotten to compare the square of the standard deviation limits to the variance limits calculated by SIMPLESTAT. With that additional step, the two agree for n = 30.

As to the assumption of normality, I can live with that, although I may have sample sizes less than 30. In that case, the confidence interval for the mean should use the Student-t distribution rather than the normal distribution. Unfortunately, the PV-WAVE documentation does not say which distribution its algorithm uses for small sample sizes, only that the confidence limits "assume normality". Empirical tests do show that the calculated limits match the results of my code, however, so it may be using the Student-t after all.

npts = 15
x = RANDOMN(s, npts)
res = SIMPLESTAT(x)
info, res(10) ; mean_ci_low
;<Expression> FLOAT = -0.488818
info, res(11) ; mean_ci_high
;<Expression> FLOAT = 0.326370
info, SQRT(res(12)) ; stdev_ci_low
;<Expression> FLOAT = 0.538860
info, SQRT(res(13)) ; stdev_ci_high
;<Expression> FLOAT = 1.16078
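
One way to check which distribution is behind the mean limits is to back out the critical value implied by the n = 15 output above. The following is a minimal sketch in Python (standard library only), written as an independent cross-check outside PV-WAVE. It assumes a 95% interval, and the chi-square quantiles for 14 degrees of freedom and the Student-t reference value are hard-coded from standard tables rather than computed; all variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values copied from the n = 15 SIMPLESTAT output posted above.
n = 15
mean_ci_low, mean_ci_high = -0.488818, 0.326370
stdev_ci_low, stdev_ci_high = 0.538860, 1.16078

# Assumption: standard table quantiles of the chi-square distribution
# with df = 14 at lower-tail probabilities 0.975 and 0.025.
df = n - 1
chi_sq_rt = 26.119
chi_sq_lt = 5.629

# Invert stdev_ci = sqrt(df * s^2 / chi_sq) to recover the sample
# standard deviation s from each limit; the two should agree.
s_from_low = stdev_ci_low * math.sqrt(chi_sq_rt / df)
s_from_high = stdev_ci_high * math.sqrt(chi_sq_lt / df)
s = 0.5 * (s_from_low + s_from_high)

# The mean interval is x_mean +/- crit * s / sqrt(n), so the half-width
# divided by the standard error gives the critical value that was used.
half_width = 0.5 * (mean_ci_high - mean_ci_low)
implied_crit = half_width / (s / math.sqrt(n))

z_crit = NormalDist().inv_cdf(0.975)  # normal quantile, about 1.96
t_crit = 2.145                        # assumption: table value for t(0.975, df = 14)

print("s recovered from low/high limits:", s_from_low, s_from_high)
print("implied critical value:", implied_crit)
print("normal:", z_crit, " Student-t (table):", t_crit)
```

With these inputs the implied critical value comes out near 2.14, which is close to the tabulated Student-t value and well away from the normal value, so the posted limits are consistent with SIMPLESTAT using the Student-t distribution for the mean interval.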