
This was something I thought about:

  1. If you have data from a coin with a very low probability of success, is it possible that the Wald confidence interval contains negative numbers and is therefore illogical? But if you bootstrap, the CI should never be illogical; i.e., in the worst case the lower bound will be exactly 0.
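
(For reference, the Wald interval here is $\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1-\hat{p})/n}$, and its lower endpoint can dip below 0 whenever $\hat{p}$ is close to 0.)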

I verified this in R:

set.seed(123)
p_true <- 0.001   # true success probability
n      <- 1000    # sample size per simulated dataset
n_sims <- 100     # number of simulated datasets
n_boot <- 100     # bootstrap resamples per dataset

wald_negative      <- 0
bootstrap_negative <- 0

for (i in 1:n_sims) {
  # simulate n Bernoulli trials
  x <- rbinom(n, 1, p_true)
  successes <- sum(x)

  p_hat <- successes / n

  # Wald 95% interval: p_hat +/- 1.96 * SE
  se <- sqrt(p_hat * (1 - p_hat) / n)
  wald_lower <- p_hat - 1.96 * se
  wald_upper <- p_hat + 1.96 * se

  if (wald_lower < 0) {
    wald_negative <- wald_negative + 1
  }

  # percentile bootstrap: resample the data, recompute p_hat
  boot_estimates <- numeric(n_boot)
  for (j in 1:n_boot) {
    boot_x <- sample(x, n, replace = TRUE)
    boot_estimates[j] <- sum(boot_x) / n
  }

  boot_lower <- quantile(boot_estimates, 0.025, names = FALSE)
  boot_upper <- quantile(boot_estimates, 0.975, names = FALSE)

  if (boot_lower < 0) {
    bootstrap_negative <- bootstrap_negative + 1
  }
}

cat("Results from", n_sims, "simulations:\n")
cat("Wald CI negative:", wald_negative,
    "times (", round(100 * wald_negative / n_sims, 1), "%)\n")
cat("Bootstrap CI negative:", bootstrap_negative,
    "times (", round(100 * bootstrap_negative / n_sims, 1), "%)\n")
  2. If you have data from an exponential random variable with a rate parameter very close to 0, is it possible that the Wald confidence interval contains negative numbers and is therefore illogical? But if you bootstrap, the CI should never be illogical; i.e., in the worst case the lower bound will be exactly 0.
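
(Here the MLE is $\hat{\lambda} = 1/\bar{x}$ with asymptotic standard error $\hat{\lambda}/\sqrt{n}$, so the Wald interval is $\hat{\lambda}\,(1 \pm 1.96/\sqrt{n})$; note that its lower endpoint is negative only when $\sqrt{n} < 1.96$, i.e. for $n \le 3$.)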

I also verified this in R:

set.seed(123)
lambda_true <- 0.001   # true rate parameter
n      <- 100          # sample size per simulated dataset
n_sims <- 100          # number of simulated datasets
n_boot <- 100          # bootstrap resamples per dataset

wald_negative      <- 0
bootstrap_negative <- 0

for (i in 1:n_sims) {
  x <- rexp(n, rate = lambda_true)

  # MLE of the rate parameter
  lambda_hat <- 1 / mean(x)

  # Wald 95% interval: lambda_hat +/- 1.96 * lambda_hat / sqrt(n)
  se <- lambda_hat / sqrt(n)
  wald_lower <- lambda_hat - 1.96 * se
  wald_upper <- lambda_hat + 1.96 * se

  if (wald_lower < 0) {
    wald_negative <- wald_negative + 1
  }

  # percentile bootstrap: resample the data, recompute the MLE
  boot_estimates <- numeric(n_boot)
  for (j in 1:n_boot) {
    boot_x <- sample(x, n, replace = TRUE)
    boot_estimates[j] <- 1 / mean(boot_x)
  }

  boot_lower <- quantile(boot_estimates, 0.025, names = FALSE)
  boot_upper <- quantile(boot_estimates, 0.975, names = FALSE)

  if (boot_lower < 0) {
    bootstrap_negative <- bootstrap_negative + 1
  }
}

cat("Results from", n_sims, "simulations:\n")
cat("True lambda:", lambda_true, "\n")
cat("Wald CI negative:", wald_negative,
    "times (", round(100 * wald_negative / n_sims, 1), "%)\n")
cat("Bootstrap CI negative:", bootstrap_negative,
    "times (", round(100 * bootstrap_negative / n_sims, 1), "%)\n")

If this is true, is the bootstrap CI always more advantageous than the Wald CI? Now with modern computers, where simulation is not a problem, won't a bootstrap CI almost always be better (i.e., avoid the illogical-interval problem), if not the same as Wald?

  • As bootstrapping resamples the original data, a bootstrap sample can never include impossible values (unless they occur through measurement error). Turn and turn about, consider bootstrapping the maximum or minimum: a bootstrap sample can never include values beyond the observed maximum or minimum, and confidence intervals will not be helpful. This is, I believe, an utterly standard example, but the reminder may help. – Nick Cox
  • I wonder how bad a problem the "illogical problem" in your sense actually is. If you know the minimum is 0, you can just cut the values lower than 0 from your CI; problem solved.
  • Note, by the way, that Wald (and anything that computes a symmetric interval based on a standard error) implicitly assumes a normal distribution, under which negative values are entirely possible. Under this assumption the result isn't "illogical". Application to other settings relies on the central limit theorem, i.e., a large enough sample, and it is well known that in a binomial problem with a very low success probability the required sample size is quite large. Note also that the bootstrap for small samples can be quite imprecise, "illogical" or not.
  • This thread is very much related regarding the Wald CI.
  • My reaction to reading the title was "but of course!" This can happen whenever you employ a statistic whose values (on at least one resample of the data) can be "illogical," such as lying in an impossible range of the estimand. One example of such a statistic would be when estimating a component of variance by subtracting one estimated variance from another. You might elect to keep negative results in the spirit of yielding an (approximately) unbiased estimator. – whuber

2 Answers


There are a few kinds of bootstrap confidence intervals, and it appears you're using the percentile method. Yes, percentile bootstrap confidence intervals will never cover infeasible parameter space, assuming the statistic you're using can't itself be infeasible (for example, the sample mean can't be negative if all the data are non-negative). This is because no bootstrapped point estimate can be infeasible, and the percentile method calculates the confidence bounds directly from the bootstrapped point estimates.
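
For reference, here is a sketch of the same percentile interval computed with the boot package instead of a hand-rolled loop; the statistic function, the success probability, and R = 2000 are illustrative choices, not anything from the question:

library(boot)

set.seed(123)
x <- rbinom(1000, 1, 0.01)  # illustrative rare-event Bernoulli data

# the statistic must accept the data and a vector of resample indices
b <- boot(data = x, statistic = function(d, i) mean(d[i]), R = 2000)

# percentile interval: its endpoints are quantiles of the bootstrapped
# estimates, so they can never leave the statistic's feasible range
boot.ci(b, type = "perc")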

Just as there are a few bootstrap confidence intervals, so too are there several binomial confidence intervals. If you are studying a rare outcome, it may be advantageous to use something like a Wilson interval or a Clopper–Pearson interval.
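
Both are available in base R, for what it's worth (the counts here, 1 success in 1000 trials, are just an illustrative rare-event example):

successes <- 1
n <- 1000

# Wilson score interval (prop.test without continuity correction)
prop.test(successes, n, correct = FALSE)$conf.int

# Clopper-Pearson "exact" interval
binom.test(successes, n)$conf.int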

Similarly, for a random variable supported on the positive reals -- such as an exponential random variable -- it may make sense to calculate the confidence interval in log space and then transform the interval via the exponential.
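
As a sketch of that idea for the exponential-rate example (using the standard delta-method result that the standard error of $\log\hat{\lambda}$ is approximately $1/\sqrt{n}$):

set.seed(123)
x <- rexp(100, rate = 0.001)
n <- length(x)

lambda_hat <- 1 / mean(x)  # MLE of the rate

# Wald interval on the log scale, then back-transformed;
# se(log(lambda_hat)) is approximately 1/sqrt(n) by the delta method
log_ci <- log(lambda_hat) + c(-1, 1) * 1.96 / sqrt(n)
exp(log_ci)  # both endpoints are guaranteed positive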

Bootstrapping is a great invention, but analytic results extend beyond the Wald interval, especially for bounded cases like the ones you provide. These analytic results can have good coverage in some circumstances, and the percentile confidence interval is by no means the best of the bootstrap confidence intervals.
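
For instance, the BCa (bias-corrected and accelerated) interval adjusts the percentile endpoints for bias and skewness; a minimal sketch, reusing the exponential-rate setup from the question:

library(boot)

set.seed(123)
x <- rexp(100, rate = 0.001)

# bootstrap the exponential-rate MLE
b <- boot(x, statistic = function(d, i) 1 / mean(d[i]), R = 2000)

# compare the plain percentile interval with the BCa interval
boot.ci(b, type = c("perc", "bca"))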

  • (+1) Even though "percentile bootstrap confidence intervals will never cover infeasible parameter space," they might well be "illogical" in a more limited sense: the confidence intervals might not even cover the true value of a statistic if the estimator is biased. See this 10-year-old question for the case of Shannon entropy. – EdM
  • @EdM In your opinion, is the fact that the confidence intervals might not even cover the true value of a statistic a problem with the bootstrap, or with the estimator? Could I use an unbiased estimator and still have the bootstrap CI fail to cover the estimand systematically (not just as an occasional false positive)?
  • As far as I understand (which unfortunately isn't really that far), the problem is with a biased estimator. If there's no bias or skewness in the quantity being estimated and the quantity is pivotal, then I understand that most any bootstrapping should be OK. The BCa bootstrap should help if there is bias/skewness. – EdM

I will just add a few comments, on the margins, to @DemetriPananos' very good answer.

There are many (really, many) methods for computing a binomial proportion CI; the Wald method is about as poor as it gets (the first section of Wikipedia's article on binomial CIs is titled "Problems with using a normal approximation or 'Wald interval'"). A normal distribution is a very poor approximation for a binomial when p is close to 0 or 1. This is well known, and is a reason why many other methods came about (Wilson, Agresti–Coull, Jeffreys, Clopper–Pearson, Blaker, etc.). Wald had a place in the days before computers; I am not sure why it is still being taught today. Clopper–Pearson, or Blaker, should be the default (they used to be painful to compute by hand, in part because of the factorials).
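
To make the difference concrete, here is a quick sketch comparing lower bounds for an illustrative rare-event dataset of 1 success in 1000 trials:

successes <- 1
n <- 1000
p_hat <- successes / n

# Wald lower bound: goes negative for counts this small
p_hat - 1.96 * sqrt(p_hat * (1 - p_hat) / n)

# Clopper-Pearson interval from base R: always within [0, 1]
binom.test(successes, n)$conf.int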

I also want to address your final sentence: "Now with modern computers, where simulation is not a problem, won't a bootstrap CI almost always be better (i.e., avoid the illogical-interval problem), if not the same as Wald?" Bootstrapping is certainly a major advance for statistics, but it is not a "silver bullet". The major assumption for any of the many bootstrapping methods to provide "reasonable" answers is that the sample be representative of the population. This is fundamentally untestable (since we do not know the population's distribution), so in practice it means that the sample size should be large, and typically larger than what would be required for parametric methods.

  • For what it's worth, Frank Harrell prefers the Wilson CI for binomial proportions over the alternatives (see ?Hmisc::binconf in R), and this is also the only preferred method within the UK civil service (or at least some parts of it I'm familiar with; it's a big organisation!). – Silverfish
  • People who prefer the Wilson CI give the following references in support of their case: Agresti, A., and Coull, B. (1998). Approximate is better than 'exact' for interval estimation of binomial proportions. Am Stat 52: 119–126; Newcombe, R. (1998). Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 17: 857–872; Newcombe, R., and Altman, D. (2000). Proportions and their differences, in Statistics with Confidence (BMJ Books). – Silverfish
  • As for why the Wald interval is still taught (and I'm afraid I've taught many different syllabuses where it was compulsory and no other method was to be seen!), the ease of computation is still a selling point, particularly when students take pen-and-paper exams with a calculator rather than a computer. It was a topic in A-Level Mathematics / Further Mathematics (final year of high school), for example, but many undergraduate and even graduate exams follow a similar format. It's also well suited to approximate mental computation, explaining how the margin of error in polls depends on $n$ (and $p$), etc. – Silverfish
  • @Silverfish, thanks for the additional information; very informative. I will definitely take a look at the "Approximate is better than 'exact'" paper; provocative title for sure :-). I know many find the "exact" binomial "too conservative", but I never got why one would want more than $\alpha$% Type I errors when the null is true. It is not as if Type I errors disappear the moment the null is "barely false"; they just become what some have called Type III errors (or what A. Gelman has called Type S and Type M errors). Thanks again. – jginestet
  • I believe that's the paper where the Agresti–Coull interval was introduced, which is somewhat curious, given that it's frequently cited in support of the Wilson interval! – Silverfish
