Chapter 9 More Hypothesis Testing

What do we mean by hypothesis testing? Say you have a sample from a population and you are interested in the population mean. You can compute the realized summary statistic and its standard error, and then use either a Normal distribution or a \(t\) distribution to decide whether to reject or fail to reject the null hypothesis

  • If the population distribution is not normal but has finite variance, then as \(n \to \infty\), the distribution of \(T\) approaches the standard normal distribution, by the CLT
  • In practice, we may use the \(t\) distribution to compute approximate \(p\)-values if the sample size is large and the sample distribution is not too skewed
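
The claim above can be checked with a small simulation (a sketch with assumed numbers, not from the text): draw many samples of size \(n\) from a skewed exponential population, form the \(t\) statistic for each, and compare its quantiles with those of the \(t\) distribution.

```r
# Sampling distribution of the t statistic from a skewed population.
# The exponential(1) population has true mean 1, so we centre there.
set.seed(42)
n <- 100
t_stats <- replicate(10000, {
  x <- rexp(n, rate = 1)                 # skewed population, mean = 1
  (mean(x) - 1) / (sd(x) / sqrt(n))      # one-sample t statistic
})

# Simulated quantiles versus the t reference quantiles
quantile(t_stats, c(0.025, 0.975))
qt(c(0.025, 0.975), df = n - 1)
```

With \(n = 100\) the agreement is already reasonable, though the residual skewness of the population leaves a visible asymmetry in the tails.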

9.1 Bootstrap t test

In this procedure, one computes the \(t\)-statistic from the sample. Bootstrap samples are then drawn and a sampling distribution for the \(t\)-statistic is obtained. The p-value is the fraction of bootstrap statistics that exceed the original statistic.

9.1.1 t.test

t.test(Bangladesh$Arsenic, mu=100, alt="greater")
## 
##  One Sample t-test
## 
## data:  Bangladesh$Arsenic
## t = 1.3988, df = 270, p-value = 0.08151
## alternative hypothesis: true mean is greater than 100
## 95 percent confidence interval:
##  95.44438      Inf
## sample estimates:
## mean of x 
##  125.3199

9.1.2 Recomputation


n <- length(Bangladesh$Arsenic)
mu <-  mean(Bangladesh$Arsenic)
stdev <- sd(Bangladesh$Arsenic)
t_obs <- (mu-100)/(stdev/sqrt(n))
print(1-pt(t_obs, n-1))
## [1] 0.08150503

9.1.3 Bootstrap t test


t_samples <- replicate(1000, {
    y <- sample(Bangladesh$Arsenic, n, replace = TRUE)
    mu_b <- mean(y)
    stdev_b <- sd(y)
    (mu_b - mu)/(stdev_b/sqrt(n))   # centre at the sample mean, not at 100
    }
)

mean(t_samples > t_obs)
## [1] 0.069

Bootstrap \(t\)-tests are usually quite accurate. Under fairly general assumptions, the difference between the actual type I error rate and the nominal rate is at most \(c/n\) for some constant \(c\), where \(n\) is the sample size. In contrast, for ordinary \(t\)-tests the error is at most \(c/\sqrt{n}\)

9.2 Hypothesis tests : Two populations

t.test(Weight~Tobacco, data = NCBirths2004, alt="greater")
## 
##  Welch Two Sample t-test
## 
## data:  Weight by Tobacco
## t = 4.1411, df = 134.01, p-value = 3.04e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  129.009     Inf
## sample estimates:
##  mean in group No mean in group Yes 
##          3471.912          3256.910

9.2.1 Fisher’s exact test

prop.test(c(550,425), c(684, 563))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(550, 425) out of c(684, 563)
## X-squared = 4.101, df = 1, p-value = 0.04286
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.001251812 0.097166228
## sample estimates:
##    prop 1    prop 2 
## 0.8040936 0.7548845
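
Note that `prop.test` actually performs a chi-squared test for equality of proportions with a continuity correction; base R's exact test is `fisher.test`, which takes the full 2x2 table of successes and failures built from the same counts. A sketch:

```r
# Fisher's exact test on the same data: rows are groups,
# columns are successes and failures reconstructed from the counts
counts <- matrix(c(550, 684 - 550,    # group 1: 550 successes of 684
                   425, 563 - 425),   # group 2: 425 successes of 563
                 nrow = 2, byrow = TRUE)
fisher.test(counts)
```

The exact p-value is close to the `prop.test` approximation here because both samples are large.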

9.3 Type I and Type II errors

Terminology:

  • False positive
  • False negative
  • Rejection region
  • Critical values

9.4 Power of a Test

  • Power of a test: \(1 - \beta = P(\text{reject } H_0 \mid H_A \text{ true})\)
    • A study with low power means that not enough data was collected to detect an effect
    • Power is the probability of correctly rejecting a false null hypothesis

The factors that determine the power of a test are:

  • Effect size: the difference between the hypothesized mean and the actual mean. The larger the difference, the more likely we are to detect it
  • Standard error \(\sigma/\sqrt{n}\): the larger the sample size, the smaller the standard error, and hence the greater the power
  • Significance level \(\alpha\): increasing \(\alpha\) moves the quantile \(q\), and hence the rejection boundary, toward the null value, increasing the power
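
These factors can be illustrated with base R's `power.t.test` (the specific `n`, `delta`, and `sd` values below are made up for illustration): holding everything else fixed, power rises with both effect size and sample size.

```r
# Power of a two-sample t test under three hypothetical scenarios
p_small <- power.t.test(n = 30,  delta = 0.2, sd = 1, sig.level = 0.05)$power
p_big_d <- power.t.test(n = 30,  delta = 0.8, sd = 1, sig.level = 0.05)$power
p_big_n <- power.t.test(n = 120, delta = 0.2, sd = 1, sig.level = 0.05)$power

c(p_small, p_big_d, p_big_n)   # power grows with delta and with n
```

The same function can be run in reverse, supplying `power` and leaving `n = NULL`, to find the sample size needed to reach a target power.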

9.5 Likelihood Ratio Tests

I guess this is the first time I have really understood the meaning of Likelihood Ratio Tests. First, one must be able to differentiate between a Simple Hypothesis and a Composite Hypothesis

A hypothesis is Simple if it completely specifies the distribution of the population. Otherwise, it is Composite

For simple hypotheses, one can easily compute the LRT statistic \[ T = {L(\theta_0) \over L(\theta_A)} = {L(\theta_0 \mid X_1, X_2, \ldots, X_n) \over L(\theta_A \mid X_1, X_2, \ldots, X_n)} \]

The idea is to check whether this statistic is very small. If it is, the likelihood of the data under the alternative is so much larger than under the null that one can reject the null
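
A minimal numerical sketch, with made-up numbers: data from \(N(\mu, 1)\) with \(\sigma\) known, testing the two simple hypotheses \(H_0: \mu = 0\) versus \(H_A: \mu = 1\).

```r
# Simple-vs-simple LRT: both hypotheses fully specify the distribution
set.seed(1)
x <- rnorm(20, mean = 1)             # data generated under the alternative

loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

# T = L(theta_0) / L(theta_A), computed on the log scale for stability
T_stat <- exp(loglik(0) - loglik(1))
T_stat                               # small T favours the alternative
```

Since the data were generated under the alternative, the ratio comes out far below 1, exactly the situation in which the LRT rejects the null.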

9.6 Neyman-Pearson Lemma

Let \(X_1, X_2, \ldots, X_n\) be a random sample from a distribution with parameter \(\theta\). Suppose we wish to test the two simple hypotheses \[ H_0: \theta = \theta_0\] versus \[H_A: \theta = \theta_A\] The LRT rejects the null hypothesis at significance level \(\alpha\) if the test statistic satisfies \(T < c\). Then any test with significance level less than or equal to \(\alpha\) has power less than or equal to the power of this LRT. That is, for a fixed \(\alpha\), the LRT minimizes the probability of a type II error

9.7 LR Test for Composite Hypothesis

Let \(\Omega\) denote the set of possible values for \(\theta\) and \(\Omega_0\) the subset that satisfies the null hypothesis. We test \[ H_0 : \theta \in \Omega_0\] vs \[ H_A: \theta \in \Omega \setminus \Omega_0\]

The LRT statistic for testing the hypothesis is given by \[ T(X) = {\max_{\theta \in \Omega_0} L(\theta \mid x) \over \max_{\theta \in \Omega} L(\theta \mid x)} \] An LRT rejects the null hypothesis when \(T(X) \leq c\) for some number \(c\), \(0 \leq c \leq 1\). Given an observed value \(T(x)\), the p-value is \(P(T(X) \leq T(x))\), computed under the null hypothesis
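
A sketch of the composite case, again with assumed numbers: \(X \sim N(\mu, 1)\) with \(\sigma\) known, testing \(H_0: \mu = 0\) against all other values of \(\mu\). The numerator maximizes over \(\Omega_0 = \{0\}\); the denominator maximizes over all \(\mu\), where the MLE is \(\bar{x}\). The approximate chi-squared calibration of \(-2 \log T\) used below is Wilks' theorem, a fact not stated in the text.

```r
# Composite LRT: H0 pins mu down, the alternative leaves it free
set.seed(2)
n <- 50
x <- rnorm(n, mean = 0)                    # data generated under the null

loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

T_stat <- exp(loglik(0) - loglik(mean(x))) # T(x) lies in [0, 1]
W <- -2 * log(T_stat)                      # approx chi-squared, df = 1
pchisq(W, df = 1, lower.tail = FALSE)      # approximate p-value
```

Because the denominator is maximized over a set containing \(\Omega_0\), \(T(x)\) can never exceed 1, matching the constraint \(0 \leq c \leq 1\) on the cutoff.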