Chapter 4 Permutation Tests

4.1 Hypothesis Testing

A test statistic is a numerical function of the data whose value determines the result of the test. It is denoted by \(T(X)\); its value computed from the realized sample is denoted by \(t\).
p-value: the probability, assuming the null hypothesis is true, that chance alone would produce a test statistic at least as extreme as the observed value \(t\).
Statistically significant: a result is statistically significant if it would rarely occur by chance alone, that is, if its p-value is small.
Keep in mind that the p-value by itself does not tell you whether to reject the null hypothesis. You must also weigh the cost of a false positive (rejecting a true null hypothesis) against the cost of a false negative (failing to reject a false one).
Null distribution: the distribution of the test statistic when the null hypothesis is true.
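The definitions above can be written compactly as follows. The notation \(T_1^*, \dots, T_N^*\) for the statistics of \(N\) random permutations is introduced only for this sketch. The two-sided form assumes the null distribution is centered at zero, as it is for a difference in means, and the last line is the Monte Carlo estimate that counts the observed data as one extra resample (the code in this chapter uses a strict \(>\), which differs only when ties occur).

\[
p_{\text{upper}} = P_{H_0}\big(T(X) \ge t\big), \qquad
p_{\text{two-sided}} = P_{H_0}\big(|T(X)| \ge |t|\big)
\]

\[
\hat{p} = \frac{\#\{\, i : T_i^* \ge t \,\} + 1}{N + 1}
\]

With \(N = 999\) resamples, the smallest attainable estimate is \(1/1000 = 0.001\).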

4.2 Permutation Tests

4.2.1 Test on Beerwings dataset

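library(resampledata)  # Beerwings, Verizon and Recidivism data sets
library(dplyr)
set.seed(1234)
# Mean Hotwings per group; groups are ordered F, M, so diff() gives M - F
mus <- as_tibble(Beerwings) %>%
  group_by(Gender) %>%
  summarise(mu = mean(Hotwings)) %>%
  pull(mu)
observed_stat <- diff(mus)
hwings <- Beerwings$Hotwings
n <- 10^3 - 1  # number of permutation resamples
m <- 30        # 30 subjects, 15 per group
perms <- replicate(n, {
  x <- sample(hwings, m)  # random permutation of all 30 values
  mean(x[1:(m/2)]) - mean(x[(m/2 + 1):m])
})
# Count permuted differences larger than the observed one (>= would also count ties)
# and add the observed data as one extra resample
results <- c(TRUE, perms > observed_stat)
pval <- mean(results)
print(pval)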
## [1] 0.003

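A p-value of 0.003 means that very few random relabelings produce a difference in mean hot wings (men minus women) as large as the observed one. This is strong evidence against the null hypothesis of equal means, which would be rejected at the conventional 5% level.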
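The tests in this section repeat the same steps with different data. As a sketch, they could be wrapped in one function. `perm_diff_means` below is a hypothetical helper (not part of resampledata or dplyr) that shuffles the group labels, which is equivalent to shuffling the values, and uses `>=` so that ties count as "at least as extreme". It assumes exactly two groups and reports the mean of the second factor level minus the mean of the first.

```r
# Hypothetical helper: one-sided permutation test for a difference in means.
# Statistic = mean(second factor level) - mean(first factor level).
perm_diff_means <- function(values, groups, n_perm = 999) {
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2)
  lv <- levels(groups)
  diff_means <- function(g) mean(values[g == lv[2]]) - mean(values[g == lv[1]])
  observed <- diff_means(groups)
  # Shuffling the labels keeps both group sizes fixed
  perms <- replicate(n_perm, diff_means(sample(groups)))
  # Count ties as extreme and add the observed data as one extra resample
  p_value <- (sum(perms >= observed) + 1) / (n_perm + 1)
  list(observed = observed, perms = perms, p_value = p_value)
}
```

For example, `perm_diff_means(Beerwings$Hotwings, Beerwings$Gender)` tests the same one-sided hypothesis as 4.2.1 (M minus F), although its p-value need not be identical because ties are counted. The direction matters for a one-sided test: the helper's order matches 4.2.1 but is the reverse of 4.2.2 (CLEC minus ILEC).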

4.2.2 Test on Verizon dataset

set.seed(1234)
# Mean repair Time per group; groups are ordered CLEC, ILEC
mus <- as_tibble(Verizon) %>%
  group_by(Group) %>%
  summarise(mu = mean(Time)) %>%
  pull(mu)
observed_stat <- mus[1] - mus[2]  # CLEC mean minus ILEC mean
realized_data <- Verizon$Time
# Group sizes in the same (CLEC, ILEC) order
classes <- as_tibble(Verizon) %>%
  count(Group) %>%
  pull(n)
n <- 10^3 - 1
perms <- replicate(n, {
  x <- sample(realized_data, sum(classes))  # random permutation of all times
  mean(x[1:classes[1]]) - mean(x[(classes[1] + 1):sum(classes)])
})
results <- c(TRUE, perms > observed_stat)
pval <- mean(results)
print(pval)
## [1] 0.018

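A p-value of 0.018 is evidence that mean repair times are longer for CLEC customers than for ILEC customers: the null hypothesis would be rejected at the 5% level, though not at the 1% level.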

4.2.3 Test on Beerwings dataset (two-sided)

set.seed(1234)
mus <- as_tibble(Beerwings) %>%
  group_by(Gender) %>%
  summarise(mu = mean(Hotwings)) %>%
  pull(mu)
observed_stat <- diff(mus)  # M - F
hwings <- Beerwings$Hotwings
n <- 10^3 - 1
m <- 30
perms <- replicate(n, {
  x <- sample(hwings, m)
  mean(x[1:(m/2)]) - mean(x[(m/2 + 1):m])
})
# Two-sided: permuted differences farther from zero than the observed one, in either direction
results <- c(TRUE, abs(perms) > abs(observed_stat))
pval <- mean(results)
print(pval)
## [1] 0.003

The two-sided p-value is also 0.003: with the same seed, none of the permuted differences fell below \(-t\), so only the upper tail contributed and the conclusion is unchanged.
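The code above counts values farther from zero in either tail, which assumes the null distribution is centered at zero. Another common convention is to double the smaller one-sided p-value. The hypothetical helper `two_sided_p` below is a sketch that computes both from a vector of permuted statistics, counting ties and the observed data as one extra resample.

```r
# Hypothetical helper: two conventions for a two-sided permutation p-value.
# perms: permuted statistics; observed: the observed statistic.
two_sided_p <- function(perms, observed) {
  n <- length(perms)
  # (a) statistics at least as far from zero as the observed one
  p_abs <- (sum(abs(perms) >= abs(observed)) + 1) / (n + 1)
  # (b) twice the smaller one-sided p-value, capped at 1
  p_upper <- (sum(perms >= observed) + 1) / (n + 1)
  p_lower <- (sum(perms <= observed) + 1) / (n + 1)
  p_double <- min(1, 2 * min(p_upper, p_lower))
  c(abs = p_abs, double = p_double)
}
```

The two agree closely when the null distribution is roughly symmetric about zero; doubling does not rely on that symmetry.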

4.2.4 Test on Recidivism dataset

set.seed(1234)
Recidivism%>%group_by(Age25)%>%summarise(n=n())
## # A tibble: 3 x 2
##   Age25        n
## * <fct>    <int>
## 1 Under 25  3077
## 2 Over 25  13942
## 3 <NA>         3
# Drop rows with a missing age
reci2 <- Recidivism[complete.cases(Recidivism$Age), ]
# Recidivism rate (proportion with Recid == "Yes") per age group, ordered Under 25, Over 25
observed_stats <- as_tibble(reci2) %>%
  group_by(Age25) %>%
  summarise(mu = mean(Recid == "Yes")) %>%
  pull(mu)
observed_stat <- observed_stats[1] - observed_stats[2]  # Under 25 minus Over 25
realized_data <- reci2$Recid == "Yes"
# Group sizes in the same order
classes <- as_tibble(reci2) %>%
  count(Age25) %>%
  pull(n)
n <- 10^3 - 1
perms <- replicate(n, {
  x <- sample(realized_data, sum(classes))
  mean(x[1:classes[1]]) - mean(x[(classes[1] + 1):sum(classes)])
})
# Two-sided: permuted differences farther from zero than the observed one
results <- c(TRUE, abs(perms) > abs(observed_stat))
pval <- mean(results)
print(pval)
## [1] 0.001

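A p-value of 0.001 is the smallest value attainable with 999 resamples: none of the permuted differences was more extreme than the observed difference in recidivism rates between offenders under and over 25. This is very strong evidence against the null hypothesis of equal recidivism rates.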

4.2.5 Things to keep in mind

Samples are not unique: the permutations are drawn at random, so the same permutation may be drawn more than once. Enumerating every distinct permutation (an exact permutation test) is usually too expensive; even the 30 Beerwings subjects can be split into two groups of 15 in choose(30, 15) = 155,117,520 ways.
Add one to the numerator and denominator: the observed data is itself one of the possible permutations, so it is counted as an extra resample. This is what c(TRUE, results) does in the code above, and it guarantees that the p-value is never zero.
Perform a two-sided test by default, unless there is a specific reason, established before looking at the data, to test in only one direction.
Permutation procedures are very flexible: the same procedure works for any test statistic, so you can use a robust statistic of the samples (for example, a difference in medians or trimmed means) instead of the mean.
Assumptions
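- The permutation test makes no assumptions about the distributional form (for example, normality) of the two populations.
- Strictly, it assumes that under the null hypothesis the observations are exchangeable between the groups (for example, both samples come from the same distribution). It is fairly robust when this holds only approximately, but can be misleading when the populations differ greatly in spread and the sample sizes are very unequal.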
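Why is the method called a permutation test? The column of group labels is kept fixed while the column of values is randomly permuted across rows, which is equivalent to randomly reassigning the group labels.
Decide whether the design calls for a matched-pairs test or an independent-samples test; with matched pairs, the resampling must respect the pairing rather than shuffle all observations together.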
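For very small samples, the exact test is feasible: every split of the pooled data can be enumerated with base R's `combn`. The helper `exact_perm_p` below is a hypothetical sketch for a one-sided difference in means. No extra one is added, because the observed split is already one of the enumerated columns.

```r
# Hypothetical helper: exact one-sided permutation test, mean(x) - mean(y).
exact_perm_p <- function(x, y) {
  pooled <- c(x, y)
  n <- length(pooled)
  k <- length(x)
  observed <- mean(x) - mean(y)
  # Each column of idx is one choice of which observations form group x
  idx <- combn(n, k)
  stats <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  # Small tolerance so floating-point rounding does not drop exact ties
  mean(stats >= observed - 1e-12)
}
exact_perm_p(c(10, 11, 12, 13), c(1, 2, 3, 4))  # 1/70: only the observed split
choose(30, 15)  # 155117520 splits for Beerwings, too many to enumerate cheaply
```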
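Swapping in a robust statistic changes only the line that computes it. The sketch below is a hypothetical helper, `perm_median_diff`, that uses the difference in medians, which is less sensitive than the mean to a few very large values in skewed data.

```r
# Hypothetical helper: one-sided permutation test for median(x) - median(y).
perm_median_diff <- function(x, y, n_perm = 999) {
  pooled <- c(x, y)
  k <- length(x)
  observed <- median(x) - median(y)
  perms <- replicate(n_perm, {
    shuffled <- sample(pooled)  # random permutation of all observations
    median(shuffled[1:k]) - median(shuffled[-(1:k)])
  })
  # Count ties and add the observed data as one extra resample
  (sum(perms >= observed) + 1) / (n_perm + 1)
}
```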
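For matched pairs, one standard way to respect the pairing is to work with the within-pair differences: under the null hypothesis each difference is equally likely to be positive or negative, so each resample flips the signs at random. The helper `paired_perm_p` below is a hypothetical sketch of this sign-flip test for the mean difference.

```r
# Hypothetical helper: one-sided paired permutation (sign-flip) test.
# H0: the mean of (after - before) is zero; alternative: it is positive.
paired_perm_p <- function(before, after, n_perm = 999) {
  d <- after - before
  observed <- mean(d)
  # Multiply each difference by a random +1 or -1
  perms <- replicate(n_perm, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
  (sum(perms >= observed) + 1) / (n_perm + 1)
}
```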