Chapter 4 Permutation Tests

4.1 Hypothesis Testing

  • Test statistic: a numerical function of the data whose value determines the result of the test, denoted \(T(X)\). Its value computed from the observed sample is denoted \(t\).
  • p-value: the probability, assuming the null hypothesis is true, that chance alone would produce a test statistic at least as extreme as the observed value \(t\).
  • Statistically significant: a result is statistically significant if it would rarely occur by chance alone, that is, if its p-value is small.
  • A p-value by itself does not tell you whether to reject the null hypothesis. You must also weigh the cost of a false positive (rejecting a null hypothesis that is true) against the cost of a false negative (failing to reject one that is false).
  • Null distribution: the distribution of the test statistic \(T(X)\) when the null hypothesis is true.
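
To make these definitions concrete, here is a small sketch on made-up toy data (six values, two groups of three; not one of the chapter's data sets). Because the sample is tiny, the null distribution can be enumerated exactly with combn(): if the null hypothesis is true, every way of choosing three of the six observations as group A is equally likely. The data and the function name T_stat are illustrative assumptions; the sections below approximate the null distribution by random sampling instead.

```r
# Toy data (made up for illustration): six values, three per group
values  <- c(1, 2, 3, 10, 11, 12)
group_A <- c(4, 5, 6)                  # positions of the observed group A

# Test statistic T(X): mean of group A minus mean of group B
T_stat <- function(x, idx_A) mean(x[idx_A]) - mean(x[-idx_A])

t_obs <- T_stat(values, group_A)       # observed value t = 11 - 2 = 9

# Null distribution: T for every way of choosing 3 of the 6 positions as group A,
# each equally likely if the null hypothesis is true
splits    <- combn(length(values), 3)  # 3 x 20 matrix, one split per column
null_dist <- apply(splits, 2, function(idx) T_stat(values, idx))

# One-sided p-value: share of the null distribution at least as extreme as t
p_value <- mean(null_dist >= t_obs)
p_value                                # 0.05 (only the observed split reaches 9)
```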

4.2 Permutation Tests

4.2.1 Test on Beerwings dataset

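library(dplyr)         # as_tibble(), group_by(), summarise(), %>%
library(resampledata)  # Beerwings, Verizon and Recidivism data sets
set.seed(1234)
# Mean number of hot wings per gender; groups are in factor-level order (F, M)
mus <- as_tibble(Beerwings) %>%
    group_by(Gender) %>%
    summarise(mu = mean(Hotwings)) %>%
    dplyr::select(mu) %>%
    unlist()
observed_stat <- diff(mus)    # observed statistic: mean(M) - mean(F)
hwings <- Beerwings$Hotwings
n <- 10^3 - 1                 # number of permutations
m <- 30                       # total sample size, 15 per group
perms <- replicate(n, {
    x <- sample(hwings, m)                   # randomly reassign all 30 values to the two groups
    mean(x[1:(m/2)]) - mean(x[(m/2+1):m])    # difference in means for this relabeling
})
results <- perms > observed_stat   # upper-tail (one-sided) test
results <- c(TRUE, results)        # count the observed data as one extra resample
pval <- mean(results)              # (number exceeding + 1) / (n + 1)
print(pval)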
## [1] 0.003

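A p-value of 0.003 means that only 2 of the 999 permuted differences exceeded the observed difference, since (2 + 1)/(999 + 1) = 0.003. This is strong evidence against the null hypothesis that men and women eat the same number of hot wings on average, in favor of the one-sided alternative that men eat more; we would reject the null hypothesis at the 5% or even the 1% level.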
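
The same procedure can be wrapped in a reusable function. The sketch below is a hypothetical helper, perm_test (not from the original notes), that shuffles the group labels instead of the values, takes the test statistic as a function argument, and counts ties as extreme (>=), matching the "at least as extreme" definition of the p-value. Its two-sided option uses absolute values, which assumes a null distribution symmetric about zero.

```r
# Hypothetical helper (not from the original notes): a Monte Carlo
# permutation test comparing two groups with any statistic.
perm_test <- function(values, groups,
                      stat = function(a, b) mean(a) - mean(b),
                      N = 999,
                      alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2, length(values) == length(groups))
  first <- levels(groups)[1]
  # Statistic for a labelling: first factor level versus second
  stat_for <- function(g) stat(values[g == first], values[g != first])
  observed <- stat_for(groups)
  # Shuffle the labels N times; the values stay where they are
  perms <- replicate(N, stat_for(sample(groups)))
  hits <- switch(alternative,
                 greater   = sum(perms >= observed),
                 less      = sum(perms <= observed),
                 two.sided = sum(abs(perms) >= abs(observed)))
  # The observed labelling counts as one extra resample
  (hits + 1) / (N + 1)
}

# Toy usage with made-up data: group "a" is all 0, group "b" all 100
set.seed(42)
toy_values <- c(rep(0, 10), rep(100, 10))
toy_groups <- rep(c("a", "b"), each = 10)
# observed = -100 is the smallest possible difference, so every
# permutation is >= it and the "greater" p-value is exactly 1
perm_test(toy_values, toy_groups, alternative = "greater")
```

Because the helper computes first level minus second level, the one-sided Beerwings test above (mean(M) - mean(F) large) corresponds to perm_test(Beerwings$Hotwings, Beerwings$Gender, alternative = "less"), assuming the levels are F then M. Its p-value need not match 0.003 exactly, since it counts ties with >= and draws different random permutations.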

4.2.2 Test on Verizon dataset

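set.seed(1234)
# Mean repair time per group; groups are in factor-level order (CLEC, ILEC)
mus <- as_tibble(Verizon) %>%
    group_by(Group) %>%
    summarise(mu = mean(Time)) %>%
    dplyr::select(mu) %>%
    unlist()
observed_stat <- mus[1] - mus[2]   # observed statistic: mean(CLEC) - mean(ILEC)
realized_data <- Verizon$Time
n <- 10^3 - 1                      # number of permutations
# Group sizes, in the same order as mus
classes <- as_tibble(Verizon) %>%
    group_by(Group) %>%
    summarise(n = n()) %>%
    dplyr::select(n) %>%
    unlist()
perms <- replicate(n, {
    x <- sample(realized_data, sum(classes))                        # random relabeling of all observations
    mean(x[1:classes[1]]) - mean(x[(classes[1]+1):sum(classes)])    # first group minus second group
})
results <- (perms > observed_stat)   # upper-tail (one-sided) test
results <- c(TRUE, results)          # count the observed data as one extra resample
pval <- mean(results)
print(pval)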
## [1] 0.018

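A p-value of 0.018 means that only 17 of the 999 permuted differences exceeded the observed difference in mean repair times. This is evidence against the null hypothesis that the two groups have equal mean repair times: we would reject it at the 5% level, though not at the 1% level.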
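
A p-value estimated from 999 random permutations carries Monte Carlo error of its own. A rough standard error follows from treating the number of extreme permutations as binomial, \(\sqrt{p(1-p)/N}\). The sketch below applies this approximation to the Verizon value p = 0.018 with N = 999; the helper name mc_se is an illustrative assumption.

```r
# Hypothetical helper: approximate Monte Carlo standard error of an estimated
# p-value, treating the count of extreme permutations as binomial(N, p)
mc_se <- function(p, N) sqrt(p * (1 - p) / N)

se <- mc_se(0.018, 999)
round(se, 4)                                 # 0.0042
round(c(0.018 - 2 * se, 0.018 + 2 * se), 4)  # rough 95% range: 0.0096 0.0264
# Permutations needed for a standard error of about 0.001 at p = 0.018
round(0.018 * (1 - 0.018) / 0.001^2)         # 17676
```

The whole range stays below 0.05, so the decision at the 5% level is not sensitive to the Monte Carlo error.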

4.2.3 Test on Beerwings dataset (two-sided)

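set.seed(1234)
# Same observed statistic and permutations as in 4.2.1
mus <- as_tibble(Beerwings) %>%
    group_by(Gender) %>%
    summarise(mu = mean(Hotwings)) %>%
    dplyr::select(mu) %>%
    unlist()
observed_stat <- diff(mus)    # mean(M) - mean(F)
hwings <- Beerwings$Hotwings
n <- 10^3 - 1                 # number of permutations
m <- 30                       # total sample size, 15 per group
perms <- replicate(n, {
    x <- sample(hwings, m)
    mean(x[1:(m/2)]) - mean(x[(m/2+1):m])
})
# Two-sided test: count permuted differences beyond the observed one in either direction
results <- abs(perms) > abs(observed_stat)
results <- c(TRUE, results)   # count the observed data as one extra resample
pval <- mean(results)
print(pval)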
## [1] 0.003

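The two-sided p-value is also 0.003. Because the same seed produces the same 999 permuted differences as in 4.2.1, none of them fell below -observed_stat: the two permuted differences counted in the one-sided test are the only ones more extreme than the observed value in either direction. The evidence against the null hypothesis is therefore just as strong in the two-sided test.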
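
The two-sided code above counts permuted differences beyond the observed value in either direction. That is appropriate here: with two groups of 15, swapping the group labels turns each relabeling's difference into its negative, so the permutation null distribution is exactly symmetric about zero. With unequal group sizes (as in the Verizon and Recidivism data) the null distribution can be skewed, and a common alternative is to double the smaller one-sided p-value. The sketch below compares the two rules on made-up permutation statistics; two_sided_p is a hypothetical helper, and both rules count ties as extreme (>=).

```r
# Hypothetical helper: two ways to turn permutation statistics into a
# two-sided p-value, each counting the observed data as one extra resample
two_sided_p <- function(perms, observed, method = c("reflect", "double")) {
  method <- match.arg(method)
  N <- length(perms)
  if (method == "reflect") {
    # Count statistics at least as far from zero as the observed one
    (sum(abs(perms) >= abs(observed)) + 1) / (N + 1)
  } else {
    # Double the smaller one-sided p-value, capped at 1
    upper <- (sum(perms >= observed) + 1) / (N + 1)
    lower <- (sum(perms <= observed) + 1) / (N + 1)
    min(1, 2 * min(upper, lower))
  }
}

# Made-up permutation statistics, skewed to the right (illustrative only)
toy_perms <- c(-4, -3, -1, 0, 0, 1, 2, 5, 7)
two_sided_p(toy_perms, observed = 5, method = "reflect")  # 0.3
two_sided_p(toy_perms, observed = 5, method = "double")   # 0.6
```

When the null distribution is skewed, the two rules can disagree noticeably; when it is symmetric, they give nearly the same answer.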

4.2.4 Test on Recidivism dataset

set.seed(1234)
Recidivism%>%group_by(Age25)%>%summarise(n=n())
## # A tibble: 3 x 2
##   Age25        n
## * <fct>    <int>
## 1 Under 25  3077
## 2 Over 25  13942
## 3 <NA>         3
# Drop the three records with a missing age group
reci2 <- Recidivism[!is.na(Recidivism$Age25), ]
# Recidivism rate per age group (Under 25, Over 25); mean() of a logical is a proportion
observed_stats <- as_tibble(reci2) %>%
    group_by(Age25) %>%
    summarise(mu = mean(Recid == "Yes")) %>%
    pull(mu)
observed_stat <- observed_stats[1] - observed_stats[2]   # rate(Under 25) - rate(Over 25)
realized_data <- reci2$Recid == "Yes"                    # TRUE if the offender reoffended
classes <- as_tibble(reci2) %>%
    group_by(Age25) %>%
    summarise(n = n()) %>%
    pull(n)                                              # group sizes, same order
n <- 10^3 - 1                                            # number of permutations
perms <- replicate(n, {
    x <- sample(realized_data, sum(classes))
    mean(x[1:classes[1]]) - mean(x[(classes[1]+1):sum(classes)])
})
results <- abs(perms) > abs(observed_stat)   # two-sided test
results <- c(TRUE, results)                  # count the observed data as one extra resample
pval <- mean(results)
print(pval)
## [1] 0.001

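A p-value of 0.001 is the smallest value this code can report with 999 permutations, (0 + 1)/(999 + 1): none of the permuted differences was more extreme than the observed difference in recidivism rates between offenders under and over 25. The true p-value is likely smaller still, and the evidence against the null hypothesis of equal recidivism rates is very strong.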
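
For a binary outcome such as Recid, the permutation distribution need not be simulated, which removes the 1/(N + 1) floor. After shuffling, the number of "Yes" outcomes that land in the first group follows a hypergeometric distribution, and the difference in proportions increases with that count, so an exact one-sided p-value comes from phyper(). This is the same calculation that underlies Fisher's exact test (fisher.test()). The sketch below uses a hypothetical helper, exact_prop_perm_p, and toy counts rather than the Recidivism counts, which these notes do not list. A two-sided value could, for example, double the smaller one-sided value.

```r
# Hypothetical helper: exact one-sided (upper-tail) permutation p-value for a
# difference in proportions, group 1 minus group 2
exact_prop_perm_p <- function(yes1, n1, yes2, n2) {
  total_yes <- yes1 + yes2
  total_no  <- (n1 + n2) - total_yes
  # Under H0 the number of "Yes" landing in group 1 after shuffling is
  # hypergeometric; p1 - p2 increases with that count, so
  # P(count >= yes1) is the p-value
  phyper(yes1 - 1, total_yes, total_no, n1, lower.tail = FALSE)
}

# Toy counts (made up): 3 of 4 "Yes" in group 1, 0 of 4 in group 2
exact_prop_perm_p(yes1 = 3, n1 = 4, yes2 = 0, n2 = 4)  # 5/70, about 0.0714
```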

4.2.5 Things to keep in mind

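  • Resamples are not unique: each replicate is an independent random permutation, so the same relabeling can be drawn more than once. Enumerating every distinct relabeling instead is usually too expensive; even the small Beerwings data set has choose(30, 15) = 155,117,520 ways to split 30 observations into two groups of 15.
  • Add one to the numerator and denominator: the observed data are themselves one of the possible relabelings, so they count as an extra resample. The p-value is then \((k + 1)/(N + 1)\), where \(k\) is the number of the \(N\) permuted statistics that are as extreme as the observed one; this is what results <- c(TRUE, results) does in the code above.
  • Use a two-sided test by default; use a one-sided test only when there is a specific prior reason to care about a difference in one direction only.
  • Permutation procedures are flexible: the same basic procedure works for any test statistic, so you can, for example, replace the difference in means with a more robust statistic such as the difference in medians or in trimmed means.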
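  • Assumptions
    • The permutation test makes no assumption about the form of the two population distributions; for example, it does not require normality.
    • It does require that, under the null hypothesis, the observations are exchangeable: both samples come from the same population, so the group labels are arbitrary. If the populations differ in some other way, such as spread, the test is no longer exact.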
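  • Why is it called a permutation test? The column of group labels is held fixed while the column of values is randomly permuted across rows (equivalently, the labels are shuffled), which is what sample() does in the code above.
  • Decide whether the data call for a matched-pairs test or a test on independent samples. With matched pairs, values are permuted only within each pair, not across the whole sample.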
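
For matched pairs, swapping the two values within a pair negates that pair's difference, so permuting within pairs amounts to randomly flipping the sign of each within-pair difference. A minimal sketch, assuming a hypothetical helper paired_perm_test and toy inputs:

```r
# Hypothetical helper: two-sided permutation test for matched pairs
# (x[i] and y[i] are measurements on the same unit)
paired_perm_test <- function(x, y, N = 9999) {
  stopifnot(length(x) == length(y))
  d <- x - y                  # within-pair differences
  observed <- mean(d)
  # Under H0, swapping the two values in a pair is as likely as not,
  # which flips the sign of that pair's difference
  perms <- replicate(N, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
  # Sign flipping gives a null distribution symmetric about 0, so the
  # two-sided count uses absolute values; add one as before
  (sum(abs(perms) >= abs(observed)) + 1) / (N + 1)
}

# Toy usage: identical measurements give all-zero differences, so every
# permuted mean (0) is as extreme as the observed one and p = 1
set.seed(1)
paired_perm_test(c(12, 15, 11, 14, 13), c(12, 15, 11, 14, 13))
```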