One-Way ANOVA
Suppose a factor has three levels and we want to compare the means of all three. The independent two-sample t-test, or its two-sample non-parametric equivalent, compares only two groups at a time, so it does not apply here.
Instead we can use a one-way ANOVA, which is based on the F statistic.
## One-way ANOVA

ide3 = read.csv("ide3.csv")
View(ide3)
ide3$Subject = factor(ide3$Subject) # convert to nominal factor
ide3$IDE = factor(ide3$IDE) # needed in R 4.x, where read.csv no longer converts strings to factors
summary(ide3)

# descriptive statistics per IDE
library(plyr)
ddply(ide3, ~ IDE, function(data) summary(data$Time))
ddply(ide3, ~ IDE, summarise, Time.mean=mean(Time), Time.sd=sd(Time))

# distribution of Time per IDE
hist(ide3[ide3$IDE == "VStudio",]$Time)
hist(ide3[ide3$IDE == "Eclipse",]$Time)
hist(ide3[ide3$IDE == "PyCharm",]$Time) # new one
plot(Time ~ IDE, data=ide3) # boxplot

# normality assumption
shapiro.test(ide3[ide3$IDE == "PyCharm",]$Time)
m = aov(Time ~ IDE, data=ide3) # fit model
shapiro.test(residuals(m)) # test residuals
qqnorm(residuals(m)); qqline(residuals(m)) # plot residuals

# test whether Time fits a lognormal distribution
library(MASS)
fit = fitdistr(ide3[ide3$IDE == "PyCharm",]$Time, "lognormal")$estimate
ks.test(ide3[ide3$IDE == "PyCharm",]$Time, "plnorm", meanlog=fit[1], sdlog=fit[2], exact=TRUE) # lognormality

# log-transform Time and re-test normality
ide3$logTime = log(ide3$Time) # add new column
View(ide3) # verify
shapiro.test(ide3[ide3$IDE == "PyCharm",]$logTime)
m = aov(logTime ~ IDE, data=ide3) # fit model
shapiro.test(residuals(m)) # test residuals
qqnorm(residuals(m)); qqline(residuals(m)) # plot residuals

# homoscedasticity (equal variances) assumption
library(car)
leveneTest(logTime ~ IDE, data=ide3, center=median) # Brown-Forsythe test

# one-way ANOVA on the log-transformed data
m = aov(logTime ~ IDE, data=ide3) # fit model
anova(m) # report anova
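The F statistic that anova(m) reports is the ratio of the between-group mean square to the within-group mean square. The following is a minimal sketch of that computation on simulated data; the data frame d, its columns g and y, and the group means used to generate it are all hypothetical and are not taken from ide3.csv.

```r
# Simulated data with three groups (hypothetical, not the ide3 data set)
set.seed(1)
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 5.0), rnorm(20, mean = 5.5), rnorm(20, mean = 6.0))
)

k <- nlevels(d$g)                 # number of groups
N <- nrow(d)                      # total number of observations
grand_mean  <- mean(d$y)          # mean of all observations
group_means <- ave(d$y, d$g)      # each row's own group mean

# Sums of squares: between groups and within groups
ss_between <- sum((group_means - grand_mean)^2)
ss_within  <- sum((d$y - group_means)^2)

# F = between-group mean square / within-group mean square
F_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))

# The same statistic as computed by aov()
F_aov <- anova(aov(y ~ g, data = d))[["F value"]][1]

print(c(manual = F_manual, aov = F_aov))
```

Both values should agree up to floating-point error, which shows that the omnibus test compares variation between the groups with variation inside them.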
The omnibus test tells us that mean completion time differs somewhere among the three IDEs, but not which IDEs differ. A significant omnibus test therefore gives us permission to run post-hoc pairwise comparisons to determine which one is better.
library(multcomp)
summary(glht(m, mcp(IDE="Tukey")), test=adjusted(type="holm")) # Tukey means compare all pairs, mcp stands for multiple comparisons
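The test=adjusted(type="holm") argument asks for Holm's sequential Bonferroni correction of the pairwise p-values. As a rough sketch of what that correction does, base R's p.adjust can be applied to a vector of raw p-values; the three values below are made up for illustration and are not results from the ide3 data.

```r
# Made-up raw p-values for three pairwise comparisons (illustration only)
p_raw <- c(0.01, 0.04, 0.03)

# Holm: sort ascending, multiply the i-th smallest by (n - i + 1),
# then enforce that adjusted values never decrease
p_holm <- p.adjust(p_raw, method = "holm")

print(p_holm)
# 0.03 0.06 0.06
```

With these numbers only the first comparison would stay below .05 after correction, although all three raw p-values were below .05.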
Report the results
F(2, 57) = 8.80, p < .001
The parentheses hold the two degrees of freedom. The first is the numerator (between-groups) degrees of freedom, the second is the denominator (within-groups, or residual) degrees of freedom.
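For a one-way ANOVA with k groups and N observations, the numerator degrees of freedom are k - 1 and the denominator degrees of freedom are N - k. A small sketch that works backwards from the reported result and recomputes the p-value with pf; the variable names are only illustrative.

```r
# Values taken from the reported result F(2, 57) = 8.80
F_value <- 8.80
df_num  <- 2    # numerator df   = k - 1
df_den  <- 57   # denominator df = N - k

k <- df_num + 1   # number of groups
N <- df_den + k   # total number of observations

# Upper-tail probability of the F distribution
p_value <- pf(F_value, df_num, df_den, lower.tail = FALSE)

print(c(k = k, N = N))
print(p_value < 0.001)
```

The degrees of freedom imply three groups and 60 observations in total, and the recomputed p-value falls below .001, which matches the reported p < .001.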
#statistics #designing_running_and_analyzing_experiments #week4 #assumptions #test #coursera #f_test #experiment #theory #normality #design #anova #rlang