What Is the Two-Sample Kolmogorov-Smirnov Test?

The two-sample Kolmogorov-Smirnov test is a non-parametric test used to compare the distributions of two independent samples. It tests whether the two samples are drawn from the same distribution.


Purpose:

To test whether two independent samples come from the same continuous distribution.


Assumptions:

• The samples are independent.

• The samples are drawn from continuous distributions.


Null Hypothesis:

H0: The two samples come from the same distribution.

Alternative Hypothesis:

Ha: The two samples come from different distributions.


Test Statistic:

The test statistic is the largest absolute difference between the empirical cumulative distribution functions (ECDFs) of the two samples:

D = sup_x |F1(x) - F2(x)|


where F1(x) and F2(x) are the empirical CDFs of the first and second samples, respectively, that is, the fraction of observations in each sample that are less than or equal to x.
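
The definition of D can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `ecdf_value` and `ks_statistic` and the sample values are made up for this example.

```python
from bisect import bisect_right

def ecdf_value(sorted_sample, x):
    """Fraction of observations in the sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(sample1, sample2):
    """Largest absolute gap between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum is reached at one of the pooled observations.
    return max(abs(ecdf_value(s1, x) - ecdf_value(s2, x)) for x in s1 + s2)

# Hypothetical data, for illustration only.
a = [1.2, 2.4, 3.1, 4.8]
b = [2.9, 3.5, 5.0, 6.3]
print(ks_statistic(a, b))
```

Note that both ECDFs are evaluated at the same x values (the pooled observations), not matched row by row.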


Critical Value:

The critical value for the test is determined from the distribution of the test statistic under the null hypothesis. The critical value depends on the sample sizes and the significance level α.
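
For moderately large samples, a commonly used approximation to the critical value is c(α) × sqrt((n + m) / (n × m)), where n and m are the two sample sizes and c(α) = sqrt(-ln(α / 2) / 2), which is about 1.36 for α = 0.05. The sketch below assumes this large-sample approximation; for very small samples, exact tables are more accurate. The function name `ks_critical_value` and the sample sizes are chosen for this example.

```python
from math import log, sqrt

def ks_critical_value(n, m, alpha=0.05):
    """Approximate two-sided critical value for sample sizes n and m."""
    c_alpha = sqrt(-0.5 * log(alpha / 2))   # about 1.358 when alpha = 0.05
    return c_alpha * sqrt((n + m) / (n * m))

# Hypothetical sample sizes, for illustration only.
print(round(ks_critical_value(50, 80), 4))
```

The critical value shrinks as the samples grow, so larger samples can detect smaller differences between the two distributions.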


Procedure:

1. Calculate the empirical CDFs of the two samples.

2. Find the maximum absolute difference between the two empirical CDFs.

3. Compare the maximum difference to the critical value.

4. If the maximum difference is greater than the critical value, reject the null hypothesis and conclude that the two samples come from different distributions. Otherwise, fail to reject the null hypothesis.


Interpretation:

• If the null hypothesis is rejected, it suggests that the two samples are likely to come from different distributions.

• If the null hypothesis is not rejected, it does not necessarily mean that the two samples come from the same distribution. It could also indicate that the sample sizes are too small to detect a difference.



Advantages:

• Non-parametric: it does not assume that the samples follow any particular distribution.

• Sensitive to any kind of difference between the two distributions (location, spread, or shape), not only to a difference in means.


Disadvantages:

• Less powerful than parametric tests when the samples come from normal distributions.

• Less sensitive to differences in the tails of the distributions than to differences near the center.


Applications of the Two-Sample Kolmogorov-Smirnov Test:


• Comparing the distributions of two different populations: For example, comparing the distribution of incomes in two different countries or the distribution of test scores for two different groups of students.

• Testing for the equality of two distributions before performing other statistical tests: For example, before performing a t-test to compare the means of two groups, it is important to test whether the distributions of the two groups are similar.

• Detecting changes in a distribution over time: For example, comparing the distribution of stock prices before and after a major event to see if there has been a significant change.

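• Goodness-of-fit testing: Testing whether a sample comes from a specified distribution. For example, testing whether a sample of data follows a normal distribution. Note that this uses the one-sample version of the Kolmogorov-Smirnov test, which compares the ECDF of one sample with a fully specified theoretical CDF.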

• Non-parametric regression: Estimating the relationship between a response variable and one or more predictor variables without making assumptions about the form of the relationship.


Examples:


• Testing whether the distribution of test scores for a new educational program is different from the distribution of scores for a traditional program.

• Detecting changes in the distribution of customer satisfaction ratings over time to see if a new marketing campaign has had an effect.

• Testing whether a sample of data follows a normal distribution before performing a parametric statistical test that assumes normality (this uses the one-sample version of the test).


Example:

Suppose we have two independent samples of data, representing the heights of men and women. We want to test whether the distributions of heights for men and women are different.


Data:

Men's heights: 68, 70, 72, 74, 76, 78, 80, 82, 84, 86

Women's heights: 62, 64, 66, 68, 70, 72, 74, 76, 78, 80


Procedure:

1. Calculate the ECDFs for both samples:

Men's ECDF:

x | ECDF
-- | ----
68 | 0.1
70 | 0.2
72 | 0.3
74 | 0.4
76 | 0.5
78 | 0.6
80 | 0.7
82 | 0.8
84 | 0.9
86 | 1.0


Women's ECDF:

x | ECDF
-- | ----
62 | 0.1
64 | 0.2
66 | 0.3
68 | 0.4
70 | 0.5
72 | 0.6
74 | 0.7
76 | 0.8
78 | 0.9
80 | 1.0


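2. Calculate the Kolmogorov-Smirnov statistic:

D = max(|Men's ECDF - Women's ECDF|) = 0.3

The two ECDFs must be compared at the same height x, not row by row. For example, at x = 66 the women's ECDF is 0.3 while the men's ECDF is still 0, and the gap stays at 0.3 for every x from 66 up to 80.


3. Determine the critical value:

Using a significance level of 0.05 and sample sizes of 10, the large-sample approximation 1.358 × sqrt((10 + 10) / (10 × 10)) gives a critical value of approximately 0.61. (Exact tables for two samples of size 10 reject only when D is at least 0.7.)

4. Compare the test statistic to the critical value:

Since D (0.3) is less than the critical value (0.61), we fail to reject the null hypothesis.


Interpretation:

The p-value for this test is approximately 0.79, which is greater than the significance level of 0.05. Therefore, we do not have enough evidence to conclude that the distributions of heights for men and women are different.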
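
The worked example can be checked numerically. The sketch below uses only the Python standard library and recomputes the statistic and an exact p-value for the height data above. The helper `exact_p_equal_n` is a hypothetical name; it applies the known closed-form expression for two samples of equal size and assumes no ties, so with the tied heights here the result is only approximate. In practice, a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be used instead.

```python
from bisect import bisect_right
from math import comb

men = [68, 70, 72, 74, 76, 78, 80, 82, 84, 86]
women = [62, 64, 66, 68, 70, 72, 74, 76, 78, 80]

def ecdf(sample):
    """Return a function giving the fraction of the sample <= x."""
    ordered = sorted(sample)
    return lambda x: bisect_right(ordered, x) / len(ordered)

def exact_p_equal_n(d, n):
    """Exact two-sided p-value P(D >= d) for two samples of equal size n.

    Hypothetical helper; assumes no ties between observations.
    """
    h = round(d * n)  # with equal sizes, D is always a multiple of 1/n
    if h <= 0:
        return 1.0
    total, k = 0, 1
    while n - k * h >= 0:
        # Alternating sum of binomial coefficients.
        total += (-1) ** (k + 1) * comb(2 * n, n - k * h)
        k += 1
    return min(1.0, 2 * total / comb(2 * n, n))

f_men, f_women = ecdf(men), ecdf(women)
points = sorted(set(men) | set(women))  # evaluate both ECDFs at the same x
d_stat = max(abs(f_men(x) - f_women(x)) for x in points)
p_value = exact_p_equal_n(d_stat, len(men))

print(f"D = {d_stat:.2f}, exact p-value = {p_value:.4f}")
```

Evaluating both ECDFs on the pooled set of heights is the key step: each sample's own table lists the ECDF only at that sample's observations, so the two tables cannot be subtracted row by row.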


                                                          Thank you for reading!

