What is a test of data normality?

Shapiro-Wilk test: Compares the ordered sample values with the values expected from a normal distribution. The test statistic W is close to 1 when the data look normal.

Jarque-Bera test: Tests the null hypothesis that the data follow a normal distribution by checking whether the sample skewness and excess kurtosis are both close to zero.

Kolmogorov-Smirnov test: Compares the empirical cumulative distribution function (CDF) of the data to the CDF of a normal distribution, using the largest gap between the two as the test statistic.

Anderson-Darling test: Similar to the Kolmogorov-Smirnov test, but it assigns more weight to deviations in the tails of the distribution.
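
As a rough illustration of how these tests can be run in practice, here is a minimal sketch using SciPy. The helper name run_normality_tests and the simulated samples are assumptions made for this example only; the Anderson-Darling test is left out for brevity.

```python
import numpy as np
from scipy import stats


def run_normality_tests(x):
    """Return a dict of p-values from three common normality tests.

    Hypothetical helper for illustration; not part of SciPy.
    """
    x = np.asarray(x, dtype=float)

    # Shapiro-Wilk: ordered sample values vs. expected normal order statistics.
    _, shapiro_p = stats.shapiro(x)

    # Jarque-Bera: based on sample skewness and excess kurtosis.
    _, jb_p = stats.jarque_bera(x)

    # Kolmogorov-Smirnov against a normal with the sample mean and SD.
    # Estimating mean and SD from the same data makes this p-value
    # conservative (too large), so treat it as a rough guide only.
    _, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

    return {
        "shapiro": float(shapiro_p),
        "jarque_bera": float(jb_p),
        "kolmogorov_smirnov": float(ks_p),
    }


# Simulated data (assumption): one normal sample, one clearly skewed sample.
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=500)
skewed_sample = rng.exponential(scale=2.0, size=500)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    print(name, run_normality_tests(sample))
```

A small p-value (for example, below 0.05) is evidence against normality; a large p-value only means the test found no clear departure.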

1. Graphical Methods:

Histogram: A bell-shaped histogram suggests normality.

Normal Probability Plot (Q-Q Plot): Plots the sample quantiles against the quantiles of a theoretical normal distribution. Points that fall close to a straight line suggest normality.

Box Plot: A median near the center of the box and whiskers of similar length suggest a symmetric distribution, which is consistent with normality but does not prove it.
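
The Q-Q plot idea can be sketched numerically with scipy.stats.probplot, which returns the paired quantiles and the correlation r of the fitted straight line. The helper name qq_correlation and the simulated samples are assumptions for this example; the closer r is to 1, the closer the points lie to a straight line.

```python
import numpy as np
from scipy import stats


def qq_correlation(x):
    """Correlation between sample quantiles and theoretical normal quantiles.

    Hypothetical helper for illustration; wraps scipy.stats.probplot.
    """
    x = np.asarray(x, dtype=float)
    # probplot returns ((theoretical, ordered_sample), (slope, intercept, r)).
    (_theoretical, _ordered), (_slope, _intercept, r) = stats.probplot(x, dist="norm")
    return float(r)


# Simulated data (assumption): a normal sample and a right-skewed sample.
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=500)
skewed_sample = rng.exponential(size=500)

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print("r for normal sample:", r_normal)
print("r for skewed sample:", r_skewed)
```

To draw the plot itself, probplot also accepts a plot argument (for example, the matplotlib.pyplot module), assuming matplotlib is installed.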

2. Statistical Tests:

Shapiro-Wilk Test: Tests the null hypothesis that the data is normally distributed.

Jarque-Bera Test: Tests the null hypothesis that the data is normally distributed based on its skewness and kurtosis.

Anderson-Darling Test: Tests the null hypothesis that the data is normally distributed, with extra sensitivity to departures in the tails.

Assumptions of Normality Tests:

Independence: Data points should be independent of each other.

Randomness: Data should be randomly sampled from the population.

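Sufficient Sample Size: Normality tests have little power to detect departures in small samples (a common rule of thumb asks for at least 30 observations), while in very large samples they can flag even trivial departures.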

Interpretation of Results:

If the test results suggest that the data is not normally distributed, non-parametric tests or transformations may be necessary.

If the test does not reject normality, the data is consistent with a normally distributed population. This is not proof of normality, especially when the sample is small.
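
The decision rule above can be written as a small sketch. The function name interpret_normality_p_value and the default significance level of 0.05 are assumptions for this example, not fixed conventions.

```python
def interpret_normality_p_value(p_value, alpha=0.05):
    """Turn a normality-test p-value into a cautious verbal conclusion.

    Hypothetical helper for illustration; alpha=0.05 is an assumed default.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        # Evidence against the null hypothesis of normality.
        return "reject normality: consider a transformation or a non-parametric test"
    # Failing to reject is not proof that the data is normal.
    return "no evidence against normality: data is consistent with a normal population"


print(interpret_normality_p_value(0.003))
print(interpret_normality_p_value(0.42))
```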

Factors Affecting Normality:

Sample Size: Small samples often look non-normal by chance even when the population is normal, and they give tests little power to detect real departures.

Outliers: Extreme data points can distort the distribution.

Skewness: The distribution is skewed when one tail is longer than the other, so the data is not symmetric around the peak.

Kurtosis: The distribution has heavier or lighter tails than the normal curve, which often shows up as a sharper or flatter peak.
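
Skewness and kurtosis can be estimated directly from a sample. The sketch below uses scipy.stats.skew and scipy.stats.kurtosis (which reports excess kurtosis by default, so a normal distribution gives a value near 0). The helper name shape_summary and the simulated samples are assumptions for this example.

```python
import numpy as np
from scipy import stats


def shape_summary(x):
    """Sample skewness and excess kurtosis (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return {
        # 0 for a symmetric distribution; positive means a longer right tail.
        "skewness": float(stats.skew(x)),
        # Fisher definition: 0 for a normal distribution,
        # positive for heavier tails, negative for lighter tails.
        "excess_kurtosis": float(stats.kurtosis(x)),
    }


# Simulated data (assumption): a right-skewed and a light-tailed sample.
rng = np.random.default_rng(1)
right_skewed = rng.exponential(size=5000)
light_tailed = rng.uniform(size=5000)

print("exponential:", shape_summary(right_skewed))
print("uniform:", shape_summary(light_tailed))
```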

Consequences of Non-Normality:

The assumptions of many statistical methods, such as t-tests and ANOVA, may be violated.

Results may be biased or inaccurate if non-normality is not accounted for.

Alternatives When Data Is Not Normal:

Non-parametric tests (e.g., Mann-Whitney U test, Kruskal-Wallis test)

Transformations (e.g., logarithmic, square root) to achieve approximate normality

Monte Carlo simulations to assess robustness of statistical results under non-normality
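
The three options above can be sketched briefly with NumPy and SciPy. All data here is simulated, and the function name t_test_false_positive_rate, the sample sizes, and the number of simulations are assumptions chosen only to keep the example small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 1. Transformation: a log transform pulls in the long right tail of
#    log-normal data, so the skewness moves towards zero.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
skew_before = float(stats.skew(raw))
skew_after = float(stats.skew(np.log(raw)))

# 2. Non-parametric test: Mann-Whitney U compares two groups using ranks,
#    so it does not assume normally distributed data.
group_a = rng.exponential(scale=1.0, size=40)
group_b = rng.exponential(scale=2.0, size=40)
_, mw_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")


# 3. Monte Carlo: estimate how often a one-sample t-test wrongly rejects
#    a true null hypothesis when the data is exponential (mean 1).
def t_test_false_positive_rate(n, n_sims=2000, alpha=0.05, seed=0):
    sim_rng = np.random.default_rng(seed)
    samples = sim_rng.exponential(scale=1.0, size=(n_sims, n))
    # One t-test per simulated sample (row); the true mean really is 1.0.
    _, p_values = stats.ttest_1samp(samples, popmean=1.0, axis=1)
    return float(np.mean(p_values < alpha))


rate_small_n = t_test_false_positive_rate(n=10)

print("skewness before / after log:", skew_before, skew_after)
print("Mann-Whitney U p-value:", float(mw_p))
print("t-test false-positive rate at n=10:", rate_small_n)
```

If the simulated false-positive rate sits well away from the nominal 0.05, the t-test is not robust for that sample size and distribution, and a transformation or rank-based test may be the safer choice.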
