Two component modified Lilliefors test for normality

Research background: Commonly known and used parametric tests, e.g. the Student, Behrens-Fisher, Snedecor, Bartlett, Cochran and Hartley tests, are applicable when there is evidence that the samples come from a normal general population. What makes things worse is that testers are not fully aware of the degree to which non-normality distorts the results of the parametric tests listed above and of similar ones. So, it is no exaggeration to say that testing for normality (goodness-of-fit testing, GoFT) is a gate to proper parametric statistical reasoning. It seems that the gate opens too easily. In other words, the most popular goodness-of-fit tests are weaker than statisticians want them to be.
Purpose of the article: The main purpose of this paper is to put forward a GoFT that is, in particular circumstances, more powerful than the GoFTs used until now. The other goals are to define a similarity measure between an alternative distribution and the normal one, to calculate the power of normality tests for a large set of alternatives and, of course, to interest statisticians in using the GoFTs in their practice.
Method: There are two ways to make a GoFT more powerful: an extensive and an intensive one. The extensive method consists in drawing large samples. The intensive method consists in extracting more information from small samples. To make the test method intensive, the test statistic, unlike those of all existing GoFTs, has two components. The first component (denoted by $\Delta$) is the classic Kolmogorov / Lilliefors test statistic, i.e. the greatest absolute difference between the theoretical and empirical cumulative distribution functions. The second component is the order statistic $(r)$ at which $\Delta_{\max}^{(r)}$ is located. Of course, $\Delta_{\max}^{(r)}$ is a conditional random variable with $(r)$ being the condition. Large-scale Monte Carlo simulations provided data sufficient for an in-depth study of the properties of the distributions of the $\Delta_{\max}^{(r)}$ random variable.
Findings & value-added: The simulation study shows that the Two Component Modified Lilliefors test for normality is the most powerful for some types of alternatives, especially for symmetrical, unimodal and bimodal distributions with positive excess kurtosis, and for symmetrical and unimodal distributions with negative excess kurtosis and small sample sizes. Judging by the values of skewness and excess kurtosis and by the defined similarity measure between the normal distribution (ND) and an alternative, the alternative distributions are close to the normal distribution. Numerous examples of real data show the usefulness of the proposed GoFT.


Introduction
One cannot disagree with the statement that the most commonly used statistical inference procedures concern the parameters of the normal distribution. However, before analysts use one of these procedures, they need to check whether the empirical data come from a general population in which the normal distribution "is in force". In other words, analysts are required to perform a goodness-of-fit test (GoFT). As shown in the next section, many authors have proposed GoFTs for normality in recent years. Their goal was to define the most powerful GoFT. The goal of the author of this work is to join the large family of authors of tests of this type.
Any GoFT has its unique measure of discrepancy between the theoretical and sample distributions, commonly called the test statistic. In the pre-Monte Carlo era, inventors of GoFTs had their hands tied. Formulas of test statistics had to be simple. This simplicity was an indispensable condition for deriving the distributions of the test statistics analytically, which was the only way available. In the Monte Carlo era, inventors of GoFTs have their hands freed thanks to Lilliefors. The first goal of the paper is to put into practice the Two Component Modified Lilliefors test for normality. The first component is the greatest absolute value of the difference between the CDF and the EDF. The second component is the order statistic at which the greatest difference is located. The second goal is to define a similarity measure between an alternative distribution and the normal one. The third goal is to calculate the power of normality tests for a large set of alternatives. The implementation of the new proposal is presented by means of five real data sets.
The other sections of the paper are as follows. Section 2 provides a literature review on GoFTs for normality. Section 3 describes the modified Lilliefors goodness-of-fit test for normality with a two-component test statistic and presents the family of non-normal (alternative) distributions. Section 4 shows how particular tests vary in their power and evaluates the performance of the proposed test based on real-life examples. Section 5 is dedicated to the discussion. The paper ends with conclusions. The R codes are provided in the Appendix and the notation used throughout the paper is presented in Table 0.

Research methodology
Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be an ordered sample of size $n$. The Lilliefors test statistic for normality is given by (Lilliefors, 1967)
$$LF = \max_{x} \left| \Phi\!\left(\frac{x - \bar{x}}{s}\right) - F_n(x) \right|,$$
where $\Phi$ is the CDF of the $N(0,1)$, $F_n(x)$ is the EDF, $\bar{x}$ is the sample mean and $s$ is the sample standard deviation with denominator $(n - 1)$. The LF test for normality is meant for testing $H_0$: data come from the ND and $H_1$: not $H_0$. Hypothesis $H_0$ is rejected when the LF value exceeds the critical value. The Modified Lilliefors test for normality is defined by the test statistic (6). For the needs of the paper, let us call it the One Component Modified Lilliefors test for normality.
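The classic Lilliefors statistic described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the author's code; the function name `lilliefors_statistic` and the sample values are hypothetical. Because the EDF is a step function, the largest gap is checked on both sides of each jump.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Largest absolute gap between the fitted normal CDF and the EDF."""
    xs = sorted(sample)
    n = len(xs)
    x_bar = mean(xs)
    s = stdev(xs)  # sample standard deviation, denominator n - 1
    phi = NormalDist().cdf  # CDF of the N(0,1)
    lf = 0.0
    for i, x in enumerate(xs, start=1):
        p = phi((x - x_bar) / s)
        # The EDF jumps from (i - 1)/n to i/n at x_(i); check both sides.
        lf = max(lf, i / n - p, p - (i - 1) / n)
    return lf


# Hypothetical data, used only to show the call.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
print(round(lilliefors_statistic(data), 4))
```

The statistic depends on the data only through the standardized values, so shifting or rescaling the sample leaves it unchanged.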

Two Component Modified Lilliefors test for normality
This section deals with the first aim of the paper. The Two Component Modified Lilliefors test with the EDF (5), whose two parameters are denoted below by $a$ and $b$, has two components. The first component (denoted by $\Delta$) is, as in the original LF test, the absolute value of the greatest discrepancy between the sample and population distributions. The second component (denoted by $r$) is the position in the ordered sample at which this discrepancy is located. Both components, $\Delta$ and $r$, are random variables. The conditional distributions of the random variable $\Delta$ given $r = r^*$, $r^* = 1, 2, \ldots, n$, where $n$ is the sample size, were determined with the Monte Carlo method. These distributions then served as the basis for determining the conditional critical values of $\Delta$. The decision rule remains unchanged: $H_0$ is accepted when the observed value of $\Delta$ is smaller than the critical value for the observed position $r^*$. The test statistic is defined for $a, b \le 1$ as
$$\Delta = \max_{1 \le i \le n} \left| \Phi\!\left(\frac{x_{(i)} - \bar{x}}{s}\right) - F_n^{(a,b)}\!\left(x_{(i)}\right) \right|,$$
where $\Phi$ is the CDF of the $N(0,1)$, $F_n^{(a,b)}$ is the EDF given by (5), $\bar{x}$ is the sample mean, $s$ is the sample standard deviation with denominator $(n - 1)$ and $r$ is the order statistic at which the greatest absolute difference between the CDF of the $N(0,1)$ and the EDF is located. The test for normality is meant for testing $H_0$: data come from the ND with the PDF of the $N(0,1)$ and $H_1$: not $H_0$. The Monte Carlo simulations use the $(a, b)$ values presented in Table 1 (see Sulewski, 2019b).
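The two components described above can be computed together, as in the following minimal sketch. It rests on assumptions that are not confirmed by this section: the parameter names `a` and `b` are hypothetical, and the EDF is assumed to have the form (i - a)/(n - a - b + 1), which agrees with the two special cases quoted later in the text (i/n and (i - 1)/n) but should be checked against equation (5).

```python
from statistics import NormalDist, mean, stdev


def two_component_statistic(sample, a=0.0, b=1.0):
    """Return (delta, r): the largest |CDF - EDF| gap and its position."""
    if a > 1 or b > 1:
        raise ValueError("a and b must not exceed 1")
    xs = sorted(sample)
    n = len(xs)
    x_bar, s = mean(xs), stdev(xs)  # stdev uses denominator n - 1
    phi = NormalDist().cdf  # CDF of the N(0,1)
    delta, r = -1.0, 0
    for i, x in enumerate(xs, start=1):
        edf = (i - a) / (n - a - b + 1)  # assumed form of the EDF
        gap = abs(phi((x - x_bar) / s) - edf)
        if gap > delta:
            delta, r = gap, i  # remember where the largest gap occurs
    return delta, r


# Hypothetical data, used only to show the call.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
delta, r = two_component_statistic(data, a=0.0, b=1.0)
print(round(delta, 4), r)
```

Returning the position alongside the gap is what allows the critical value to be looked up conditionally on `r` rather than from a single unconditional table.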

Alternative distributions
The second goal of the paper is to define a similarity measure between the ND and an alternative distribution. The set of alternative distributions considered in this paper comprises 60 distributions, further called alternatives. The alternatives used in the Monte Carlo simulations, based on the inequality $\gamma_2 \ge \gamma_1^2 - 2$ between skewness $\gamma_1$ and excess kurtosis $\gamma_2$ (Malachov, 1978), are presented in groups A1-F2 (see Table 2). One can obtain skewness and excess kurtosis values belonging to each of the analyzed groups (except C2) by selecting appropriate parameter values of the CND.
Other alternatives are presented in Table 3. Among them are distributions studied in Gan and Koehler (1990), Esteban et al. (2001), Krauczi (2009) and Torabi et al. (2016), distributions selected by the author (in italic) and distributions defined by the author (in bold).
The alternatives are defined for negative and non-negative real numbers. Their domain, except for the uniform distribution and the truncated $N(0,1)$ distribution, is the set of real numbers. Table 4 presents the alternatives used in the Monte Carlo simulation, divided into twelve groups and numbered from 1 to 60. The author chose the alternatives and the values of their parameters so that they were close to the ND. Two measures were used to choose their parameters. The first one consists of the skewness and excess kurtosis. The other one is the similarity measure between the ND and an alternative. This measure is defined based on the test statistic (7), with $F(x)$ denoting the CDF of an alternative with mean $\mu$ and standard deviation $\sigma$. Obviously, the smaller the value of the measure, the greater the similarity of the alternative to the ND. The values of the measure for all the alternatives are presented in Table 4.
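A similarity measure of this kind can be illustrated numerically. The sketch below assumes, without confirmation from this section, that the measure is the largest absolute difference between the CDF of the alternative and the CDF of the normal distribution with the same mean and standard deviation. The function name, the grid settings and the choice of the logistic distribution as an example are all hypothetical.

```python
import math
from statistics import NormalDist


def similarity_to_normal(cdf, mu, sigma, half_width=8.0, steps=4000):
    """Largest |F(x) - Phi((x - mu)/sigma)| over a grid of x values."""
    phi = NormalDist().cdf  # CDF of the N(0,1)
    worst = 0.0
    for k in range(steps + 1):
        z = -half_width + 2.0 * half_width * k / steps  # standardized point
        x = mu + sigma * z
        worst = max(worst, abs(cdf(x) - phi(z)))
    return worst


def logistic_cdf(x, scale=1.0):
    """CDF of the logistic distribution centred at zero."""
    return 1.0 / (1.0 + math.exp(-x / scale))


# The logistic distribution with scale 1 has mean 0 and sd pi / sqrt(3).
logistic_sd = math.pi / math.sqrt(3.0)
print(similarity_to_normal(logistic_cdf, 0.0, logistic_sd))
```

Under this assumed definition the measure is zero for the normal distribution itself and grows as the alternative departs from it, which matches the reading that smaller values mean greater similarity.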
Summing up, two hypotheses were formulated, namely $H_0$ and $H_1$. Hypothesis $H_0$ states that the data come from a general population where the ND holds. Hypothesis $H_1$, in turn, contradicts $H_0$. It states that the data come from a general population where a non-normal distribution holds. The set of alternatives considered in this paper comprises 60 distributions. The alternatives are listed in Table 4.
A collection of critical values of $\Delta$ for positions $r = 1, 2, \ldots, n$ and the type I error $\alpha$ is determined on the basis of N = 10^ Monte Carlo experiments. One experiment consisted of generating $n = 20, 50, 100$ data points from the $N(0,1)$ and calculating values of the test statistic (7) for seven EDFs from […] Table 5) are determined by using N = 10 0 experiments because the probability of occurrence of the extreme order statistics is very small. Unknown quantile values are replaced by the appropriate order statistics. Table 5 presents the critical values of $\Delta$ for the Two Component Modified Lilliefors test for $n = 20$. As mentioned in Sulewski (2019b), the values of the two EDF parameters depend on the nature of the alternative that seems to pass for the ND. The parameter values 0 and 1 (i.e. the EDF is $i/n$) are for the alternatives with positive skewness. The parameter values 1 and 0 (i.e. the EDF is $(i - 1)/n$) are for the alternatives with negative skewness. The bimodality of the alternatives does not affect the choice of the parameter values.
In order to contrast the Two Component Modified Lilliefors test with the One Component Modified Lilliefors test in view of their power, Table 14 was constructed. In it, one symbol (in bold) means that the power of the Two Component test is higher by at least 0.002 than the power of the One Component test, a second symbol means the opposite situation, and a third, combined symbol means that the difference between the PoTs is less than 0.002. The PoTs differ by less than 0.002 for 6 alternatives for $n = 20$, 7 alternatives for $n = 50$ and 15 alternatives for $n = 100$. Thus, the larger the sample size, the more difficult it is to select the more powerful test, which was to be expected. The new GoFT is the most powerful for alternatives
The Appendix presents R codes to calculate the Two Component Modified Lilliefors test statistic together with its position $r$, the corresponding p-value and the critical value for m = 10^ experiments.
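The Monte Carlo procedure for conditional critical values can be sketched as follows. This is a small-scale Python illustration rather than the R code of the Appendix. The names `a`, `b`, `conditional_critical_values` and `accept_normality`, the EDF form (i - a)/(n - a - b + 1), the quantile index and the number of experiments are all assumptions chosen to keep the example fast.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist

PHI = NormalDist().cdf  # CDF of the N(0,1)


def two_component_statistic(sample, a, b):
    """Return (delta, r) for the assumed EDF (i - a)/(n - a - b + 1)."""
    xs = sorted(sample)
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    gaps = [abs(PHI((x - x_bar) / s) - (i - a) / (n - a - b + 1))
            for i, x in enumerate(xs, start=1)]
    delta = max(gaps)
    return delta, gaps.index(delta) + 1


def conditional_critical_values(n, a, b, alpha=0.05, experiments=5000, seed=1):
    """Simulate under H0 and return {position r: critical value of delta}."""
    rng = random.Random(seed)
    by_position = defaultdict(list)
    for _ in range(experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        delta, r = two_component_statistic(sample, a, b)
        by_position[r].append(delta)
    critical = {}
    for r, deltas in by_position.items():
        deltas.sort()
        # The quantile is replaced by an order statistic of the simulated values.
        k = math.ceil((1.0 - alpha) * len(deltas)) - 1
        critical[r] = deltas[k]
    return critical


def accept_normality(sample, critical, a, b):
    """Accept H0 when delta is below the critical value for its position."""
    delta, r = two_component_statistic(sample, a, b)
    if r not in critical:
        raise ValueError(f"no simulated critical value for position {r}")
    return delta < critical[r]


critical = conditional_critical_values(n=20, a=0.0, b=1.0, experiments=2000)
print(len(critical), "positions have a simulated critical value")
```

Positions that occur rarely receive few simulated values, so their critical values are unstable at this scale; this is why far more experiments are needed for the extreme positions.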

Discussion
The first goal of the paper is to put into practice the Two Component GoFT for normality. Critical values of the proposed test are determined for positions $r = 1, 2, \ldots, n$ of the greatest discrepancy between the theoretical and empirical cumulative distribution functions. Judging by the values of the similarity measure between two distributions (the second goal), the alternatives, divided into twelve groups according to skewness and excess kurtosis values, are close to the ND.
The Two Component Modified Lilliefors test with EDF parameters (1,1) is the most powerful for all the analyzed symmetrical, unimodal and bimodal distributions with positive excess kurtosis (groups C1 and C2) and for the alternative numbered 5 for the sample sizes $n = 50, 100$. The test with parameters (0,1) is the most powerful for all the analyzed symmetrical and unimodal distributions with negative excess kurtosis (group D1), except alternative 30, for the sample size $n = 20$. The test with parameters (0,0) is the most powerful for the selected symmetrical and unimodal distributions with negative excess kurtosis (group D1) for the sample sizes $n = 50, 100$. The test with parameters (1,0) is the most powerful for the alternatives 17 ($n = 50, 100$) and 57 ($n = 20, 50, 100$). The test with parameters (0,1) is recommended for the selected alternatives from group A2, especially for larger sample sizes. The test with parameters (1,0) is recommended for the selected alternatives from group B2, especially for larger sample sizes.

Conclusions
To sum up, the proposed test, as mentioned in the previous section, is the most powerful for some alternatives. The simulation study, as could be expected (Janssen, 2000), shows that none of the examined tests can be considered the best for all the alternatives. The good performance of the Two Component Modified Lilliefors test against the other most popular GoFTs is illustrated through the analysis of real data sets. The Monte Carlo simulations are carried out for 60 alternatives. Such a large number of distributions is used to ensure reliable results. Is the Two Component Modified Lilliefors test for exponentiality also noteworthy? The author of this paper intends to answer this question in the near future. Source: Sulewski (2019b).