Logit and Probit application for the prediction of bankruptcy in Slovak companies

Research background: The prediction of bankruptcy has been of interest to researchers and practitioners since the first study dedicated to this topic was published in 1932. Finding a suitable bankruptcy prediction model is a task for economists and analysts from all over the world. Despite the large number of models that have been created using different methods with the aim of achieving the best results, it is still challenging to predict bankruptcy risk, as corporations have become more global and more complex.
Purpose of the article: The aim of the presented study is to construct, via an empirical study of the relevant literature and the application of suitably chosen mathematical statistical methods, models for bankruptcy prediction of Slovak companies and to compare the overall prediction ability of the two developed models.
Methods: The research was conducted on a data set of Slovak corporations covering the year 2015, and two mathematical statistical methods were applied. The methods are logit and probit, which are both symmetric binary choice models, also known as conditional probability models. On the other hand, these methods show some significant differences in the process of model formation, as well as in the achieved results.
Findings & Value added: Given the fact that mostly discriminant analysis and logistic regression are used for the construction of bankruptcy prediction models, we have focused our attention on the development of a bankruptcy prediction model in the Slovak Republic via logistic regression and probit. The results of the study suggest that the model based on a logit function slightly outperforms the classification accuracy of the probit model. Differences were also obtained in the detection of the most significant predictors of bankruptcy in these types of models constructed for Slovak companies.
Equilibrium. Quarterly Journal of Economics and Economic Policy, 12(4), 775–791.


Introduction
Bankruptcy prediction models have been widely applied in advanced economies, mainly in the western part of the world, since the first study in this area was carried out by Fitzpatrick (1932, pp. 598-605). Since that time, numerous economists and analysts from all over the world have been trying to find an appropriate company bankruptcy forecasting model, applying different methods with the aim of achieving the best results (Ravi Kumar & Ravi, 2007, pp. 1-28). Later, it also became an issue of growing interest for researchers of capitalist, socialist, and transitional economies (Brada, 1993, pp. 82-96). Boratynska (2014, pp. 43-57) emphasizes not only the importance of predicting the probability of default of companies, but also the measurement of the costs of corporate bankruptcy.
In Slovakia, the bankruptcy issue came to attention after the success of the Slovak transition in 1995, which initiated an institutional evolution that proved remarkably robust (Schonfelder, 2003, pp. 155-180). During that time, a few studies dealing with bankruptcy prediction were published (see: Chrastinova, 1998, pp. 34; Gurcik, 2002, pp. 373-378), but the main attention to this issue arose after the year 2008, when the global financial crisis appeared (Dixon, 2016, pp. 28-62). Because of deepening globalization and growing interdependency across economies, Slovak companies also had to cope with various types of financial difficulties. Adamko and Svabova (2016, pp. 15-20) studied the prediction ability of the global Altman's model on a data set of Slovak companies. Similarly, Delina and Packova (2013, pp. 101-112) validated three selected bankruptcy prediction models (the Altman model, the Beerman discriminatory function and the Index IN05) in the conditions of Slovakia, and according to the results gained they proposed a model for bankruptcy prediction using regression analysis.
On the other hand, Rybarova et al. (2016, pp. 298-306) applied the Altman Z-score bankruptcy model in their analysis only to a key sector of Slovakia, the construction industry. The selection of one sector, in this case the Slovak logistics sector, was also proposed by Brozyna et al. (2016, pp. 93-114). They proposed four bankruptcy prediction models based on discriminant analysis, logit, decision trees and the k-nearest neighbours method, and validated the prediction power of these models in comparison with the Polish logistics sector.
Bankruptcy prediction focuses not only on companies; the subject of interest can also be a city or another municipal entity. Alexy (2015, pp. 111-117) highlighted the importance of studying the financial health of cities. Furthermore, the default probability of cities in Slovakia was modelled through a logit model by Kacer and Alexy (2015, pp. 484-491).
Despite the fact that one can find studies focusing on bankruptcy prediction in Slovakia, there is still a lack of models developed on the basis of the Slovak environment. Similarly, Mihalovic (2016, pp. 101-118) emphasizes the reasons for the development of such models and proposes multiple discriminant analysis and logit models for bankruptcy prediction.
Despite the large number of models, it is still challenging to predict bankruptcy risk, as corporations have become more global and more complex. In view of the above, the primary focus of this study is on the creation of bankruptcy prediction models based on two different statistical methods applied to Slovak companies. These methods are logistic regression and probit regression, given the fact that mostly discriminant analysis and logistic regression are used for the construction of bankruptcy prediction models (Spuchlakova & Michalikova-Frajtova, 2016, pp. 2093-2099). In creating these models, the most significant financial ratios, which best distinguish between the groups of default and no default companies, may be detected. Furthermore, the main objective of this study is to compare the performance of the two proposed bankruptcy prediction models on a sample of selected companies operating in the Slovak economic environment. To this end, two research questions were formulated:
− Are the variables included in the created bankruptcy prediction models statistically significant?
− Are the created bankruptcy prediction models statistically significant?
Although some bankruptcy prediction models have been constructed in Slovakia, there is no generally accepted model that can be used not only by researchers, but also by practitioners and analysts, to predict the financial health of companies in the Slovak Republic. The aim of this study is therefore to propose bankruptcy prediction models that will provide a basis for different groups of users and will be generally accepted as delivering high prediction accuracy.
For the above-mentioned reasons, the article is structured as follows: the introduction stresses the significance of bankruptcy prediction on the basis of a literature review and is followed by the methodology part, which describes the data set and the research methodology used. The next part presents the results of the research, followed by the discussion and the conclusions of the study.

Research methodology
The methodology part of the study describes the theoretical basis of the models employed, the data used, the sample design and the variable selection procedure. To construct the bankruptcy prediction models in this study, two mathematical statistical methods were used, namely logistic regression and probit. Although both methods are symmetric binary choice models, they show some significant differences in the process of model formation, as well as in the achieved results.
The data for the study were obtained from the annual financial reports of Slovak companies (Register of financial statements, Ministry of Finance of the Slovak Republic) covering the year 2015. Firstly, there is a need to stress the terminological differences between bankruptcy and insolvency (Boratynska, 2016, pp. 107-129). Currently, the Slovak legal system considers a company to be in default according to three criteria:
− the total amount of payable and not payable liabilities is higher than the value of the company's assets,
− the company has at least two liabilities 30 days after the due date from different creditors,
− the value of the financial independence indicator is less than 0.04.
In addition to those criteria, we have detected other relevant characteristics that are considered significant for the Slovak environment (see Svabova & Kral, 2016, pp. 1759-1768; Svabova & Durica, 2016, pp. 2-11). Considering these specifications, we have specified three criteria for the subsequent classification of a company as default or no default. Thus, a company is included in the default group of the sample if it satisfies these conditions:
− a negative value of earnings after taxes,
− the value of the current ratio indicator is less than 1,
− the value of the financial independence indicator is less than 0.04.
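The three sample criteria listed above can be written down as a simple rule. The following is only a minimal sketch: it assumes that all three conditions must hold simultaneously, and the class and field names (`Company`, `earnings_after_taxes`, `current_ratio`, `financial_independence`) are hypothetical names introduced for illustration, not identifiers from the study.

```python
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical field names; the study does not prescribe a data layout.
    earnings_after_taxes: float
    current_ratio: float
    financial_independence: float


def is_default(company: Company) -> bool:
    """Return True if the company meets all three sample criteria.

    Assumption: the three conditions are combined with a logical AND.
    """
    return (
        company.earnings_after_taxes < 0
        and company.current_ratio < 1.0
        and company.financial_independence < 0.04
    )


# Illustrative (made-up) companies
distressed = Company(earnings_after_taxes=-12000.0, current_ratio=0.7,
                     financial_independence=0.01)
healthy = Company(earnings_after_taxes=35000.0, current_ratio=1.8,
                  financial_independence=0.45)
labels = [int(is_default(c)) for c in (distressed, healthy)]
```

In this sketch the label 1 stands for default and 0 for no default, which matches the coding of the dependent variable used in the models.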
The final sampling was therefore done by applying the above-mentioned criteria: three criteria given by the Slovak legal system and three criteria given by the specifics of the Slovak environment. Furthermore, applying those criteria to the results of the financial analysis of the set of companies and removing the detected outliers led to the basic data set, from which the data of the companies serving as inputs for model construction were chosen (Table 1).
The final sample consisted of 500 default and 500 no default companies, following the suggestion of Agrawal and Maheshwari (2016, pp. 268-284). The selection was done randomly from the basic data set, and no specifics, such as the industry in which the companies do business, their size or their legal form, were taken into consideration.
For the purpose of this study, the variable selection procedure includes variables that were significant in previous studies (Kliestik & Majerova, 2015, pp. 537-543; Zvarikova et al., 2017, pp. 145-157). According to this criterion, the initial set consists of 14 explanatory variables ($x_1, \dots, x_{14}$) in 4 categories (see Table 2), which served as a basis for the construction of the bankruptcy prediction models.
Based on the given specifications, logistic regression and probit were applied to classify each observation (company) into one of the predetermined groups. In this type of model, the dependent variable $y$ may take only two values. In this study, $y$ is a dummy variable representing the occurrence of an event (default of the company or not), expressed by the value 0 (no default) or 1 (default). The goal is to quantify the relationship between the individual characteristics (explanatory variables) and the probability of default.
The fundamentals of logistic regression were applied according to Meloun and Militky (2012). The procedure is given by the logit transformation of the dependent variable, which relates the probability of default of the company $P_1$ to the probability of no default $P_0 = 1 - P_1$ through the probability ratio $P_1/P_0$, where $P_1$ is computed by the cumulative logistic function:
$$P_1 = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}},$$
where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$. Following Hebak et al. (2015, pp. 877), the logit can be defined as:
$$\ln\frac{P_1}{P_0} = \ln\frac{P_1}{1 - P_1} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n,$$
where $\beta_0, \beta_1, \beta_2, \dots, \beta_n$ are the coefficients estimated from the data set of companies by maximizing the log-likelihood function. At the centre of logistic regression is the task of estimating the odds $P_1/P_0$, whose logarithm is given by the logit transformation. Additionally, based on the estimated probability, the company is classified as default or no default using a cut-off score (usually 0.5), attempting to minimize the type I and type II errors. The type I error arises when a default company is classified as no default, and the type II error arises when a no default company is classified as default.
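The logistic function and the cut-off rule described above can be illustrated with a short sketch. The coefficient values and the two explanatory variables used here are purely hypothetical and are not the estimates of this study; the function names are likewise chosen only for illustration.

```python
import math


def logistic_probability(x, beta0, betas):
    """Probability of default P1 for explanatory values x under a logit model."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    # Both branches are algebraically the same logistic function; the split
    # only avoids evaluating exp() on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(p1):
    """Logit transformation: natural logarithm of the ratio P1 / P0."""
    return math.log(p1 / (1.0 - p1))


def classify(p1, cutoff=0.5):
    """Return 1 (default) if P1 reaches the cut-off score, otherwise 0."""
    return 1 if p1 >= cutoff else 0


# Hypothetical coefficients and observations (not taken from the study)
beta0, betas = -1.0, [2.0, -0.5]
p_a = logistic_probability([1.0, 2.0], beta0, betas)  # z = -1 + 2 - 1 = 0
p_b = logistic_probability([0.0, 0.0], beta0, betas)  # z = -1
```

Here the first observation lies exactly on the cut-off (a probability of 0.5), so `classify(p_a)` returns 1 under the convention chosen in this sketch, while the second observation is classified as no default. Applying `log_odds` to either probability recovers the linear predictor $z$.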
After a logistic regression model has been fitted, a global test of the goodness of fit of the resulting model should be performed (Archer & Lemeshow, 2006, pp. 97-105). To answer the question "How well does my model fit the data?", the Hosmer-Lemeshow (HL) test for logistic regression is widely used (Hosmer et al., 1997, pp. 965-980). According to the p-value of this test (the higher, the better), we decide whether to reject or accept the model. According to Hu et al. (2006, pp. 1383-1395), various R square statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. Nagelkerke's R Square, Cox & Snell R Square and the -2 Log likelihood can be used to assess the goodness of fit of the logistic regression model. These statistics show the explanatory power of the model. Cox & Snell R Square is based on the ratio of the likelihoods of the intercept-only model and the full model, reflecting the improvement of the full model over the intercept model (the smaller the ratio, the greater the improvement). Furthermore, Nagelkerke's R Square adjusts Cox & Snell's so that the range of possible values covers the interval $[0, 1]$, with higher values indicating a better fit.
The probability of the observed results given the parameter estimates is known as the likelihood. Since the likelihood is a small number less than 1, it is customary to use -2 times the log likelihood (-2LL) as a measure of how well the model fits the data. A good model is one that results in a high likelihood of the observed results, which corresponds to a small value of -2LL.
The significance of the explanatory variables and their coefficients is assessed by the Wald test (see Bewick et al., 2005, pp. 112-118), which tests the null hypothesis that the coefficient equals 0. This hypothesis is rejected if the p-value is smaller than the significance level of .05, in which case we conclude that the coefficient is not 0.
Logit models are often compared to probit models. Probit regression is a specialized regression model for binomial response variables and is also used to analyse the relationship between dependent and explanatory variables. Although these methods are similar in their application, the process of model creation differs. Supposing that a binary dependent variable, $y$, takes only the values 0 and 1 (the same as in logit), the probit model is given by:
$$P(y = 1 \mid x) = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution:
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2} \, dt.$$
The Andrews and Hosmer-Lemeshow tests provide an evaluation of the goodness of fit of the proposed model. Additionally, there are several likelihood-based statistics. Along with the Log likelihood, Avg. log likelihood and Restr. log likelihood, it is recommended to assess the model according to the McFadden R-squared, which is the likelihood ratio index and is an analogy to the R-squared reported in linear regression models.
The discriminant ability of the logistic regression model, as well as of the probit model, can be depicted by the ROC curve (Receiver Operating Characteristic curve). The ROC curve is a graphical technique allowing for visual analysis of the trade-offs between the sensitivity and the specificity of a test with regard to the various cut-offs that may be used (see Fawcett, 2006, pp. 861-874). The curve is obtained by calculating the sensitivity and specificity of the test at every possible cut-off point, and plotting sensitivity (the proportion of true positive results) against 1-specificity (the proportion of false positive results). The curve may be used to select optimal cut-off values for a test result, to assess the diagnostic accuracy of a test, and to compare the usefulness of different tests.
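The construction of the ROC curve described above can be sketched in a few lines. This is an illustrative implementation under simple assumptions (every distinct predicted probability is used as a cut-off and the area under the curve is computed with the trapezoidal rule); the labels and scores are made up and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step; an observation is predicted as
    # default (1) when its score is at or above the cut-off.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: 1 = default, 0 = no default
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = roc_points(y_true, scores)
auc = area_under_curve(points)  # 8/9 for this toy data
```

In this toy example eight of the nine possible (default, no default) pairs are ranked correctly by the scores, which is why the area equals 8/9; an area of 0.5 would correspond to a model with no discriminant ability.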

Results
During the research process presented in this study, two models were constructed. One was developed using logistic regression and the other using probit regression. Firstly, we assessed our results separately for each model. Using the backward stepwise conditional method of logistic regression, the coefficients of the logit function were estimated (see Table 3). The significance of each explanatory variable for the dependent variable is assessed by Wald's test statistic; accordingly, the final logit function involves eight variables and a constant, all of which are statistically significant. The resulting logit function, built from these coefficients, provides the probability of default of the company. The Hosmer-Lemeshow test signals good conformity of the final model with the given data. According to Table 4, the p-value is 0.181, which is consistent with the findings of Karan et al. (2009, pp. 9-26).
Following the suggestions of Menard (2000, pp. 17-24), the overall explanatory power of the estimated model is provided in Table 5. Assessed through the Nagelkerke's R Square statistic, the model explains 93.8% of the variability of the binary dependent variable. This is also confirmed by the relatively low value of the -2 Log likelihood statistic, which gives the residual deviance of the model, 169.365.
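The relationship between the reported -2 Log likelihood and the R square statistics can be checked with a short calculation. The sketch below uses the standard Cox & Snell and Nagelkerke formulas and rests on two assumptions that are not stated explicitly in the text: that the statistics were computed on the 1000-company training sample, and that the intercept-only model is fitted to the balanced sample of 500 default and 500 no default companies, so that its log likelihood equals $1000 \ln 0.5$.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R Square from the log likelihoods of the two models."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R Square rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


n = 1000                     # assumed: 500 default + 500 no default companies
ll_null = n * math.log(0.5)  # intercept-only model on a balanced sample
ll_model = -169.365 / 2.0    # from the reported -2 Log likelihood

r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_n = nagelkerke_r2(ll_null, ll_model, n)
```

Under these assumptions the calculation gives a Cox & Snell value of roughly 0.704 and a Nagelkerke value of roughly 0.938, which agrees with the 93.8% stated above.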
In addition to logistic regression, the probit regression model was estimated in order to compare the results (see Table 6). In contrast with the logit model, the final probit model includes all 14 explanatory variables. Furthermore, the variables R1, R2, L1, L2, Z3, Z4 and A1 are not statistically significant according to the p-value of the z-statistics. However, the developed probit model is statistically significant according to the value of the McFadden R-squared statistic, 87.59%, indicating a good fit of the model (following Hwang et al., 2010, pp. 120-137). The resulting probit function is built from the coefficients in Table 6. The overall characteristics of the probit model are, similarly to logit, evaluated by the Hosmer-Lemeshow test, supplemented by the Andrews test, proving the overall significance of the estimated probit function (see Table 7).
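For illustration, the way a probit function turns explanatory variables into a probability of default can be sketched as follows. The coefficients and inputs are hypothetical and do not reproduce the estimates in Table 6; the standard normal distribution function is evaluated through the error function from the Python standard library.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(x, beta0, betas):
    """Probability of default for explanatory values x under a probit model."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return standard_normal_cdf(z)


# Hypothetical coefficients and observation (not taken from Table 6)
beta0, betas = 0.5, [1.0, -2.0]
p_default = probit_probability([1.5, 1.0], beta0, betas)  # z = 0.5 + 1.5 - 2.0 = 0
```

As in the logit case, the resulting probability would be compared with a cut-off score to assign the company to the default or no default group; the only difference in this sketch is the link function applied to the linear predictor.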

Discussion
In order to assess the overall performance of the constructed models (logit and probit), a classification accuracy matrix and ROC curves were provided. It should be highlighted that the overall classification accuracy of the proposed models is assessed on the sample of training data and verified on a sample of testing data. The testing data sample, equal in size to the training data sample, consists of 500 default and 500 no default companies. Table 8 summarizes the classification results of the two estimated models, supporting the results of Jones et al. (2015, pp. 72-85) that the classification accuracy of the logit and probit functions is quite similar. In the case of the data set consisting of Slovak companies, the overall prediction accuracy is high (logit 97% and probit 97.3%), which was confirmed by testing the prediction ability of these models, resulting in more than 86.5% accuracy for both constructed models.
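The quantities summarized in a classification accuracy matrix can be sketched as below. The observed and predicted labels are made up for illustration and do not reproduce Table 8; the type I and type II errors follow the definitions given in the methodology part (a default company classified as no default, and a no default company classified as default, respectively), and the function names are hypothetical.

```python
def classification_matrix(y_true, y_pred):
    """Count the four cells of the 2x2 classification matrix (1 = default)."""
    cells = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            cells["tp"] += 1   # default correctly classified
        elif actual == 1 and predicted == 0:
            cells["fn"] += 1   # type I error: default classified as no default
        elif actual == 0 and predicted == 1:
            cells["fp"] += 1   # type II error: no default classified as default
        else:
            cells["tn"] += 1   # no default correctly classified
    return cells


def summarize(cells):
    """Overall accuracy and the two error rates derived from the matrix."""
    total = sum(cells.values())
    return {
        "accuracy": (cells["tp"] + cells["tn"]) / total,
        "type_I_rate": cells["fn"] / (cells["tp"] + cells["fn"]),
        "type_II_rate": cells["fp"] / (cells["fp"] + cells["tn"]),
    }


# Made-up labels for eight companies
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
cells = classification_matrix(y_true, y_pred)
summary = summarize(cells)
```

For these made-up labels, six of the eight companies are classified correctly, and each group contains one misclassified company.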
Comparing the results with the prediction accuracy of other models constructed in the conditions of Slovakia, it can be summarized that the accuracy of the logit and probit models exceeds the prediction ability of the multiple discriminant analysis (approximately 62%) and logistic regression (approximately 73%) models provided by Mihalovic (2016, pp. 101-118). On the other hand, he suggested the use of other relevant mathematical statistical prediction techniques, including artificial intelligence expert systems. Furthermore, this is supported by Mendelova and Bielikova (2017, pp. 26-44), who applied DEA analysis to a set of Slovak companies. The prediction accuracy of their model was lower (78.5%) than the prediction accuracy of the models designed by us. The need for the development of relevant bankruptcy prediction models based on the environment of Slovakia is demonstrated by Delina and Packova (2013, pp. 101-112). The importance of considering the national environment and the specifics of the individual economy is also highlighted by Szetela et al. (2016, pp. 839-856) as well as Antonowicz (2014, pp. 35-45).
Additionally, ROC curves were constructed, providing a graphic illustration of the trade-offs between the sensitivity and the specificity underlying the classification tables. The four ROC curves constructed for each data set (training and test) of both models (logit and probit) are shown in Figure 1.
According to the graphic illustration, it is clear that the area under the ROC curve, which is a metric of classification accuracy across the various cut-off points, is higher for the training data than for the test data. Table 9 provides the evidence of these results. According to the obtained results, the logit model applied to the test data set has an 86.7% probability of correct classification and the probit model an 86.6% probability.
Although numerous bankruptcy prediction models have been created worldwide, the originality and novelty of the proposed models lie in the combination of popular statistical methods while taking into account the specific conditions of the Slovak environment. Given the high prediction accuracy of the proposed models, they have the potential to become generally accepted in the Slovak Republic.

Conclusions
Although the issue of bankruptcy prediction has been widely studied worldwide, up till now there has been no generally accepted bankruptcy prediction model considering the specifics of the Slovak national environment and economy. Therefore, the goal of the presented study was to construct models for bankruptcy prediction of Slovak companies. Thus, two prediction models based on logit regression and probit regression were proposed to fill this gap. The proposed bankruptcy prediction models were developed using a data set of Slovak companies covering the year 2015, and the models have been evaluated by their classification accuracy and Receiver Operating Characteristic curves. The selection of input variables resulted in a collection of the most relevant explanatory variables, followed by the detection of outliers before the model creation started.
According to the logistic regression, one profitability (rentability), two liquidity, four debt and capital structure and one activity variable are statistically significant, providing the best discrimination between the groups of default and no default companies. Additionally, the final model is also statistically significant, providing high classification accuracy: 97.0% for training data and 86.7% for test data. In the case of the probit model, we aimed to study whether there were any relevant differences in the obtained results between the models, since both methods are symmetric binary choice models. The results did not show any significant dissimilarities, as the probit model obtained 97.3% prediction accuracy for training data and 86.6% prediction accuracy for test data.
Although the probit model is statistically significant, not all of the variables it includes are significant; the insignificant ones are two profitability, two liquidity, two debt and capital structure, and one activity variable. In summary, given the fact that mostly discriminant analysis and logistic regression are used for the construction of bankruptcy prediction models, this study aims to go beyond these standards. The proposed models can serve as a basis for

Figure 1. ROC curves for estimated models

Table 5. Logistic regression model summary

Table 6. Estimated probit function coefficients

Table 7. Goodness-of-fit evaluation for binary specification: Andrews and Hosmer-Lemeshow tests (grouping based upon predicted risk, randomize ties)

Table 8. Classification results of logit and probit estimated models

Table 9. Classification results of logit and probit estimated models