Differences in income distributions for men and women in the European Union countries

Research background: Recently there has been an increase in interest in the studies of income inequalities. The findings of numerous empirical studies show that males earn higher wages than females. A variety of techniques of income inequalities decomposition are becoming popular. New procedures go far beyond the Oaxaca-Blinder decomposition. They make it possible to study differences of income distributions for various groups of people and to decompose them at various quantile points.
Purpose of the article: The aim of the paper is to compare personal income distributions in selected countries of the European Union, taking into account gender differences.
Methods: First, we examined the income inequalities between men and women in each country using the Oaxaca-Blinder decomposition procedure. The unexplained part of the gender pay gap gave us information about the wage discrimination. Second, we extended the decomposition procedure to different quantile points along the whole income distribution. To describe differences between the incomes of men and women, we constructed the so-called counterfactual distribution, which is a mixture of a conditional distribution of the dependent variable (income) and a distribution of the explanatory variables (individual people’s characteristics). Then, we utilized the residual imputation approach (JMP-approach).
Findings & Value added: In the article data from EU-SILC (Statistics on Income and Living Conditions) were used. We found that there exists an important diversity in the size of the gender pay gap across members of the European Union. The results obtained for these countries allowed us to group them into clusters. In general, there are two types of countries in Europe: the countries where the bulk of the observed income differences cannot be explained by observed characteristics, and the countries where the explained and the unexplained effects are both positive, with even a bigger explained effect for the lower income ranges.
Equilibrium. Quarterly Journal of Economics and Economic Policy, 14(1), 81–98


Introduction
Reducing the gender pay gap is one of the key priorities of gender policies. At EU level, the European Commission prioritised "reducing the gender pay, earnings and pension gaps and thus fighting poverty among women" as one of the key areas in the framework of the Strategic engagement for gender equality 2016-2019.
The gender pay gap has been a subject of interest for a long time. The findings of numerous empirical studies show that males earn higher wages than females. There are considerable differences in earnings between EU countries, with the gender pay gap ranging from just over 5% in Romania, Italy and Luxembourg, to more than 25% in Estonia, followed by the Czech Republic and Germany (both almost 22%) (see Figure 1). In 2016 women's gross hourly earnings were on average 16.2% below those of men in the European Union (EU-28) and 16.3% in the euro area (Eurostat, 2018).
The purpose of this study is to compare personal income distributions in selected countries of the European Union taking into account gender differences. We will examine the differences in the entire range of income values by the use of the residual imputation approach (JMP-approach) (see Juhn et al., 1993, pp. 410-442). In the article, data from EU-SILC will be used.
The paper is organized as follows. In the next section we provide a literature overview and summarize the results of prior studies. In the research methodology section, we outline the econometric methods used in the analysis. In the results section we describe the empirical data set and present the obtained results. In the subsequent section, we discuss the findings, and in the last section we conclude.

Literature review
The existence of a significant gap in earnings between males and females is a well-documented stylized fact of modern labour markets. There is a huge amount of literature on the topic of gender gap. Numerous studies concentrate mainly on the decomposition of the average values for incomes. For example, Jurajda (2003, pp. 199-222) analyses in this way the gender pay gap in the Czech Republic and Slovakia, Pena-Boquete et al. (2010, pp. 109-137) in Italy and Spain, Chatterji et al. (2011, pp. 3819-3833) examine the gap in Britain, Śliwicki and Ryczkowski (2014, pp. 159-173) in Poland.
A number of papers adopt a cross-country perspective. Using European Union Statistics on Income and Living Conditions (EU-SILC) data for 24 European Union members, Hedija (2017, pp. 1804-1819) showed that the gender pay gap varies among the countries. Oczki (2016, pp. 106-113) presented similar results for 28 EU countries. In the study conducted by Boll and Lagemann (2018) the gap is analyzed based on the Structure of Earnings Survey (EU-SES).
Sometimes, the analysis goes beyond the simple comparison of average values (e.g., for Poland, Landmesser et al., 2015, pp. 43-52; Landmesser, 2016, pp. 331-348). The results of numerous studies show evidence of the glass ceiling and the sticky floor effects. The glass ceiling effect means a greater wage gap at the top end of the income distribution range and the sticky floor effect means a wider wage gap at the bottom. The glass ceiling is a metaphor referring to an artificial barrier that prevents women from being promoted to managerial-level positions within an organization. The term sticky floor is used to point to a discriminatory employment pattern that keeps workers in the lower ranks of the job scale. It refers to low-paying, low-prestige, and low-mobility jobs typically held by women (e.g., clerical or service jobs). Arulampalam et al. (2006, pp. 163-186) examined the gender wage gap in 11 European countries using the European Community Household Panel Survey (ECHPS). The gap widened towards the top of the wage distribution in most countries and, in a few cases, it also widened at the bottom of the distribution. Nicodemo (2009) analyzed the gap in five Mediterranean EU countries (France, Greece, Italy, Portugal and Spain), using the EU-SILC and the ECHPS datasets. She found a positive wage gap in all countries, the greater part of which cannot be explained by observed characteristics. The gender gap was larger at the bottom and smaller at the top of the distribution in most countries. Christofides et al. (2013, pp. 86-102) also used EU-SILC data and estimated the unexplained part of the gender pay gap for 26 European countries. Despite many differences among the individual studies, they all conclude that the gender pay gap exhibits a remarkable heterogeneity across European countries. The gender pay gap is the result of many factors, including occupational segregation, bias against working mothers, and direct pay discrimination.
A certain part of the wage differences between men and women remains unexplained. The results of these studies depend on the used data set, the number of explanatory variables and the applied method of decomposition. An important diversity exists both in the size of the gender pay gap and in its underlying causes across transitional new members of the European Union.
In this paper, we do not compare the average income values, but examine the differences at various quantile points along the distribution using data for 19 EU countries from the EU-SILC project in 2014.

Research methodology
Our hypothesis is that there exists an important diversity in the size and shape of the gender pay gap across members of the European Union. The observed differences should be analyzed in an adequate manner. In recent times, a variety of techniques of income inequalities decomposition have become more popular. The standard method for pay gap decomposition is the procedure proposed by Oaxaca (1973, pp. 693-709) and Blinder (1973, pp. 436-455), which is widely used to study mean outcome differences between groups. This statistical method makes it possible to take into account the individual characteristics of male and female employees. It decomposes the gap in mean outcomes across two groups into a part that is due to group differences in the levels of explanatory variables and a part that is due to differential magnitudes of regression coefficients.
Let $Y_g$ denote the outcome variable in group $g$ (e.g. the personal income in the men's group, $g = M$, or in the women's group, $g = W$) and $X_g$ the vector of individual characteristics of a person in group $g$ (such as education, age, work experience). The expected value of $Y_g$ conditional on $X_g$ is a linear function, $E(Y_g \mid X_g) = X_g \beta_g$, where $\beta_g$ are the returns to the characteristics. The Oaxaca-Blinder decomposition for the average income inequality between two groups at the aggregate level can be expressed as
$$\bar{Y}_M - \bar{Y}_W = \bar{X}_M (\hat{\beta}_M - \hat{\beta}_W) + (\bar{X}_M - \bar{X}_W) \hat{\beta}_W. \qquad (1)$$
The first component on the right side of the equation, called the unexplained effect (or the wage structure effect), is the result of differences in the returns to observables (the estimated coefficients), and so in the "prices" of individual characteristics of group representatives. The unexplained portion of the gap is usually attributed to discrimination, but may also result from the influence of unobserved variables. The second term, called the explained effect (or the effect of characteristics), results from group differences in the predictors. A drawback of the approach is that it focuses only on average effects, which may lead to a misleading assessment if the effects of covariates vary across the income distribution.
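The mean decomposition described above can be illustrated with a minimal sketch. It assumes a twofold decomposition with women's coefficients as the reference and uses synthetic data with one hypothetical regressor (years of schooling). The function names and the simulated numbers are illustrative assumptions, not the estimation code or the EU-SILC data behind the reported results.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ beta + error (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_blinder(X_m, y_m, X_w, y_w):
    """Twofold decomposition of mean(y_m) - mean(y_w), women's coefficients as reference."""
    beta_m, beta_w = ols(X_m, y_m), ols(X_w, y_w)
    xbar_m, xbar_w = X_m.mean(axis=0), X_w.mean(axis=0)
    return {
        "total": y_m.mean() - y_w.mean(),
        # Same characteristics (men's means), different returns.
        "unexplained": xbar_m @ (beta_m - beta_w),
        # Same returns (women's), different characteristics.
        "explained": (xbar_m - xbar_w) @ beta_w,
    }

# Synthetic log incomes: women have more schooling but lower returns (assumed values).
rng = np.random.default_rng(0)

def simulate(n, mean_school, beta):
    school = rng.normal(mean_school, 2.0, n)
    X = np.column_stack([np.ones(n), school])
    y = X @ beta + rng.normal(0.0, 0.3, n)
    return X, y

X_m, y_m = simulate(2000, 12.0, np.array([1.0, 0.10]))
X_w, y_w = simulate(2000, 13.0, np.array([0.9, 0.09]))
result = oaxaca_blinder(X_m, y_m, X_w, y_w)
print(result)
```

Because each regression includes a constant, the two components add up to the raw gap exactly. In this synthetic setup the explained part is negative, which mirrors the case where women's characteristics reduce the average gap.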
Let $F_{Y_g}(y)$ be the distribution function of the dependent variable $Y_g$, which can be expressed using the conditional distribution $F_{Y_g|X_g}(y|X)$ of $Y_g$ and the joint distribution $F_{X_g}(X)$ of all elements of $X_g$ (the explanatory variables):
$$F_{Y_g}(y) = \int F_{Y_g|X_g}(y|X) \, dF_{X_g}(X), \quad g = M, W.$$
Now, the mean decomposition analysis can be extended to the case of differences between the two distributions using the so-called counterfactual distribution:
$$F_{Y_W^C}(y) = \int F_{Y_W|X_W}(y|X) \, dF_{X_M}(X).$$
Such a mixture distribution is constructed by integrating the conditional income distribution for women with respect to the distribution of characteristics for men (the distribution of incomes that would prevail for people in group W if they had the distribution of characteristics of group M). Then the difference in the observed income distributions between men and women can be decomposed in the spirit of Oaxaca and Blinder as follows:
$$F_{Y_M}(y) - F_{Y_W}(y) = [F_{Y_M}(y) - F_{Y_W^C}(y)] + [F_{Y_W^C}(y) - F_{Y_W}(y)],$$
where the first bracket is the unexplained effect and the second bracket is the explained effect. The counterfactual distribution can be constructed in various ways, e.g. using the residual imputation approach (see Juhn et al., 1993, pp. 410-442). In this method, one has to estimate the two equations $y_{gi} = X_{gi} \beta_g + v_{gi}$, $g = M, W$. Then, the income $y_{Mi}$ from the group M is replaced by a counterfactual income $y^C_{Wi}$, where both the returns to observables and residuals are set to be as in group W.
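A minimal sketch of the residual imputation idea follows. It assumes that residual ranks are preserved unconditionally, i.e. that the residual distribution does not depend on the characteristics, which is a simplification of conditional rank preservation. The function names and the synthetic data are illustrative assumptions and do not reproduce the Stata routines used for the reported results.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients and residuals for y = X @ beta + v."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def jmp_counterfactual(X_m, y_m, X_w, y_w):
    """Counterfactual incomes for men with women's residuals and returns."""
    beta_m, v_m = fit_ols(X_m, y_m)
    beta_w, v_w = fit_ols(X_w, y_w)
    # Rank of each man's residual among men's residuals, mapped into (0, 1).
    ranks = np.argsort(np.argsort(v_m))
    tau = (ranks + 0.5) / len(v_m)
    # Step 1: keep men's returns, swap in the women's residual at the same rank.
    v_c1 = np.quantile(v_w, tau)
    y_c1 = X_m @ beta_m + v_c1
    # Step 2: additionally swap in women's returns to observables.
    y_c2 = X_m @ beta_w + v_c1
    return {"beta_w": beta_w, "v_c1": v_c1, "y_c1": y_c1, "y_c2": y_c2}

# Synthetic log incomes (assumed values, for illustration only).
rng = np.random.default_rng(1)

def simulate(n, mean_school, beta, sigma):
    school = rng.normal(mean_school, 2.0, n)
    X = np.column_stack([np.ones(n), school])
    return X, X @ beta + rng.normal(0.0, sigma, n)

X_m, y_m = simulate(1500, 12.0, np.array([1.0, 0.10]), 0.30)
X_w, y_w = simulate(1500, 13.0, np.array([0.9, 0.09]), 0.40)
cf = jmp_counterfactual(X_m, y_m, X_w, y_w)
print(cf["y_c2"][:5])
```

The array `cf["y_c2"]` is a sample from the counterfactual distribution: men's characteristics combined with women's returns and women's residual distribution.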
The implementation of the procedure is two-step. In the first step, the residuals are replaced by counterfactual residuals under the assumption of rank preservation:
$$y^{C,1}_{Wi} = X_{Mi} \beta_M + v^{C,1}_{Wi}, \quad v^{C,1}_{Wi} = F^{-1}_{v_W|X}(\tau_{Mi}(X_{Mi}), X_{Mi}),$$
where $\tau_{Mi}(X_{Mi})$ is the conditional rank of $v_{Mi}$ in the distribution of residuals for M. In the second step, the counterfactual returns to observables are imputed as
$$y^{C,2}_{Wi} = X_{Mi} \beta_W + v^{C,1}_{Wi}.$$
In this study, after an assessment of the gender pay gap for selected EU countries, an attempt was made to group them using a hierarchical clustering method. Hierarchical clustering is a widely used tool in data mining for grouping data into clusters, which exposes similarities or dissimilarities in the data. One set of approaches to hierarchical clustering is known as agglomerative, whereby in each step of the clustering process an observation or cluster is merged into another cluster. In this work, the 1-nearest neighbors algorithm for agglomerative clustering with Euclidean distance was applied.
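The agglomerative procedure can be sketched as follows, assuming that the nearest-neighbour rule corresponds to what is usually called single linkage: the distance between two clusters is the smallest Euclidean distance between any pair of their members. The six feature vectors (raw, unexplained and explained gap) are invented for illustration and are not the estimates for any country.

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with nearest-neighbour (single) linkage."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all observations.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = distance between the two closest members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical (raw gap, unexplained, explained) triples for six units.
features = [
    [0.15, 0.18, -0.03],
    [0.17, 0.19, -0.02],
    [0.40, 0.15, 0.25],
    [0.42, 0.17, 0.25],
    [0.62, 0.20, 0.42],
    [0.30, 0.35, -0.05],
]
labels = single_linkage(features, n_clusters=4)
print(labels.tolist())
```

For larger data sets a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` would normally be preferred. The explicit loop is used here only to make the merging rule visible.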

Results
The empirical data used have been collected within the European Union Statistics on Income and Living Conditions project in 2014 (research proposal 234/2016-EU-SILC). For the analysis, we chose 19 EU countries. The main criterion for their selection was the desire to obtain a set of countries as diversified as possible in terms of their size, level of economic development and the degree of discrimination on the labor market. The selected countries can be characterized with the following fundamental statistics: the highest number of inhabitants (Germany, France, United Kingdom, Italy, Spain, Poland), the highest GDP per capita in 2015 (Luxembourg, Ireland, the Netherlands, Austria, Germany), the lowest GDP per capita in 2015 (Bulgaria, Romania, Croatia, Hungary, Poland, Greece), the highest gender pay gap according to Eurostat in 2016 (Estonia, the Czech Republic, Germany, United Kingdom, Austria, Slovakia), and the lowest gender pay gap according to Eurostat in 2016 (Romania, Italy, Luxembourg, Belgium, Poland). The selected sample consisted of 122,756 observations: 62,856 men and 59,900 women.
The annual gross employee (cash or near cash) incomes (in thousands of Euro) of men were compared with those obtained by women. The gross employee income corresponds mainly to wages and salaries paid for the time worked, remuneration for the time not worked, enhanced rates of pay for overtime, payments for fostering children, supplementary payments (e.g. thirteenth month payment). It includes any social contributions and income taxes payable to social insurance schemes or tax authorities. In our empirical decomposition analysis, a logarithm of the annual income (log_income) constitutes the outcome variable. The sample size and average income in selected countries are presented in Table 1.
We have found that there is a positive difference between the mean values of log incomes for men and women for all 19 countries. The mean log income differential is the largest in Germany (0.625) and the smallest in Romania (0.141). The country heterogeneity is not limited to the size of the gap, but also concerns its composition. The difference between the mean log income values was decomposed into two components: the first one capturing the contribution of the different values of the model coefficients (the unexplained part), and the second one capturing the contribution of the differences in attributes (the explained part). The unexplained effect is huge (and positive) for the states with a low raw differential and is small for the states with a high raw differential. As a share of the raw differential, it ranges from 21% in Luxembourg to 125% in Bulgaria. This part of the gender pay gap gives us information about the discrimination. The explained gap is negative in six countries (among others in Bulgaria, Poland, Hungary), which have the lowest income discrepancies. The negative value of this component means that the difference of the average log incomes between men and women is reduced by the women's characteristics. In 13 countries, the explained part is positive, that is, it increases the overall gap, with a maximum explained gap in Luxembourg (79%). In only 8 countries does the explained part exceed the unexplained part of the overall gap. However, the unexplained part is nowhere identified to be negative.
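As a reading aid for the percentages above, the two components are shares of the raw differential and therefore sum to 100%. The symbols $\hat{\Delta}$, $\hat{\Delta}_{\text{unexplained}}$ and $\hat{\Delta}_{\text{explained}}$ are introduced here only for this illustration. The Bulgarian explained share is not quoted in the text and is inferred from the additivity of the decomposition, so it should be read as approximate. The last line converts the two extreme log differentials into ratios of geometric mean incomes, under the assumption that the differences are measured in natural logarithms.

```latex
\begin{align*}
\frac{\hat{\Delta}_{\text{unexplained}}}{\hat{\Delta}}
  + \frac{\hat{\Delta}_{\text{explained}}}{\hat{\Delta}} &= 1 \\
\text{Luxembourg:}\quad 21\% + 79\% &= 100\% \\
\text{Bulgaria:}\quad 125\% + (\approx -25\%) &= 100\% \\
e^{0.625} \approx 1.87 \ \text{(Germany)}, \qquad
e^{0.141} &\approx 1.15 \ \text{(Romania)}
\end{align*}
```

An unexplained share above 100% is therefore possible only when the explained share is negative, which is the pattern described for the countries with the lowest raw differentials.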
The detailed decomposition, which was also carried out, made it possible to isolate the factors that explain the observed inequality to different extents. Because of lack of space in this paper, we present the results of the detailed decomposition only for 4 countries: Germany, the Czech Republic, Poland and Romania (see Table 3).
A strong effect of the different education levels of men and women can be seen, especially for Poland. The negative values of explained components mean that the differences of the average log incomes between men and women are reduced by the women's higher education levels. On the other hand, the values of the yearswork, parttime, manager and big attributes possessed by men and women increase the income inequality in all countries (see the positive explained component values). In all states women are discriminated against relative to men because of their marital status (the positive unexplained component values for the variable married) and managerial position (if possessed), but not because of their education levels or years of work.
Since the Oaxaca-Blinder technique focuses only on average effects, we carried out the decomposition of inequalities along the distribution of log incomes for men and women using the residual imputation approach (JMP-approach). The total differences between the values of log incomes are computed and the results are shown in Table 4. They are expressed in terms of percentiles (the symbols p5, …, p95 stand for 5th, …, 95th percentile; e.g. the 25th percentile is the log income value below which 25% of the observations may be found).
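Given a counterfactual sample, however it was obtained, the percentile-wise decomposition reduces to differences of empirical quantiles. The sketch below assumes a particular percentile grid (5, 25, 50, 75, 95) and uses simulated log incomes. Both the grid and the numbers are illustrative assumptions rather than the values behind Table 4.

```python
import numpy as np

PERCENTILES = (5, 25, 50, 75, 95)  # assumed grid; the text only names p5 and p95

def quantile_decomposition(y_m, y_w, y_c, percentiles=PERCENTILES):
    """Split the gap at each percentile into unexplained and explained parts.

    y_c is a counterfactual sample: men's characteristics, women's pay structure.
    """
    q_m = np.percentile(y_m, percentiles)
    q_w = np.percentile(y_w, percentiles)
    q_c = np.percentile(y_c, percentiles)
    return {
        f"p{p}": {
            "total": m - w,
            "unexplained": m - c,  # same characteristics, different pay structure
            "explained": c - w,    # same pay structure, different characteristics
        }
        for p, m, w, c in zip(percentiles, q_m, q_w, q_c)
    }

# Simulated log incomes (assumed values, for illustration only).
rng = np.random.default_rng(2)
y_m = rng.normal(3.0, 0.8, 1000)
y_w = rng.normal(2.7, 0.9, 1000)
y_c = rng.normal(2.8, 0.9, 1000)

table = quantile_decomposition(y_m, y_w, y_c)
for name, row in table.items():
    print(name, {k: round(float(v), 3) for k, v in row.items()})
```

At every percentile the unexplained and explained parts add up to the total gap by construction, which is the quantile analogue of the mean decomposition.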
For each country, there are positive differences between the values of log incomes for men and women along the whole log income distribution. Then the calculated differences were decomposed into the sum of the unexplained and explained components (the results are presented in Figures 3, 4, 5). After assessing the gender pay gap (the raw, the explained and the unexplained gap) for all 19 countries, an attempt was made to group them using a hierarchical clustering method. The use of the 1-nearest neighbors algorithm for agglomerative clustering with Euclidean distance allowed the grouping of countries into four clusters, as shown in Figure 2.
The shapes of pay gap are examined in Figures 3-5, where solid lines represent the total income gap, the dashed lines denote the unexplained component and the dotted lines indicate the explained effect.
Group 1 consists mainly of the former socialist states of eastern Europe. It is characterized by the U-shaped total gender pay gap (except Romania and Bulgaria) and bigger unexplained effect than the explained one (see Figure 3). For most countries in this group, the total effect is low, but it widens at the bottom and at the top of the income distribution, suggesting sticky floor and glass ceiling effects. The share of the unexplained part is very high. The effect of coefficients is positive and constant in the whole range of the income distribution. This is the result of differences in the "market prices" of individual characteristics of men and women, interpreted as the labor market discrimination.
For 5 countries (Romania, Bulgaria, Hungary, Poland and Croatia), the explained differential (the effect of characteristics) is negative in the middle of the distribution, which means that the differences in characteristics between the two groups decrease the inequalities. Except for Romania, the effect of characteristics is often positive at the bottom and at the top of the income distribution.
Group 2, the largest group, consists mainly of the large, highly developed countries of western Europe. In most of these countries, the total gender gap is larger at the bottom of the distribution and smaller at the top of the distribution (see Figure 4). The gender differences in characteristics are positive, which means that the different values of characteristics of men and women increase the income inequalities. The explained effect is bigger than the unexplained effect at the bottom of the log income distribution. For the higher income ranges, the unexplained effect often prevails. Both effects, the explained and the unexplained, are always positive, increasing the income discrepancies.
Group 3 consists of only one country, Estonia, which is characterized by an increase of the income inequalities as we move toward the top of the income distribution (Figure 5(a)). We notice that the glass ceiling effect is present. Moreover, the share of the unexplained effect is very high. We observe a negative explained effect at the bottom of the distribution. It means that the income inequalities decrease for the poorer.
The last group, group 4, consists of Germany alone. In this case, the large total gap and the large explained effect have a decreasing shape and fall rapidly as we move toward the top of the income distribution (Figure 5(b)). The unexplained part is positive and at a moderate level, reflecting the existing effect of discrimination on the labor market.

Discussion
We started our analysis using the Oaxaca-Blinder method for the decomposition of the average values for log incomes. We found that there is a positive difference between the mean income values for men and women. A similar result has been documented in previous research (e.g., compare Boll & Lagemann, 2018). The mean log income differential was the largest in Germany and the smallest in Romania. The unexplained effect, which gives us information about the discrimination, was huge for the states with a low raw differential and was small for the states with a high raw differential. The explained gap was negative in the countries with the lowest income discrepancies (e.g., in Bulgaria, Poland, Hungary). The negative value of this component means that the difference of the average log incomes between men and women is reduced by the women's characteristics.
Then, we examined the differences in the entire range of income values by the use of the JMP-residual imputation approach. We extended the decomposition procedure to different quantile points along the whole income distribution. After assessing the raw, the explained and the unexplained gaps for all 19 countries, we grouped them into four clusters using the 1-nearest neighbors algorithm for agglomerative clustering.
Group 1 consisted mainly of the former socialist states of eastern Europe (the Czech Republic, Romania, Slovakia, Bulgaria, Hungary, Poland and Croatia). The results obtained for that group indicated a low and U-shaped total effect (compare Arulampalam et al., 2006, pp. 163-186). The conducted decomposition showed that the unexplained component quantitatively dominated in the whole range of the income distribution. Our analysis confirmed that the gender pay gap was poorly explained by gender differences in observable characteristics of people. But for several countries (Romania, Bulgaria, Hungary, Poland and Croatia), the explained differential was negative in the middle of the distribution, which means that female characteristics are superior to the male ones. Similar results in this field have been obtained in Christofides et al. (2013, pp. 86-102).
In group 2, consisting of highly developed countries of western Europe (France, the United Kingdom, the Netherlands, Austria, Luxembourg, Spain, Italy, Ireland, Belgium, Greece), the total gender gap was larger at the bottom of the log income distribution and smaller at the top. Those findings are similar to the results obtained by Nicodemo (2009). For the lower income ranges, the explained effect was bigger than the unexplained. Both effects, the explained and the unexplained, are always positive, increasing the income discrepancies.
Group 3, made up of Estonia, was characterized by an increase of the income inequalities at the top of the income distribution (as in Christofides et al., 2013, pp. 86-102). There was a very high share of the unexplained effect and a negative explained effect for the poorer.
For group 4 (Germany), we noted a large total gap and a large explained effect, both falling rapidly as we moved toward the top of the income distribution (this result, however, differs from the findings in Christofides et al., 2013, pp. 86-102). The unexplained part is still positive but at a moderate, stable level.

Conclusions
The goal of this paper was to compare personal income distributions in selected countries of the European Union taking into account gender differences. Using data from the EU-SILC project in 2014, the gender pay gap was examined for the set of 19 European countries.
Summarizing, there exists an important diversity in the size of the gender pay gap across members of the European Union. Excluding the extreme cases of Estonia and Germany, there are two types of countries in Europe:
− the countries, where the bulk of the observed income differences cannot be explained by observed characteristics (the states with higher level of gender discrimination on the labor market),
− the countries, where the explained and the unexplained effects are both positive, with even a bigger explained effect for the lower income ranges (the states with lower level of gender discrimination on the labor market).
One should be aware that the results obtained depend on the selection of explanatory variables for the estimated models. Future work should cover all EU countries and might consider how gender differences have changed over time. Gender discrimination may lead to a loss in productivity and wealth. Inequalities induced in this way pose a serious challenge for society. Therefore, it would be very interesting to analyze whether changes in policies can affect gender differences.
Source: own elaboration using the Stata command 'decompose'.