Google and Apple mobility data as predictors for European tourism during the COVID-19 pandemic: A neural network approach

Research background: The COVID-19 pandemic has caused unprecedented disruptions to the global tourism industry, resulting in significant impacts on both human and economic activities. Travel restrictions, border closures, and quarantine measures have led to a sharp decline in tourism demand, causing businesses to shut down, jobs to be lost, and economies to suffer. Purpose of the article: This study aims to examine the correlation and causal relationship between real-time mobility data and statistical data on tourism, specifically tourism overnights, across eleven European countries during the first 14 months of the pandemic. We analyzed the short-term longitudinal connections between two dimensions of tourism and related activities. Methods: We link Google's and Apple's observational data with tourism statistical data, enabling the development of early predictive models and econometric models for tourism overnights (or other tourism indices). This approach leverages the more timely and more reliable mobility data from Google and Apple, which are published with less delay than tourism statistical data. Findings & value added: Our findings indicate statistically significant correlations between specific mobility dimensions, such as retail and recreation and parks, and tourism statistical data, but poor or insignificant relations with the workplace and transit dimensions. We have identified that leisure and recreation have a much stronger influence on tourism than the dimensions we label domestic and routine. Additionally, our neural network analysis revealed that Google Mobility Parks and Google Mobility Retail & Recreation are the best predictors for tourism, while Apple Driving and Apple Walking also show significant correlations with tourism data.
The main added value of our research is that it combines observational data with statistical data, demonstrates that Google and Apple location data can be used to model tourism phenomena, and identifies specific methods to determine the extent, direction, and intensity of the relationship between mobility and tourism flows.


Introduction
The COVID-19 pandemic has had devastating effects on tourism. In April and May 2020, in most European countries, tourism arrivals fell to values close to zero (Mousazadeh et al., 2023). Globally, in 2020, tourism arrivals decreased by 73% compared to 2019 (UNWTO Dashboard). In Romania, the decrease in tourist arrivals was even more severe: in 2020, arrivals fell by 83%, to 17% of the level of the previous year (2019), according to UNWTO data (World Tourism Organization, 2022). During the COVID-19 pandemic and ever since, a large-scale open database has been available on the actual movement of people in different types of urban and nonurban areas around the world. Since February 2020, Google has continuously published COVID-19 Community Mobility Reports, which contain data about the presence and mobility of persons in six different space categories (GM6): parks, transit stations, workplaces, retail & recreation, grocery & pharmacy and residential. At the same time, Apple launched a similar database, referring to the movement of people measured in only three categories (A3): transit, driving and walking. Both big data providers, Google and Apple, gather data at the individual level from smartphones that have the location determination function set to "ON". The data are aggregated and published at the level of cities and states by both providers and at the county/regional level by Google.
Our aim is to analyze the relation between the GM6 & A3 dimensions of movement, measured by Google and Apple, and tourist overnight stays within a period of 12-14 months (depending on the mobility dimensions compared) by using both statistical and econometric methods. Our intuitive research hypothesis is that there is a strong correlation or causal relationship between mobility indices and overnights; particularities in some dimensions of people's mobility and differences in correlations among European countries are useful starting points for future research applying other quantitative and/or qualitative methods. For the present research, we compared the state-level indicators in European countries with tourism arrivals, available on the Eurostat website (Eurostat, 2021). For one single case, we compared the daily data of the Apple and Google dimensions, and we found statistically significant correlations in all analyzed countries.
It is important to mention that the present results are complementary to more complex research by the authors on one European country (Romania), which used only Google Mobility dimensions and big data: daily data for all 41 Romanian counties and Bucharest. The statistical method used to find the best predictor of overnight stays for Romanian tourism was the structural equation model. With the present research, the authors examine whether the relations between mobility data and tourism indicators found for Romania during the COVID-19 pandemic reflect a situation particular to an East European country or not. Therefore, we include in the study all European countries with complete data for the abovementioned indicators.
The present paper is structured as follows: first, we present the specific data and the statistical methods used in the research and our own methodological approach, followed by the research results and discussion, conclusions, and limitations of the research. From a methodological standpoint, we make a theoretical and practical contribution to the international scientific literature by demonstrating that some specific mobility data are a good, statistically significant predictor of tourism overnight stays for the European countries studied. The main limitations of the research are linked to incomplete data for all European countries and the lack of similar research on which to base the research hypothesis. The novelty of the results is that they can be used as core hypotheses for future research on other continents and/or at a specific country level.

Literature review
The new availability of these data about the movement and fluctuation of people in different types of locations is very interesting, and several articles using these data sets have been published in the last two years. Most of these articles focus on COVID-19 infections and effects, the spread of new cases and the relationship between the mobility of people (modeled with Google mobility data) and the diffusion of the pandemic in different countries and other areas. First, Tamagusko and Ferreira (2020) sought a connection between the Rt value of the pandemic in Portugal (the scale of contagiousness) and the mobility of people, but they did not show a clear and direct relationship between the two data series. On the other hand, Irini et al. (2021) pointed out that there is a close connection between new COVID cases and some Google mobility dimensions: daily infections can be linked with lower rates of the residential index (staying at home). There are also other articles dealing with the link between the spread of the virus or other effects of policies and the movement of people, measured with data from Google Community Reports: Saha et al. (2021) about the importance of policies for flattening the curve of the pandemic; Cot et al. (2021) about the imprint of social distancing in Europe and the US; another study about the changes in socioeconomic activity since 2020; Sulyok and Walker (2021) about the mobility and mortality links across Scandinavia; Hakim et al. (2021) about the number of cases, the effects of restriction policies and the mobility of people in some Asian countries; Jacobsen and Jacobsen (2020) about the effects of stay-at-home orders and voluntary behavior changes; and many others. In addition to the effects of mobility on the pandemic, other subjects were explored in the last 2-3 years with the help of Google and, in some cases, Apple data. Munawar et al. (2021) analyze how the transport system was affected by the policies adopted by the Australian government for the containment of COVID-19. Yang et al. (2021a) compare three of the Google mobility dimensions (Retail and Recreation, Parks and Transit Stations) for 9 world cities with tourism and travel data gathered by Mastercard. They conclude that cities administered different policies during the severe periods of the pandemic, resulting in different change points of mobility, and point out that tourism could predict changes in mobility (which is the reverse of the logic used in the present study).
Another approach relates to tourism during and after the COVID period (post-COVID), which is summarized effectively by Yang et al. (2021b). They have gathered 249 articles in five key subject areas on tourism, such as psychological effects and behavior, risk perceptions, well-being and mental health, motivation and behavioral intention and responses, strategies, and resilience: organization and government, some of which also address economic effects. Geng et al. (2021) measure park attendance with the help of Google Community data and its impact on the social and mental wellbeing of visitors; from what we have found thus far, this is the only direct use of Google data for tourism-related research purposes. An interesting new study that integrates the statistical data from Eurostat on retail with the retail data from Google Community Reports (Szász et al., 2022) observes the short-term drivers and long-term implications of the pandemic in the online retail sector. They construct a good model that could explain the short-term evolution of the online retail sector during the pandemic. In the most recent studies, Google mobility dimensions are directly linked to the COVID-19 pandemic (dealing with the spread of the virus) and the social effects of this situation (Murray, 2021; Atalay & Solmazer, 2021; Ibarra-Espinosa et al., 2021). The relation of the economy and, within it, tourism to the movement of people within a region, country or continent in various types of locations has not yet been thoroughly researched, with only a few papers published.
One of these states that urban park visitations have increased since the beginning of the pandemic (Geng et al., 2020), and the other one searches for the possibility of forecasting tourist arrivals with SARIMA models, but not Google Mobility, which we plan to use; instead, they apply Google Trends data (which reflect the number of searches through Google search engines worldwide) (Bangwayo-Skeete & Skeete, 2015).
Previous research on the topic of enhancing the accuracy of COVID-19 prediction by integrating epidemiological and mobility data (García-Cremades et al., 2021) aimed to assess various models that can be used to make early predictions about the progression of the COVID-19 outbreak, with the ultimate goal of developing a decision support system to aid policy-makers. For example, spatiotemporal disease models and graph neural networks were integrated to enhance the forecasting accuracy of weekly COVID-19 cases in Germany (Fritz et al., 2022). This research demonstrated the essentiality of incorporating mobility data while also highlighting the adaptability and comprehensibility of the proposed methodology. Similarly, mobile phone data have been used to guide public health interventions throughout the various stages of the COVID-19 pandemic (Oliver et al., 2020).
Foundational research on human mobility has revealed that combined and identified mobile phone data can aid in modeling the geographic diffusion of epidemics (Finger et al., 2016;Tizzoni et al., 2014;Wesolowski et al., 2012;Bengtsson et al., 2015;Wesolowski et al., 2015).
Consequently, partnerships have emerged between researchers, governments, and private enterprises, particularly mobile network operators and location intelligence firms, to gauge the efficacy of mitigation measures in several nations, such as Austria, Belgium, Chile, China, Germany, France, Italy, Spain, the United Kingdom, and the United States (Oliver et al., 2020;Shortall et al., 2020;Kraemer et al., 2020;Lai et al., 2020;Lyons, 2020;Brodeur et al., 2021).

Research methods
Given the lack of comparable research on which to base a hypothesis, our main research hypothesis is that tourism statistical data can be linked, through different statistical methods, with the observational data that Google and Apple have published since early 2020 on the mobility of communities at different levels, in six dimensions by Google and three by Apple. If this is verified, then, because tourism statistical data are published with a 2-4-month delay, early predictive models, other econometric models (based on causal relationships between variables) and machine learning methods can be constructed from Google and Apple mobility data for tourism arrivals (or other tourism indices).
The variables used as inputs in the present research were collected as follows:
− Daily mobility data were collected from the Google Community Mobility Reports (6 mobility indices) and the Apple Mobility reports (3 mobility indices) from March 2020 to April 2021 (Table 1) for 11 European countries with complete data for all inputs in the study: Belgium, Finland, Norway, the Czech Republic, Germany, Estonia, Spain, Italy, Luxembourg, the Netherlands, and Slovakia. The authors transformed the daily data into monthly data by calculating the monthly mean for each European country and every month, both for Google Mobility and for Apple Mobility. We chose these countries because their overnight data were available in the Eurostat database for the months we analysed.
− The number of overnight stays: monthly data on the number of nights spent by tourists in tourist accommodations in every chosen European country were collected from the Eurostat database. For better and more comparable data, the authors also transformed the tourism indicators into relative indicators with February 2020 as a fixed base, because the mobility indicators are presented in the same relative way in the Google and Apple reports.
For a better understanding of the measurements made by the Google and Apple mobility reports (Apple mobility reports, 2021), we present a short description of these indicators according to the initial reports. The authors ensured the representativeness of the data by including in the study only the 11 European countries with complete data, and it can be observed that these countries are well distributed geographically. The research sample therefore has relatively good spatial representativeness, and the present results can be generalized to the European level.
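The two transformations described above (daily values averaged per month, and overnight stays expressed relative to February 2020) can be sketched as follows. This is a minimal illustration, not the authors' actual processing: the function names, the `(date, value)` input layout, the scaling of the index to 100 and all numbers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily):
    """Average daily (iso_date, value) pairs per calendar month ('YYYY-MM')."""
    buckets = defaultdict(list)
    for iso_date, value in daily:
        buckets[iso_date[:7]].append(value)  # 'YYYY-MM-DD' -> 'YYYY-MM'
    return {month: mean(values) for month, values in sorted(buckets.items())}


def fixed_base_index(monthly, base_month="2020-02"):
    """Express each monthly value as a percentage of the base month."""
    base = monthly[base_month]
    return {month: 100.0 * value / base for month, value in monthly.items()}


# Hypothetical values, for illustration only (not taken from the study).
daily_mobility = [("2020-03-01", -10.0), ("2020-03-02", -30.0), ("2020-04-01", -60.0)]
overnights = {"2020-02": 2000.0, "2020-03": 1000.0, "2020-04": 100.0}

mobility_monthly = monthly_means(daily_mobility)
overnights_index = fixed_base_index(overnights)
```

With these invented inputs, March overnights are half of the February level, so their index is 50, which puts the tourism series on a relative scale comparable to that of the mobility indices.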
Based on the abovementioned particularities, the research flowchart together with statistical/econometrical/machine learning methods/analysis are presented in Figure 1.
As shown in Figure 1, a complex set of statistical methods was applied to analyze the relationship, causality, and association between the tourism indicator and the mobility of people measured by the Apple and Google mobility dimensions:
1. Descriptive statistics (Gabor, 2013; 2016) (presented as the mean ± standard deviation) for both Google and Apple mobility dimensions and overnights as a fixed base index for each of the 11 European countries (presented in Figure 2 in the next section), to analyze whether positive and/or negative linear mean deviations exist across the 11 European countries for the analyzed period and to establish which statistical methods to use.
2. The one-sample Kolmogorov-Smirnov test (Lambin, 1990; Aaker et al., 1998; Jolibert & Jourdan, 2006; Fenneteau & Bialès, 1993; Vendrine, 1991; Churchill, 2001) was used to test the normal distribution of the data and to choose adequate statistical methods and tests according to these results.
3. Spearman's rank correlation coefficient (Giannelloni & Vernette, 2003; Evrard et al., 2003; Pupion & Pupion, 1998; Saporta, 1990; Gabor, 2016) was used to analyze the direction, intensity, and statistical significance of the association between the GM6 and A3 mobility dimensions and overnight stays for the monthly data of the 11 European countries. Heatmaps were used to make better comparisons of the correlations (see Figure 3 in the next section).
4. The Pearson correlation coefficient (Giannelloni & Vernette, 2003; Evrard et al., 2003; Pupion & Pupion, 1998; Saporta, 1990; Gabor, 2016) was used to analyze the direction, intensity, and statistical significance of the link between all daily data of the GM6 and A3 mobility indices for four European countries: Belgium, the Czech Republic, Germany, and Italy. Heatmaps were used to make better comparisons of the correlations (see Figure 4 in the next section).
5. One-way ANOVA (Gauthy et al., 2005; Giannelloni & Vernette, 2003; Gabor, 2016; Hayes, 1998) was used to test whether there is statistically significant variability between the 11 European countries according to the average value of the mobility data.
6. The Kruskal-Wallis (independent samples) test (Gauthy et al., 2005; d'Astous, 2005) was used to determine whether the distributions of overnights and mobility indices are the same across all 11 European countries.
7. An overall multilinear regression model (with collinearity diagnostic) (Baron et al., 1996; Giannelloni & Vernette, 2003; Malhorta, 2004; Vendrine, 1991; Jolibert & Jourdan, 2006) for all 11 European countries, with the GM6 and A3 dimensions as independent variables and overnight stays as a fixed base index as the dependent variable, was used to identify the best and most statistically significant predictors of European tourism during the COVID-19 pandemic.
8. A set of methods using both the GM6 and A3 dimensions:
− Factor analysis (PCA, principal component analysis with Varimax rotation) (Malhorta, 2004; Jolibert & Jourdan, 2006; Fenneteau & Bialès, 1993) for regrouping the 9 mobility dimensions (GM6 and A3) into new factors (principal components) that explain a significant percentage of the total variance (at least 70%), and to determine how the mobility data are grouped during the pandemic and which groups better explain the total variance, overall, for all 11 European countries studied;
− A regression model (Baron et al., 1996; Giannelloni & Vernette, 2001; Malhorta, 2004; Vendrine, 1991; Jolibert & Jourdan, 2006) with the principal components resulting from the factor analysis (PCA) as independent variables and overnight stays as a fixed base index as the dependent variable, needed to validate (or not) the overall multilinear regression model from point 7.
9. A group of methods using only Google mobility data:
− A factor analysis (PCA, principal component analysis with Varimax rotation) (Malhorta, 2004; Jolibert & Jourdan, 2006; Fenneteau & Bialès, 1993) to regroup the GM6 mobility data into new factors that explain at least 70% of the total variance, to detect how the mobility data are grouped during the COVID-19 pandemic and to identify the GM6 and/or A3 data that better explain the total variance, overall, for all 11 European countries;
− A regression model (Baron et al., 1996; Giannelloni & Vernette, 2001; Malhorta, 2004; Vendrine, 1991; Jolibert & Jourdan, 2006) using the principal components from PCA as independent variables and overnight stays as a fixed base index as the dependent variable.
10. Neural networks (Kolková & Ključnikov, 2021; Barclay et al., 1995; McCormick & Salcedo, 2017) with the multilayer perceptron (MLP) algorithm were used to find the best mobility data predictor of overnight stays for all 11 European countries, separately for Google Mobility data and Apple Mobility data.
Since regression analysis is, in many circumstances, not sufficient to explain the association between the predictors and outcomes or to accommodate all the ways in which the values of one predictor may affect the impact of other predictors, we also applied neural network analysis using SPSS software. The statistical methods from points 8 and 9 were used to analyze which of the two mobility data sets, Apple or Google, is the more decisive and better predictor of European tourism during the COVID-19 pandemic.
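As an illustration of point 3 in the list above, Spearman's rank correlation can be computed as the Pearson correlation of the ranked series, with tied values sharing the average of their ranks. The sketch below is a plain-Python illustration rather than the SPSS procedure used in the study; the function names and the sample values are assumptions made for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monthly values, for illustration only (not taken from the study).
parks = [-40.0, -10.0, 5.0, 30.0]
residential = [20.0, 12.0, 6.0, 1.0]
overnights_index = [10.0, 35.0, 60.0, 90.0]

rho_parks = spearman(parks, overnights_index)
rho_residential = spearman(residential, overnights_index)
```

In this invented example the parks series rises monotonically with overnights and the residential series falls monotonically, so the two coefficients are +1 and -1. Because only ranks are used, the coefficient does not require normally distributed data.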

Results
The starting point for the complex statistical analysis, using inferential statistics and descriptive and explicative methods for data analysis, is the results of the descriptive statistics indicators, the mean and standard deviation of each variable for all 11 European countries.
Tables 2-5 present the descriptive statistics for all variables in the study (as linear mean deviations in Tables 2-4 and as mean ± SD in Table 5). In Tables 2-4, positive linear mean deviations are marked in blue and negative ones in red for each country. According to the results from Tables 2-4, there are many differences between the 11 countries. For example, for overnights as a number (Figure 2), Germany has the largest positive deviation from the mean of all countries, but at the same time, it has a negative deviation for overnights as a fixed base index (with February 2020 as the fixed base). Only Italy has positive linear deviations both for overnights as a number and for overnights as a fixed base index. Additionally, the Netherlands has a positive linear mean deviation for overnights as a number and the largest positive linear mean deviation for overnights as a fixed base index (+38.46%), followed by Italy (+29.70%) and Luxembourg (+21.54%). All these differences need to be analyzed with complex statistical methods to determine whether there are interdependence and/or determination relationships.
According to the results from Table 5, there are many differences between the 11 countries. Regarding the Google Mobility (GM) indices, there are two atypical countries: − For the GM Workplace index, it can be seen from Table 4 (Table 3),
Regarding the Apple Mobility indices, Germany and Estonia consistently have a positive linear mean deviation for all three Apple indices, while Finland (+20.56) and Norway (+25.99) have the highest positive deviations for Apple Walking. Italy and the Czech Republic are the only countries with a negative deviation for all three Apple indices; the Czech Republic has the largest negative linear mean deviation (-60.34), for the Apple Transit index, and Norway has a negative linear mean deviation of -23.16 for Apple Driving.
Due to the abovementioned positive and negative differences between the 11 countries for all 9 mobility indices and overnights, we tested the normal distribution of the data by using the one-sample Kolmogorov-Smirnov test. The results indicate that the data have a normal distribution only for the following variables: GM Retail & Recreation, GM Transit Stations, GM Residential, and Apple Transit. Therefore, for statistical analysis, we predominantly used nonparametric methods, including correlations.
To determine the direction and intensity of the correlation between the Google Mobility and Apple indices, we applied nonparametric Spearman correlations. At the level of all 11 European countries, the correlation matrices are presented below (Table 6) using heatmap diagrams (red for inverse correlations and blue for direct correlations), and the statistically significant correlations are also marked. At the level of the 11 European countries, all correlations are statistically significant, whether of low or high intensity. It is also observed that the GM Residential indicator correlates negatively with all other indicators:
− Strong negative correlation with X1 (-0.914), X4 (-0.818), and overnights as a fixed base index (Feb. 2020) (-0.739).
− Medium to strong negative correlation with X3 (-0.712), X2 (-0.653), and X5 (-0.698).
There are also positive correlations:
− Strong correlation between Y and X1 (0.823), X7 (0.741), and X9 (0.758).
− There are also some strong correlations between X1 and X3, X4, X7, and X9, which would suggest a greater need for mobility for shopping, recreation, and visiting parks.
We have also chosen four European countries that have daily data for the mobility indicators for both Google and Apple: Belgium, the Czech Republic, Germany, and Italy (432 days between 2/15/2020 and 4/21/2021). We checked the correlations between each Google and Apple mobility dimension. The synoptic table of these values, in the form of a heatmap, is presented in Table 7. In general, daily data have better correlations (and are more accurate due to the higher number of cases) than monthly data. In these four countries, there are good correlations between Apple Driving and GM Retail & Recreation, GM Parks and GM Transit Stations; between Apple Transit and GM Retail & Recreation and GM Transit Stations; and between Apple Walking and GM Retail & Recreation, GM Parks, and GM Transit Stations. GM Grocery & Pharmacy and GM Workplace correlate only weakly with the Apple mobility dimensions, although relatively more strongly with Apple Transit. GM Residential has strong negative correlations with all three Apple mobility dimensions.
GM Retail & Recreation, which also contains a strong recreation-oriented dimension, along with GM Parks and possibly GM Transit Stations, are the "fun" parts: the recreation dimensions and the new places where people use their phones more for orientation. Grocery & Pharmacy and Workplace are the routine dimensions, where there is no need for location determination or route-finding applications.
The negative correlation of the GM Residential index with all Apple mobility indices for each of the four countries is confirmed, emphasizing once again that the Residential dimensions reflect people's "not-mobility" or home-staying trend.
Because the descriptive statistics in Tables 2-4 show positive and negative deviations of markedly different values, we used nonparametric and parametric statistical tests to check whether there were statistically significant differences between countries in the values or averages of the indicators. One-way ANOVA indicates that, except for overnight stays as a fixed base index (February 2020), the variables show statistically significant differences between countries in the deviations from the mean. However, p values above 0.05 but below 0.1 were recorded for the Apple Driving (0.063) and GM Workplace (0.063) indicators, so for these two the differences are significant only at the 10% level.
The Kruskal-Wallis test (independent samples) was applied to test whether the distributions of Google Mobility and Apple indices are the same in the 11 countries from the study. The results indicate that for the following variables, there are statistically significant differences between countries: overnight stays -number (0.000), Google Mobility Retail Apple Walking (0.000). There are no differences between countries regarding only the distribution of overnight stays in the FB index (p value = 0.083) and for Google Mobility Workplace (p value = 0.066).
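For readers unfamiliar with the test, the Kruskal-Wallis statistic H is obtained by ranking all observations jointly and comparing the rank sums of the groups (here, countries). The sketch below is a minimal plain-Python illustration that omits the tie correction; it is not the SPSS procedure used in the study, and the function name and the sample values are assumptions made for the example.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for k independent samples (no tie correction applied)."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        # Tied observations share the average of the ranks they occupy.
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += shared_rank
        i = j + 1
    weighted = sum(r * r / len(sample) for r, sample in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical index values for three countries, for illustration only.
country_samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_statistic = kruskal_wallis_h(country_samples)
```

For these fully separated samples, H is 7.2. Under the null hypothesis of identical distributions, H is approximately chi-square distributed with k - 1 degrees of freedom (critical value 5.991 at the 5% level for k = 3), although that approximation is rough for samples this small.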
Because only 4 of the 11 variables had normally distributed data (according to the results of the Kolmogorov-Smirnov test), we applied the median test to all variables. The results indicate that there are statistically significant differences between countries for all variables except Google Mobility Workplace (0.160) and Google Mobility Residential (0.249), which are practically the most important indicators of the pandemic period for the entire world, not only for European countries.
To determine whether there is a causal relationship between the Google and Apple mobility indices and overnight stays (point 7 in the Research methods section), we applied the multilinear regression model with overnight stays as the dependent variable and all 9 mobility indices as independent variables for the 11 countries and 14 months. First, we applied the method with overnight stays both as a number and as a fixed base index as the dependent variable. Since the R² coefficient was low for the model with overnight stays as a number as the dependent variable, we decided to continue the analysis only with overnight stays as a fixed base index. Because the mobility indices are strongly and significantly correlated with one another as well as with overnights as a fixed base index, multicollinearity arises in the analysis. After the multicollinearity test, we excluded the GM Residential (X6) dimension due to the collinearity of this variable (VIF = 19.320); the results are presented in the tables below. For the analysis, linear regression with the Enter method was used. The coefficient of determination R² is 0.615, meaning that over 60% of the total variance is explained by the GM and Apple data. Additionally, the ANOVA test results indicate that the model is statistically significant (p value = 0.000).
For the final model the results for the multicollinearity test indicate that all VIF values are between 1 and 10 (Table 8).
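For orientation, the variance inflation factor used in this diagnostic has the standard definition below, where $R_j^2$ denotes the coefficient of determination obtained when predictor $j$ is regressed on all the other predictors (the symbol $R_j^2$ is notation introduced here, not taken from the tables). Applied to the reported VIF of 19.320 for GM Residential, it shows how much of that index is already explained by the remaining mobility indices:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j},
\qquad
R_{\text{GM Residential}}^2 = 1 - \frac{1}{19.320} \approx 0.948
```

By the same formula, a VIF of 10 corresponds to $R_j^2 = 0.90$ and a VIF of 1 to $R_j^2 = 0$, which is the range within which the retained predictors fall.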
According to the Beta standardized coefficient values (Table 8) and the results of the regression model, only 3 of the 8 independent variables contribute significantly to explaining overnight stays for the countries included in the study. Therefore, we continued the analysis by applying factor analysis to the 8 independent variables to find new factors (point 8 in the Research methods section). We applied PCA with Varimax rotation and Kaiser normalization, retaining components with initial eigenvalues > 1. The results show that two principal components explain a significant share of the total variance, 79.73%.
According to the rotated component matrix (Table 9), the 8 variables are grouped into two components. PC1 groups GM Parks, GM Retail & Recreation, Apple Walking, Apple Driving and Apple Transit, the indices that indicate leisure and recreation activities, and can be named "Leisure and recreation activities". PC2 groups GM_Workplace, GM_Grocery & Pharmacy and GM_Transit, practically all the indices that indicate domestic or routine activities. PC2 explains 13.52% of the total variance and can be named "Domestic and routine activities". To analyze which component resulting from the PCA is a good predictor of overnight stays, we applied regression analysis (point (8) from the Research Methods section), using the two components as independent variables and overnight stays as the dependent variable. The results show a good value for the coefficient of determination R² (0.556) and a statistically significant model (ANOVA p value < 0.001).
The regression coefficients (Table 10) indicate that both principal components are good predictors of overnight stays as a fixed base index (p value < 0.05).
According to the results in Table 10, PC1 ("Leisure and recreation activities") makes an important and direct contribution to overnight stays as a fixed base index for all 11 European countries (0.729), followed by the second factor, PC2 ("Domestic and routine activities") (0.143). With this statistical approach, we therefore find that not only GM Retail & Recreation and Apple Driving but also GM Parks, Apple Walking and Apple Transit are good predictors of overnight stays for these countries. Below, we present the histogram (Figure 3) and the normal P-P plot (Figure 4) for the regression model with the PCA factors based on all mobility data (GM and Apple).
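To make the reading of such coefficients concrete, the sketch below regresses a synthetic outcome on two synthetic component scores and returns standardized coefficients together with R². The data, the effect sizes and the function name `standardized_ols` are assumptions made for illustration only.

```python
import numpy as np

def standardized_ols(X, y):
    """OLS on z-scored variables: standardized (Beta) coefficients and R^2."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # After centering, the intercept is zero and can be omitted.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    resid = yz - Xz @ beta
    r2 = 1.0 - (resid @ resid) / (yz @ yz)
    return beta, r2

# Synthetic component scores and outcome (not the study's data).
rng = np.random.default_rng(2)
n = 154
pc1 = rng.normal(size=n)
pc2 = rng.normal(size=n)
y = 0.7 * pc1 + 0.15 * pc2 + 0.6 * rng.normal(size=n)

beta, r2 = standardized_ols(np.column_stack([pc1, pc2]), y)
```

When the predictors are exactly uncorrelated, as orthogonally rotated component scores are expected to be, R² equals the sum of the squared standardized coefficients. If the reported 0.729 and 0.143 are standardized coefficients, this gives roughly 0.55, close to the reported R² of 0.556.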
As mentioned in point (9) of the last paragraph of the Research Methods section, we applied the same combination of methods (regression and factor analysis) using only GM data as independent variables. The next paragraphs present these results, which serve to check whether the previous results are validated. For the first linear regression model, with five Google Mobility indices as independent variables (Google Mobility Residential is excluded because of collinearity) and overnight stays as the dependent variable, we obtain a good coefficient of determination R² (0.578), and the model is statistically significant (p value < 0.001) according to the ANOVA results. This model has no collinearity problems: all five VIF values are between 1 and 10. The results are presented in Table 11.
As shown in Table 11, 3 of the 5 GM indices are statistically significant (p value < 0.05): GM Retail & Recreation, GM Grocery & Pharmacy, and GM Parks. However, if we apply the more lenient rule that accepts variables with a p value below 0.1 as statistically significant, all five GM indices are statistically significant.
For the factor analysis (PCA with Varimax rotation and Kaiser normalization), in accordance with point (9) from the Research Methods section, we followed the rule of a minimum of six variables for PCA and therefore ran the method with all six GM indices. Additionally, the number of factors was chosen manually, and the total variance explained is 88.23%. The rotated component matrix is presented in Table 12.
By comparing these results with the previous factor analysis (with all mobility data, GM and Apple indices), we conclude that the groupings are the same (apart from the Apple Mobility indices), but the order of the factors changes. PC1 from the previous analysis is now the second component (PC2 in Table 12), formed by GM Parks and GM Retail & Recreation, and it explains 38.53% of the total variance. This principal component refers to the dynamics or movement of people for recreation and leisure activities specific to tourism. The second factor from the previous analysis is now the first one (PC1 in Table 12) and includes GM Residential. This principal component refers to the domestic/static activities specific to lockdown periods during a pandemic. PC1 from this analysis explains 49.70% of the total variance. In Figure 5, we present the component plot in the rotated space.
As in the previous analysis, we continue by applying the linear regression model with the principal components as independent variables and overnight stays as the dependent variable. The coefficient of determination R² is 0.533 for this model, and the ANOVA indicates a statistically significant model (p value < 0.001). The coefficients are presented in Table 13.
The results indicate that both factors are statistically significant (p values of 0.007 and < 0.001) and, most importantly, they validate the previous results. More specifically, PC2 (GM), formed by GM Parks and GM Retail & Recreation, is a good predictor of overnight stays as a fixed base index on its own (without the Apple Mobility indices used in the first analysis). For each one-unit increase in PC2 (GM), overnight stays increase by 0.711 units.
Because regression analysis is, in many circumstances, not sufficient to explain the association between the predictors and outcomes and is not sufficient to accommodate all the ways in which the values of one predictor may affect the impact of other predictors, we also applied neural network analysis with the multilayer perceptron (MLP) algorithm by using SPSS software. We chose MLP because of its flexibility and lack of distribution assumptions for the data we were analyzing.
The results of the neural network for GM data indicate good predictors of overnight stays as a fixed base index and are presented in Figure 6 and Table 14.
The input layer of the MLP neural network contained all the GM indices as continuous variables. The output layer was overnight stays as a fixed base index (base: February 2020). We chose one hidden layer and one node as the model architecture, and we also show the synaptic weights. The blue color of the synaptic weights in Figure 6 represents weight < 0, while the gray color represents weight > 0. The difference between the contribution of GM Parks to the prediction of European tourism (positive, gray synaptic weight) and, for example, that of GM Workplace (negative, blue synaptic weight) can be seen.
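How the sign of a synaptic weight translates into the direction of a prediction can be illustrated with a minimal forward pass through a one-hidden-layer perceptron. The weights below are hypothetical, and the tanh hidden activation with a linear output unit is an assumption of this sketch rather than a setting reported in the paper.

```python
import numpy as np

def mlp_predict(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.

    Hidden units use tanh and the output unit is linear; both choices are
    assumptions made for this sketch.
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)
    return float(w_out @ hidden + b_out)

# Hypothetical weights for two standardized inputs: [parks, workplace].
w_hidden = np.array([[0.8, -0.5]])   # one hidden unit: positive and negative weight
b_hidden = np.array([0.0])
w_out = np.array([0.9])
b_out = 0.0

baseline = mlp_predict(np.array([0.0, 0.0]), w_hidden, b_hidden, w_out, b_out)
more_parks = mlp_predict(np.array([1.0, 0.0]), w_hidden, b_hidden, w_out, b_out)
more_work = mlp_predict(np.array([0.0, 1.0]), w_hidden, b_hidden, w_out, b_out)
```

With these illustrative weights, raising the positively weighted input pushes the prediction above the baseline, while raising the negatively weighted input pushes it below.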
The model returns four hidden layer nodes, H (1:1), H (1:2), H (1:3), and H (1:4), but the synaptic weights of H (1:4) have the opposite sign to those of the other three hidden nodes in the network. In Table 15, the parameter estimates are presented. The results for GM data show that hidden node H (1:4), which combines GM Parks with GM Retail & Recreation, is the best predictor of tourism during the COVID-19 pandemic for the 14 months included in the research, with a direct, positive contribution (0.991).
An indirect contribution as a predictor of tourism is given by hidden node H (1:1), which combines GM Transit (-0.589) and GM Residential (0.595), two indices that act in opposite directions. In Figure 8, the normalized importance of the independent variables for GM data indicates that GM Parks and GM Residential have a normalized importance of 100%, in contrast to GM Transit, which is nevertheless more important than the remaining variables for the outcome, overnight stays. According to these findings, GM Parks and GM Residential are extremely important in all eleven European countries.
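Normalized importance is obtained by dividing each variable's importance by the largest importance and expressing the result as a percentage, so the top predictor always shows 100%. A minimal sketch with hypothetical raw scores (not taken from the paper's tables):

```python
def normalized_importance(importance):
    """Scale importance scores so that the largest one equals 100%."""
    top = max(importance.values())
    return {name: 100.0 * (value / top) for name, value in importance.items()}

# Hypothetical raw importance scores, chosen only to show the scaling.
raw = {
    "GM Parks": 0.30,
    "GM Residential": 0.30,
    "GM Transit": 0.20,
    "GM Workplace": 0.05,
}
scaled = normalized_importance(raw)
```

Two variables can both reach 100% only when their raw importance scores are tied, as in this constructed example.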
The results of the neural network for Apple Mobility data indicate good predictors of overnight stays as a fixed base index and are presented in Figure 7 and Table 15.
For Apple Mobility data, the MLP neural network returns two hidden nodes, H (1:1) and H (1:2), but only one of them makes a significant contribution as a predictor of tourism. Hidden node H (1:2) (1.098) combines Apple Driving (0.595) and Apple Walking (0.515), as opposed to Apple Transit (-0.121).
In Figure 9, the normalized importance of the independent variables for Apple Mobility data confirms these results, with Apple Driving reaching 100% importance as a predictor of overnight stays.
Therefore, we find that the best predictors for European tourism during the COVID-19 pandemic are as follows: − for Google Mobility data: the Google Mobility Parks together with Google Mobility Retail & Recreation; − for Apple Mobility data: the Apple Driving together with Apple Walking.

Discussions
In our exploratory research, we connected the observational daily data of Google and Apple with statistical data on tourism. As presented before, Google collects mobility data in six dimensions (Retail and Recreation, Parks, Grocery and Pharmacy, Transit, Residential and Workplace), and Apple collects data in three dimensions (Walking, Driving and Transit).
We also found strong parametric and nonparametric correlations between GM Retail & Recreation and GM Parks, between GM Retail & Recreation and GM Transit stations, between GM Retail & Recreation and both Apple Walking and Apple Driving, and between GM Parks and Apple Walking. These links led us to believe that these dimensions reflect the recreation time and activities of the population. There were also some weak relations between GM Parks and GM Workplace and between GM Workplace and Apple Walking, signaling that some dimensions reflect a different type of activity, routine-related rather than leisure. GM Residential correlates negatively and strongly with everything else, signaling that when people stay at home, they are not outdoors, so the GM Residential data are also reliable.
The same results and connections emerged from the daily data analysis. Here, we compared Apple data with Google data, which showed an even higher correlation, approximately 0.75 to 0.85, between the following pairs: Apple Driving with GM Retail & Recreation, GM Parks and GM Transit stations; Apple Transit with GM Retail & Recreation and GM Transit stations; and Apple Walking with GM Retail & Recreation and GM Parks. GM Residential also correlates negatively and strongly with all three Apple dimensions at the level of daily data. These results show that people's movement is reflected well and similarly by the data of the two big tech companies, but some dimensions tend to "go together," such as Retail & Recreation with Walking and Driving, or Parks with Walking, while Parks does not match with Transit, Workplace, etc.
However, the main question was whether these observational data match the statistical data on tourism, which we analyzed with a regression model. Our first regression model, which used all the Apple and Google dimensions (except GM Residential), fell below our expectations: most of the dimensions were weak predictors, statistically insignificant, or both. However, when we reduced the number of factors with principal component analysis, we obtained two excellent predictors, one containing GM Parks, Apple Walking, Apple Driving, GM Retail & Recreation and Apple Transit, named PC1 (Leisure and recreation activities), and another containing GM Workplace, GM Grocery & Pharmacy and GM Transit, named PC2 (Domestic and routine activities). The first one (PC1) accounts for more than 66% of the total variance of the mobility indices, and the second (PC2) for only 13.5%. We can therefore conclude that some dimensions of mobility are strongly related to tourism activities. These dimensions, especially GM Retail & Recreation and GM Parks, can thus be used to nowcast overnight stays in most countries, since the GM data can be seen just days after they were collected, whereas tourism arrivals are released 2-4 months later.

Conclusions
In this research, we connected the observational daily data from Google and Apple, obtained from location tracking technologies on smartphones, for 11 European countries during the COVID-19 pandemic months of 2020 and 2021 with tourism statistical data from the same countries. The study's goal was to determine whether there is a strong, medium, or consistent link between these data across countries and seasons. Our results show that there is a reliable connection between the movement of people in certain countries (Beckers et al., 2021), as reflected in real-time smartphone tracking data, and statistical data from tourism service providers, food, hotels, and/or other accommodations (Hall et al., 2022). Nonetheless, these connections differ, and they are determined less by the source (provider) than by the dimension observed. As presented before, Google collects mobility data in six dimensions (Retail and Recreation, Parks, Grocery and Pharmacy, Transit, Residential, and Workplace), while Apple collects data in three dimensions (Walking, Driving, and Transit). There is a very strong link between tourism flows (overnight stays) and GM Retail and Recreation by all methods, as well as a strong link with Apple Walking, Apple Driving, and GM Parks. These categories best describe tourism activity (represented here by overnight stays) and appear to be the leisure and recreation categories of mobility. The two transit dimensions of Apple and Google only partially describe tourism flows, possibly because public transport also serves other types of movement. Additionally, there is a strong negative relationship with the GM Residential dimension, which also seems logical: if people spend more time at home, they are not traveling. Finally, there is a weak link with the GM Grocery and Pharmacy and GM Workplace dimensions. These could be the routine dimensions of mobility, with little or no relation to tourism activities.
In terms of recommendations for practice, it can be useful to download data for each locality (Google data are available at the level of towns) and compare them with different tourism data (overnights, arrivals, but also tourism resources and supply, such as accommodation services, food and beverage locations, and cultural productions). A connection with these statistical data is possible, and if a similar model can be constructed, it can be used to predict short-term tourism evolution more precisely, since it would be linked only to local mobility data.
Our model also allows short-term prediction of the statistical data on overnights. In particular, GM Retail and Recreation and GM Parks can predict (nowcast) tourism overnights in most of the countries, since the GM data can be seen just days after they were collected, whereas tourism arrivals are released 2-4 months later.
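The nowcasting idea can be sketched as follows: fit a linear model on the months for which overnights are already published, then apply it to the latest months for which only mobility data exist. The data are synthetic, and the function names and the three-month publication lag are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear(mobility, overnights):
    """Least-squares fit of overnights on mobility indices, with an intercept."""
    design = np.column_stack([np.ones(len(mobility)), mobility])
    coef, *_ = np.linalg.lstsq(design, overnights, rcond=None)
    return coef

def predict_linear(coef, mobility):
    """Apply fitted coefficients to new mobility observations."""
    design = np.column_stack([np.ones(len(mobility)), mobility])
    return design @ coef

# Synthetic monthly data for one hypothetical country (not the study's data).
rng = np.random.default_rng(3)
mobility = rng.uniform(-80, 20, size=(14, 2))   # two indices, % change from baseline
truth = 100 + 0.8 * mobility[:, 0] + 0.3 * mobility[:, 1]
overnights = truth + rng.normal(scale=0.5, size=14)

published = 11   # assume overnights are known only for the first 11 months
coef = fit_linear(mobility[:published], overnights[:published])
nowcast = predict_linear(coef, mobility[published:])   # 3 unpublished months
```

The estimates in `nowcast` would later be compared with the official figures once they are released, which also gives a running check on the model's accuracy.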
A further implication is the possibility of identifying statistical data collection problems at the local level. If the model fits mobility and tourism data well at the national level but there is a significant gap between the two at the local level, this may indicate a data collection problem, because mobility data are more reliable: they are exhaustive and observational rather than declarative.
The limitations of the present research are as follows: (1) Google and Apple mobility data include other dimensions, such as shopping, online retail, accommodation, urban transit travel, etc.; (2) values are missing for many European countries (members and non-members of the European Union), so we could not run the comparisons for other countries; (3) there is a lack of similar research against which to compare the present results; (4) there is no similar situation worldwide for objective comparisons of the present results; (5) we had chronological data for only 14 months (in addition to the base period), which is relatively short; and (6) different countries usually represent different cultures, which can create different types of tourism and different patterns of mobility (and different locations), which could be a challenge in comparing and measuring the same flows.
Similar research has been conducted in European countries, but with online retail as the dependent variable, high-frequency data on GPS-based population mobility and government stringency during the COVID-19 pandemic, and only one dimension from Google Mobility Reports, GM Residential (Szász et al., 2022). Therefore, we consider the present research an important contribution to the existing theoretical and practical knowledge of the relationship between mobility data (Apple and Google) and tourism indicators (overnight stays).
For future research, the authors intend to conduct a detailed analysis at the national level for Romania, where data are available at the county level, to apply other quantitative methods (such as gray relational analysis), and to include other available dimensions that are important for the tourism industry. Further research is also necessary to compare the results across different time periods and geographical locations in order to establish a short-term forecasting model for tourism. This is possible because Google and Apple mobility data appear immediately, while tourism statistical data are typically published with a delay of two to three months. Another important question is what influences mobility. We have pointed out the extent to which mobility can affect tourism flows, but the question remains: what defines mobility as a social characteristic? What drives greater movement in some countries, and how is it determined by other economic, social and cultural factors?