A New Approach to the Maximum Quarterly Water Consumption Modeling on the Example of Individual Water Consumers in a Small Water Supply System

Quarterly water consumption data collected in a small water supply system were used to develop a new approach to water consumption modeling. In this paper, a multi-distribution statistical analysis was performed. The Anderson-Darling test showed that at least half of the ten tested theoretical probability distributions can be used to describe the water consumption. The PWRMSE criterion made it possible to determine which of the tested theoretical distributions fits the empirical data set best. For the total daily water consumption of the group of households, it was the Johnson distribution, whereas for the average daily water consumption per capita, it was the GEV distribution. Based on the best-fitted probability distribution, a 25-year water consumption simulation was conducted with the Monte Carlo method. Because the methodology of this study is based on probability distributions, the simulation method remains usable even if the type of theoretical distribution of the water consumption changes: another distribution can then be assumed.


Introduction
Managing water resources rationally while providing the required quantity of drinking water is not an easy task for water supply utilities. The first steps towards rational water management should be taken as early as the planning and design of water supply systems; otherwise, operation and maintenance problems may arise later. In undersized water pipelines, high water velocity increases flow resistance. This lowers the water pressure at the point of use below the required level and can sometimes result in a lack of water in some areas of the network. On the other hand, oversized water pipes decrease the flow velocity and thus extend the water retention time in the network. Because deposits accumulate under such conditions, the hydraulic resistance grows and there is a risk of secondary water contamination. To avoid these problems, water network operators must maintain the network properly through regular flushing and disinfection of the pipelines.
Currently, many water supply systems are observed to be oversized. This is a legacy of past practice, when water pipelines were designed for a greater water demand than is required now. Although water consumption in households depends on many factors, so that some differences between households may be observed (Pasela & Gorączko 2013, Sikora et al. 2006), a general decrease in water consumption is noted both in Poland (Gorączko & Pasela 2015, Pawełek 2015;2016) and in many other countries (Baldino & Sauri 2018, Barraqué et al. 2011, Cahill & Lund 2013, Donnelly & Cooley 2015, Sauri 2019, Schleich & Hillenbrand 2009). As a result, some technical parameters of the existing pipelines no longer match the amount of water transported. The decrease in water consumption is mainly due to the rising price of water supply and sewage disposal. In addition, common access to water-saving devices and the obligatory installation of water meters make it easy to control the amount of water used, which encourages tap water users to save water. To avoid operational and maintenance problems, water supply facilities should be planned and designed carefully. However, as Bartkowska (2014), Bergel (2017) and Bergel et al. (2016a;b) suggest, current methods of water demand prediction need to be modified and updated. This is because many methods are based on water consumption indicators elaborated in the past; in many cases, these indicators do not reflect the actual water demand properly. This is one of the main causes of design faults that result in operational problems and high maintenance costs of water supply systems. To verify these problems and identify their causes, numerous studies of water consumption have been conducted for years (Bartkowska 2014, Bergel 2017, Bergel et al. 2016a;b, Bergel et al. 2017).
Because this paper is based on quarterly water consumption, we draw attention to other studies related to quarterly water demand. For example, Batóg and Foryś (2009) showed that only some of the tested water consumption variables in residential buildings were characterized by seasonal (quarterly) fluctuations. The results presented by Bergel et al. (2016a) show some disproportions between the quarterly water consumption of households over a four-year period, but these were related mainly to water consumption for additional purposes. Finally, Reynaud et al. (2018) observed a strong quarterly seasonality of water consumption among single-family water users, in contrast to multi-family water users.
Modeling of water systems in terms of water consumption remains a current issue. This is because statistical tools make planning new investments easier and provide many technical and financial benefits for the operated water supply facilities. For example, a report of the Joint Research Centre (the European Commission's in-house science service) contains a very comprehensive analysis of water consumption modeling in the 28 countries of the European Union (Reynaud 2015). As literature studies show (Boryczko 2017, Cieżak & Cieżak 2015, Froelich 2015, Huang et al. 2017, Mombeni et al. 2013, Rathnayaka et al. 2017, Romano & Kapelan 2014, Tiwari & Adamowski 2015, Vijai & Sivakumar 2018), many researchers study different methods in order to find the mathematical tools best suited to reliable water consumption prediction. The known methods include short-term, intermediate-term and long-term prediction methods. As for the models, one can consider, e.g., temporal extrapolation models, models based on 'unit water demand', multivariate statistical models, micro-component modeling or estimation based on projections of urbanization and land use (House-Peters & Chang 2011, Rinuado 2015).
As it turns out, there are no literature reports on water consumption modeling using probability distributions. Although this method should be considered a reliable statistical tool for prediction, it must be noted that assuming a single form of the water consumption probability distribution can be inappropriate, because it implies that the mechanisms shaping the process are stationary. Because empirical distributions can be described by many theoretical functions, it is important to select the best-fitted one in order to avoid prediction errors. Considering the above, this paper elaborates a new approach to water consumption modeling based on multi-distribution analysis. An additional novelty of this paper is the application of the Peak-Weighted Root Mean Square Error (PWRMSE) to select the best-fitted theoretical distribution.

Case study
In this paper, household water consumption in a small rural water supply system located in southern Poland (Wołowice village) was examined; as part of this study, 34 selected households were analyzed. Because the households in the study area are connected to the collective water system and are equipped with a toilet, a bathroom and a local source of hot water, they are classified into the fourth group of the standard of equipment with water and sewage devices. For this group, according to the Polish Regulation of the Minister of Infrastructure (2002), the average standard for water consumption per capita is between 80 dm³/d (non-sewered areas) and 100 dm³/d (sewered areas). Over the research period, each of the tested households was inhabited by one to seven persons. Most of the households (68%) were inhabited by two, four or five persons, whereas only 17% of the households were inhabited by one, six or seven persons. It must be stressed that, although a rural water system was examined, water taken from the network was used only for household purposes and not for agricultural purposes; for any additional purposes (e.g. irrigation of home gardens), the households' own water sources were used.

Materials and methods
Statistical analysis and modeling were performed on water consumption data collected in the 34 selected households between January 2011 and December 2015. The data refer to quarterly water consumption and were elaborated both as the total daily water consumption for the whole group of tested households and as the average daily water consumption per capita.
First, a preliminary statistical analysis of the water consumption was performed. The following descriptive statistics were determined: minimum (Min), maximum (Max), average (Avg), standard deviation (S), coefficient of variation (CV), skewness (Sk) and kurtosis (Kurt).
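The descriptive statistics listed above can be sketched in a few lines of Python. This is a minimal illustration only: the paper does not state which estimators were used, so the sample standard deviation (n - 1 denominator) and the moment-based skewness and excess kurtosis are assumptions, the function name `describe` is hypothetical, and the input values are not the study data.

```python
import math

def describe(values):
    """Return Min, Max, Avg, S, CV, Sk and Kurt for a list of numbers."""
    n = len(values)
    avg = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator (assumption).
    s = math.sqrt(sum((v - avg) ** 2 for v in values) / (n - 1))
    # Central moments for moment-based skewness and excess kurtosis (assumption).
    m2 = sum((v - avg) ** 2 for v in values) / n
    m3 = sum((v - avg) ** 3 for v in values) / n
    m4 = sum((v - avg) ** 4 for v in values) / n
    return {
        "Min": min(values),
        "Max": max(values),
        "Avg": avg,
        "S": s,
        "CV": s / avg,               # coefficient of variation
        "Sk": m3 / m2 ** 1.5,        # skewness
        "Kurt": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a normal distribution)
    }

# Hypothetical quarterly per-capita values in dm3/d, not the study data.
stats = describe([80.0, 82.0, 84.0, 86.0, 88.0])
```

With excess kurtosis, a value greater than zero indicates a stronger concentration around the mean than in a normal distribution, which matches the way kurtosis is interpreted in this paper.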
Statistical homogeneity of the water consumption data was examined using the non-parametric Kruskal-Wallis test. Testing the homogeneity of the quarterly data series consisted in assigning ranks to the ordered elements of all tested samples (quarters) and then determining the sum of the ranks for each sample. If the differences between the calculated rank sums were small, the null hypothesis H₀, which assumes that all the samples originate from the same general population, was considered true (the samples are homogeneous). The critical region of the test was defined by Pearson's χ² statistic with k-1 degrees of freedom, where k is the number of compared samples (Wałęga et al. 2016). For k-1 = 3 degrees of freedom, the critical χ² statistic was 7.815. In this paper, the Kruskal-Wallis test was also used to investigate the significance of the differences in water consumption between the quarters, both for the total daily water consumption of the group of tested households and for the average daily water consumption per capita. Hypothesis H₀ was verified at the significance level α = 0.05. The Kruskal-Wallis statistic is described as follows:
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
where: H - Kruskal-Wallis statistic, n - total number of components in all samples, R_i - sum of the ranks in a given sample, n_i - number of components in a given sample, k - number of compared samples.
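The ranking procedure described above can be illustrated with a short standard-library sketch. It is not the authors' implementation: the function name `kruskal_wallis_h` is hypothetical, tied values receive their average rank without a tie correction of H, the critical value is the tabulated χ² quantile for 3 degrees of freedom at α = 0.05, and the quarterly values are made up for illustration.

```python
def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of samples (average ranks, no tie correction)."""
    pooled = sorted(v for sample in samples for v in sample)
    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    # Sum over samples of (rank sum)^2 / sample size.
    total = sum(sum(rank_of[v] for v in s) ** 2 / len(s) for s in samples)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

CHI2_CRITICAL = 7.815  # chi-square quantile, 3 degrees of freedom, alpha = 0.05

# Hypothetical quarterly series (one value per year), not the study data.
quarters = [
    [84.0, 85.5, 83.2, 86.1, 84.7],  # Q1
    [85.1, 84.2, 86.0, 83.9, 85.6],  # Q2
    [86.3, 84.9, 85.2, 84.4, 83.8],  # Q3
    [83.5, 85.0, 84.6, 85.9, 84.1],  # Q4
]
h = kruskal_wallis_h(quarters)
homogeneous = h < CHI2_CRITICAL  # True means H0 is not rejected
```

When many values are tied, a tie-corrected statistic would normally be preferred; that refinement is omitted here for brevity.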
The compatibility of the theoretical and empirical water consumption distributions was assessed using the Anderson-Darling (A-D) test; for this purpose, Equation (12) (Kvam & Vidakovic 2007) was used. Compared with other tests, the Anderson-Darling test is considered better for assessing the compatibility of measured and predicted values (Engmann & Cousineau 2011, Islam 2011). The Anderson-Darling statistic is sensitive over the whole range of the distribution and is therefore more likely to identify differences between distributions. The A-D test was evaluated at the significance level α = 0.05 using the p-value, because the critical values of the A-D test depend on the type of the tested probability distribution. For the Anderson-Darling statistic, the null hypothesis H₀ (the data follow a specified distribution) and the alternative hypothesis H₁ (the data do not follow a specified distribution) must be defined. If the p-value is less than α = 0.05, hypothesis H₀ about the compatibility of the data with the tested distribution is rejected. Otherwise, i.e. if the p-value is greater than α = 0.05, it can be assumed that the variables follow the specified distribution and there is no reason to reject hypothesis H₀.
$$A\text{-}D = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\left[\ln F(x_i) + \ln\left(1 - F(x_{n+1-i})\right)\right] \qquad (12)$$
where: A-D - Anderson-Darling statistic, i - number of ordered data, n - number of components, F - cumulative distribution function, x_i - i-th value of the data ordered increasingly.
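As a hedged illustration of the Anderson-Darling statistic, the sketch below evaluates it for an arbitrary cumulative distribution function supplied by the caller. The names `anderson_darling` and `normal_cdf` are hypothetical, the normal distribution is used only as an example of a tested distribution, and the data are invented. The p-value, which depends on the tested distribution, is deliberately not computed here.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of the normal distribution, used here as one example of F."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def anderson_darling(sample, cdf):
    """Anderson-Darling statistic of a sample against a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_i and xs[n - i] is x_{n+1-i} in 1-based notation.
        total += (2 * i - 1) * (
            math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i]))
        )
    return -n - total / n

# Hypothetical per-capita values in dm3/d, not the study data.
data = [79.5, 81.2, 83.0, 84.1, 84.6, 85.3, 86.0, 88.4]
mean = sum(data) / len(data)
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (len(data) - 1))
a2 = anderson_darling(data, lambda x: normal_cdf(x, mean, sd))
```

The function assumes that the CDF returns values strictly between 0 and 1 for every observation; otherwise the logarithm is undefined and the tested distribution cannot describe the sample.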
As part of this paper, the theoretical distribution best fitted to the empirical distribution was identified. Although the results of the Anderson-Darling test (i.e. the p-value) also offer this possibility (the higher the p-value, the better the fit of the theoretical distribution), the Peak-Weighted Root Mean Square Error (PWRMSE) method is considered more precise for assessing the fit of theoretical to empirical distributions. The values of PWRMSE were calculated based on Equation (13). The same formula has also been used in hydrological modeling (Koch & Bene 2013, Młyński et al. 2019, Wałęga 2016). The best-fitted theoretical distribution is the one with the lowest PWRMSE value.
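$$PWRMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2 \, \frac{x_i + \mu}{2\mu}} \qquad (13)$$
where: PWRMSE - Peak-Weighted Root Mean Square Error, i - number of ordered data, n - number of components, x - measured value, y - predicted value, μ - mean of measured values.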
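The PWRMSE criterion and the selection of the distribution with the lowest value can be sketched as follows. This is an illustrative example under stated assumptions: the function name `pwrmse`, the candidate names `dist_a` and `dist_b`, and all numbers are hypothetical; in practice the predicted values would be the quantiles of each fitted theoretical distribution at the empirical probabilities.

```python
import math

def pwrmse(measured, predicted):
    """Peak-Weighted Root Mean Square Error of predicted vs. measured values."""
    n = len(measured)
    mu = sum(measured) / n  # mean of the measured values
    total = sum(
        (x - y) ** 2 * (x + mu) / (2.0 * mu)  # larger x gets a larger weight
        for x, y in zip(measured, predicted)
    )
    return math.sqrt(total / n)

# Hypothetical ordered measurements and quantiles of two candidate distributions.
measured = [8.0, 10.0, 12.0]
candidates = {
    "dist_a": [8.5, 10.0, 11.0],
    "dist_b": [8.0, 10.5, 12.5],
}
scores = {name: pwrmse(measured, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)  # lowest PWRMSE is the best fit
```

Because the weight (x + μ) / (2μ) exceeds 1 for values above the mean, errors at the upper end of the distribution are penalized more strongly, which is relevant when maximum consumption is of interest.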
Water consumption was simulated from the best-fitted theoretical distribution using the Monte Carlo method. This method is used for mathematical modeling of complex processes. The results obtained are presented as parameters of a hypothetical population. Based on the generated population sample, a statistical estimate of the tested parameter can be made (Halton 1970). The simulation was performed for a sample of 100 random variables, both for the total daily water consumption of the group of tested households and for the average daily water consumption per capita. Assuming that one variable corresponds to one quarter of a year, the water consumption prediction covered a 25-year period. The Monte Carlo method has already been used, e.g., for modeling the reliability of a wastewater treatment plant (Taheriyoun & Moradinejad 2015), for studies on sewer systems (Ribeiro et al. 2009) and for water demand modeling in an office building (Wu et al. 2017).
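A minimal sketch of such a simulation is given below for a GEV distribution, using inverse-transform sampling: uniform random numbers are passed through the quantile function of the fitted distribution. The location, scale and shape parameters are hypothetical placeholders (the fitted values are not reported in this text), the shape parameter follows the ξ sign convention, and the function names are illustrative. For the Johnson distribution the same scheme would apply with its own quantile function.

```python
import math
import random

def gev_quantile(u, loc, scale, shape):
    """Quantile function of the GEV distribution for 0 < u < 1."""
    if shape == 0.0:
        return loc - scale * math.log(-math.log(u))  # Gumbel limit
    return loc + scale / shape * ((-math.log(u)) ** (-shape) - 1.0)

def simulate_quarters(n_quarters, loc, scale, shape, seed=0):
    """Draw one simulated value per quarter by inverse-transform sampling."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    draws = []
    while len(draws) < n_quarters:
        u = rng.random()
        if u > 0.0:  # skip u == 0 to avoid log(0)
            draws.append(gev_quantile(u, loc, scale, shape))
    return draws

# 100 quarters correspond to a 25-year period; parameters are hypothetical.
simulated = simulate_quarters(100, loc=83.0, scale=5.0, shape=-0.2)
simulated_mean = sum(simulated) / len(simulated)
```

The mean of the simulated sample can then be compared with the mean observed in the measurement period to judge whether the simulation reproduces the data.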

Preliminary statistical data analysis
The preliminary statistical analysis showed that in the individual quarters of 2011-2015, the daily water consumption of the whole group of 34 tested households ranged from about 7 500 dm³/d to 11 180 dm³/d, with an average value of 10 100 dm³/d. The coefficient of variation (CV = 0.082) indicates a small variation of the tested parameter over the five-year period. The calculated skewness (Sk = -1.65) indicates an asymmetry of the tested variables around the average. In turn, the kurtosis greater than zero (Kurt = 3.55) results from the concentration of the measured values close to the mean value (Fig. 1a, Table 1). Based on the calculated coefficient of variation (CV = 0.078) (Table 1), it can be stated that the average water consumption per capita was not subject to significant variability. The average daily water consumption per capita in the 2011-2015 period was 84.1 dm³/d (Fig. 1b), which means that this value was within the range of the average water consumption for dwellings of the fourth category (80-100 dm³/d) given in the Polish Regulation (2002). The kurtosis of the asymmetric water consumption distribution (Kurt = 4.90) clearly indicates a concentration of the variables close to the mean value (Table 1).
where: Min - minimum, Max - maximum, Avg - average, S - standard deviation, CV - coefficient of variation, Sk - skewness, Kurt - kurtosis.

Testing of the quarterly water consumption homogeneity
The results of the homogeneity testing based on the Kruskal-Wallis test (Table 2) showed that there is no reason to reject the null hypothesis H₀, which assumes homogeneity of the tested quarterly data series. Both for the total daily water consumption of the group of tested households and for the average daily water consumption per capita, the values of the H statistic were lower than Pearson's statistic adopted when determining the critical region of the test; for four compared time series, the χ² statistic was 7.815. Based on the results obtained, it can be stated that there are no significant differences between the quarterly water consumption values. Therefore, it was concluded that no significant factors affected the water consumption in the analyzed multi-year period.

Analysis of theoretical and empirical distributions fitting
The fitting of theoretical to empirical distributions with the Anderson-Darling test showed that as many as six of the ten tested probability distributions can be used to describe the total daily water consumption of the group of 34 tested households (Table 3). For the GMM, GEV, Johnson, Weibull, Normal and Log-normal distributions, the p-values of the A-D test statistic were greater than the assumed significance level of α = 0.05. The null hypothesis H₀ about the compatibility of the theoretical and empirical distributions was rejected for the other four distributions (Half-normal, Triangular, Rayleigh and Pareto). In turn, the observational data series for the average daily water consumption per capita can be described by five probability distributions: GMM, GEV, Weibull, Normal and Log-normal (Table 3). The fact that the Half-normal, Triangular, Rayleigh, Pareto and also Johnson distributions cannot be used may be due to their characteristics. Namely, these functions are homogeneous, whereas water consumption is a dynamic process that many time-variable factors may affect.

Selection of the best-fitted theoretical distribution
As presented in chapter 4.3, the water consumption variables can be described by several different theoretical distributions. However, it must be determined which of the tested theoretical distributions fits the empirical data best. Although the p-values of the Anderson-Darling test can be used for this purpose, the PWRMSE criterion was used in this paper. For the total water consumption of the group of tested households, the best-fitted theoretical distribution turned out to be the Johnson distribution (PWRMSE = 164.73 dm³/d). In turn, the analysis showed that for the average daily water consumption per capita, the best-fitted theoretical distribution was the GEV distribution (PWRMSE = 1.64 dm³/d) (Table 4). For comparison, if the p-value were used instead of the PWRMSE criterion to select the best-fitted theoretical distribution, it would be the GMM distribution in both cases (Table 3).

Simulation of the water consumption
The water consumption was simulated using the Monte Carlo method (Fig. 4a, b). The best-fitted theoretical distributions resulting from the PWRMSE testing were used. As a reminder, for the total daily water consumption of the group of tested households it was the Johnson distribution, whereas for the average daily water consumption per capita, it was the GEV distribution. The simulation showed that for the assumed 25-year prediction period, the average total water consumption of the whole group of households (10 066.9 dm³/d) (Fig. 4a) is close to the average water consumption (10 111.8 dm³/d) (Fig. 1a) noted in the period of 2011-2015. Similarly, the predicted average water consumption per capita (84.3 dm³/d) (Fig. 4b) is compatible with the average daily water consumption per capita (84.1 dm³/d) noted in the period of 2011-2015 (Fig. 1b). It can be stated that the simulation method presented in this paper offers a real possibility of long-term water consumption prediction, even if the type of theoretical distribution of the water consumption changes over time. In that case, the simulation method can still be used as a reliable forecasting tool by assuming another best-fitted distribution.
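The closeness of the simulated and observed averages quoted above can be expressed as a relative difference. The short sketch below uses only the four mean values reported in this section; the function name `relative_difference` is hypothetical.

```python
def relative_difference(simulated_mean, observed_mean):
    """Absolute difference between the means as a percentage of the observed mean."""
    return abs(simulated_mean - observed_mean) / observed_mean * 100.0

# Means reported in the text, in dm3/d.
total_diff = relative_difference(10066.9, 10111.8)  # group of households
per_capita_diff = relative_difference(84.3, 84.1)   # per capita
```

For the reported values, the differences are about 0.44% for the group of households and about 0.24% per capita.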

Summary and conclusions
The development of mathematical models for long-term water consumption prediction will certainly help to avoid design mistakes in water systems that result in many operational and maintenance problems. These mistakes are often caused by the widespread use of out-of-date water demand indicators elaborated in the past; in many cases, these indicators do not reflect the actual water demand properly and contribute to the oversizing or undersizing of water pipelines.
In this paper, a new approach to maximum quarterly water consumption modeling based on probability distributions was developed. Because random variables are not deterministic and the data may be consistent with several different distributions, a multi-distribution analysis was performed. The essential part of this paper was preceded by a preliminary statistical data analysis. Based on the coefficient of variation, a small variability of the water consumption over the five-year period was found. The results of the Kruskal-Wallis test showed no significant differences between the quarterly water consumption values; the water consumption time series were found to be homogeneous. As for the main findings of this study, the results of the Anderson-Darling test showed that at least half of the ten tested theoretical probability distributions can be used to describe the water consumption variables. Both for the total daily water consumption of the group of tested households and for the average daily water consumption per capita, these were the GMM, GEV, Weibull, Normal and Log-normal distributions; in the first case, the Johnson distribution could be used as well. The PWRMSE criterion showed that the Johnson distribution was the best one for describing the total water consumption of the group of tested households, whereas for the average daily water consumption per capita, it was the GEV distribution. The modeling results obtained in this paper with the Monte Carlo method can be used for designing other water systems that supply a group of individual water users similar to the one studied. Importantly, because the methodology presented in this paper is based on probability distributions, the proposed simulation method can be used even if the type of the best-fitted theoretical distribution of the water consumption changes over time, since another theoretical distribution can then be assumed.
Thanks to this, it is believed that the statistical tools and methodology presented in this paper can be used for reliable planning and design of water systems.

Fig. 1. Average daily water consumption for each quarter of 2011-2015: (a) for the group of the tested households; (b) per capita

Figures 2a-f and Figures 3a-e show the quantile-quantile graphs of the probability distributions that can be used to describe the water consumption.

Fig. 2. Quantile-quantile graphs of theoretical and empirical distributions fitting for total daily water consumption in the group of the tested households: (a) GMM; (b) GEV; (c) Johnson; (d) Weibull; (e) Normal; (f) Log-normal

Fig. 4. The results of the water consumption simulation based on the Monte Carlo method: (a) for the group of the tested households; (b) per capita

Table 2. The results of the Kruskal-Wallis homogeneity testing for the quarterly water consumption in the period of 2011-2015

Table 3. The results of theoretical and empirical water consumption distributions fitting

Table 4. PWRMSE values for the best-fitted theoretical distribution selection