ANALYSIS OF CLIMATE DATA USING SPATIAL TECHNIQUES TO ESTIMATE RAINFALL IN THE NORTH WEST OF NIGERIA

This study examines the spatial variation of rainfall across localities in the North West of Nigeria. Rainfall data play a basic meteorological role in many fields of geostatistical research and practice, which is why rainfall is one of the major climate resources that can be used as a measure of climate change. The aim was to analyze one decade of data (2010–2019) for thirty sample locations, obtained from NIMET, using three different spatial models, and to compare the models' performance in order to identify the optimal model for rainfall prediction in the study area. The optimal model is assessed using the validation measures adopted in the research, namely RMSE and R². The auxiliary variables used to support estimation at neighboring locations are humidity, temperature, pressure and wind speed. The rainfall predicted by the models is consistent with the theory of the Intertropical Convergence Zone (ITCZ): in all the models, locations with higher predicted rainfall lie in the southern part of the study area, while locations with lower predicted rainfall lie in the northern part. With respect to the validation measures, Geographically Weighted Regression (GWR) outperforms Ordinary Kriging (OK) and Inverse Distance Weighting (IDW) in terms of RMSE and R².


INTRODUCTION
This research focuses on rainfall prediction and on meteorological variables such as temperature, pressure, humidity and wind speed that contribute to annual precipitation in the north-western part of Nigeria. Rainfall is a major climate resource that can be used as an index of climate change (Adhikary et al. 2016). By definition, rainfall is liquid water in the form of droplets that have condensed from atmospheric water vapor and become heavy enough to fall under gravity. The region under study is blessed with fertile land, and when rainfall is sufficient and other supporting agricultural factors are favorable, a bumper harvest follows. Rainfall is the most essential element of a farming system, as it determines the accessibility of the soil needed for maximum yield (Niles et al. 2015). According to Ismail and Oke (2012), crops, animals and humans derive their water mainly from rainfall; irrigation scheduling depends on correct estimation of its spatial distribution, and rainfall also determines when particular crop types can be cultivated and which farming system is appropriate for optimum yields. In this research, we compare the performance of Ordinary Kriging (OK), Geographically Weighted Regression (GWR) and Inverse Distance Weighting (IDW), as these models are vital in spatial analysis. The major advantage of kriging is that it takes into account the spatial correlation between data points and provides unbiased estimates with minimum variance. Spatial variability in kriging is quantified using the variogram, which defines the degree of spatial correlation between data points (Webster & Oliver, 2007). Ordinary kriging is one of the most preferred stochastic interpolation methods for spatial rainfall estimation. One advantage of IDW is that it is easier to understand, with simpler procedures and fewer steps than kriging.
IDW explicitly implements the assumption that things that are close to one another are more alike than those that are farther apart. The underlying idea of GWR is that parameters may be estimated anywhere in the study area, given a dependent variable and a set of one or more independent variables that have been measured at places whose locations are known. Taking Tobler's observation about nearness and similarity into account, we expect that, when estimating parameters for a model at some location xi, observations nearer to that location should carry greater weight in the estimation than observations farther away. GWR was specifically designed to deal with spatial non-stationarity by measuring local relationships between the target and explanatory variables, which differ from location to location (Fotheringham et al., 2002). Unlike OK, which depends on a set of variogram and regression parameters to summarize global relationships, GWR estimates local regression parameters, and its model performance varies across the study region. In addition, the GWR model offers finer detail for spatial data and is easy for the researcher to apply. According to Yu et al. (2009), GWR is one of the newer spatial regression frameworks, introduced to deal with spatial non-stationarity by allowing different relationships to occur at different points in space. Hence, the best interpolation method for a particular study area is usually established through comparative assessment of different interpolation methods (Delbari et al., 2013; Dirks et al., 1998; Goovaerts, 2000; Hsieh, Cheng, Liou, Chou, & Siao, 2006; Mair & Fares, 2011; Moral, 2010).
Spatial variability is the most challenging aspect of meteorological variables, which is why generating the most accurate model from existing spatial data, and describing the error and variability of the analytical surface, is a key challenge facing climatologists. In the literature, the results of comparisons between spatial models differ from one study to another, and the variation does not follow a clear pattern. Some researchers argue that the estimation of climate data depends on the specific study area and on the environmental factors contributing to climate change in that area. There is no single spatial method that works well everywhere (Daly, 2006). Delbari, Afrasiab, and Jahani (2016) analyzed the spatial variation of rainfall using geostatistical and deterministic models and recommended geostatistical methods, while Dirks et al. (1998) compared IDW with Thiessen polygons and OK for estimating rainfall and recommended IDW, a deterministic model, for interpolation. Menmeng et al. (2017) compared three spatial interpolation methods (kriging, splines and IDW) and two regression models (multiple linear regression and GWR) for predicting monthly minimum, average and maximum near-surface air temperature (NSAT), concluding that GWR is better than kriging in the warm months and that kriging outperforms GWR in the colder months. Many studies on spatial interpolation incorporate elevation as an auxiliary variable to see how rainfall varies with elevation in their study area (Sajal Kumar et al., 2017).
This research adds to the existing literature a key step in spatial rainfall prediction: an assessment of the impact of using a regression model to estimate a rainfall dataset. In the study region, rainfall normally falls between the warm and cold seasons; the regression model considers the four meteorological variables mentioned above and can account for spatial heterogeneity, which allows the researcher to capture information from different locations.
Over the years, numerous studies have been dedicated to comparing deterministic and geostatistical methods in different regions around the world. Many studies have reported that rainfall is generally characterized by significant spatial variation (e.g., Delbari, Afrasiab, & Jahani, 2013; Lloyd, 2005), and they advise that spatial methods capable of incorporating the spatial variability of rainfall into the estimation process should be used. Accordingly, kriging has become the most widely used geostatistical method for spatial interpolation and prediction of rainfall, and its ability to produce spatial predictions of rainfall has been demonstrated in many studies (e.g., Adhikary et al. 2016; Goovaerts, 2000; Jeffrey, Carter, Moodie, & Beswick, 2001; Lloyd, 2005; Moral, 2010; Yang, Xie, Liu, Ji & Wang, 2015). Goovaerts (2000) used three multivariate geostatistical methods (ordinary cokriging [OCK], kriging with an external drift [KED], and simple kriging with varying local means [SKVM]), which include a DEM as a secondary variable, and three univariate methods (OK, Thiessen polygons [TP], and IDW), which do not take elevation into account, for spatial prediction of monthly and annual rainfall data. Martínez-Cob (1996) compared OK, OCK, and modified residual kriging for interpolating annual rainfall in Spain; the results indicated that OCK was better for rainfall estimation, reducing estimation error compared with OK and modified residual kriging. Hsieh et al. (2006) evaluated the OK and IDW methods using daily rainfall records to assess the spatial distribution of rainfall in the Shih-Men Watershed in Taiwan; the results indicated that IDW produced more reasonable representations than OK. Moral (2010) compared three univariate kriging methods (simple kriging [SK], universal kriging, and OK) with three multivariate kriging methods (OCK, SKVM, and regression kriging) to interpolate monthly and annual rainfall data from 136 rain gauges in the Extremadura region of Spain.
The results show that multivariate kriging outperformed univariate kriging and that, among the multivariate methods, SKVM and regression kriging performed better than OCK. Ly, Charles, and Degré (2011) used IDW, TP, and several kriging methods to interpolate daily rainfall at the catchment scale in Belgium. Their results showed that integrating elevation into KED and OCK did not improve interpolation accuracy for daily rainfall estimation; OK and IDW were considered the most suitable methods, as they gave the smallest error in almost all cases. Mair and Fares (2011) compared TP, IDW, OK, linear regression, and SKVM for estimating seasonal rainfall in a mountainous watershed, concluding that OK gave the minimum error in nearly all cases. They also found that incorporating elevation did not increase prediction accuracy over OK when the correlation between rainfall and elevation was lower than 0.82. Delbari et al. (2013) compared two univariate methods (IDW and OK) and four multivariate methods (OCK, KED, SKVM, and linear regression) for mapping monthly and annual rainfall over Golestan Province in Iran, and stated that KED and OK outperformed all other methods in terms of root mean square error (RMSE). Jeffrey et al. (2001) derived a comprehensive archive of Australian rainfall and climate data, using a thin plate smoothing spline to interpolate daily climate variables and OK to interpolate daily and monthly rainfall. The studies reviewed above indicate that each method has advantages and disadvantages in different regions. There is no single spatial method that works well everywhere (Daly, 2006). The best method should therefore be identified through comparative assessment of different interpolation methods.
To date, many studies have been conducted on spatial interpolation of rainfall at regional and national scales in Australia (Gyasi-Agyei, 2016; Hancock & Hutchinson, 2006; Hutchinson, 1995; Jeffrey et al., 2001; Johnson et al., 2016; Jones, Wang, & Fawcett, 2009; Li & Shao, 2010; Woldemeskel, Sivakumar, & Sharma, 2013; Yang et al., 2015). Geographically weighted regression (GWR) is increasingly used to model climatic variables (e.g., Brunsdon et al., 1996; Fotheringham et al., 2002). Szymanowski and Kryza (2012) found that GWR performed better than linear multiple regression (LMR) for modeling temperature at different time scales, including daily. To generate their final data set, they combined GWR with kriging. Another example is Bostan et al. (2012), who used GWR to estimate the spatial distribution of average annual rainfall over Turkey and compared it with other techniques such as LMR and different types of kriging. In their case study, GWR performed better than LMR, but universal kriging was recommended as the best overall technique. Chen et al. (2011) pointed out that the estimation performance of High Accuracy Surface Modeling (HASM) for annual precipitation in the Dongjiang River Basin of China was significantly better than that of three classical algorithms: IDW, OK, and spline. Li (2018) compared GWR and Geographically and Temporally Weighted Regression (GTWR) for rainfall estimation in the Huaihe River Basin in eastern China and found that GTWR outperformed GWR.
The aim of this research is to determine the optimal spatial model for assessing the annual rainfall dataset in the study area. This is achieved through the following objectives: fitting the variogram model; estimating rainfall at different locations using the spatial models; and comparing the performance of the models on the basis of their output to determine which model produces the most efficient result for the rainfall dataset.

METHODOLOGY
To achieve the key goal of this work, the methodological framework applies three different spatial models to the average rainfall data of 30 selected towns in the north-western part of Nigeria in order to determine the best-performing model for rainfall prediction in the study region.
In ordinary kriging, the variogram

$$2\gamma(h) = \mathrm{Var}\left[Z(x + h) - Z(x)\right]$$

is defined as the variance of the increments, and h is the spatial distance between two points (Lam, 1983; Webster and Oliver, 2001).
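As an illustration of how semivariance is estimated from station data before a variogram model is fitted, the following is a minimal sketch in plain Python. It assumes the classical (Matheron) estimator with equal-width distance bins; the source does not state which estimator or binning was used, and the function name, coordinates and rainfall values are hypothetical.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Matheron estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])   # separation distance of the pair
            b = int(h // bin_width)               # bin b covers [b*w, (b+1)*w)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # One tuple per non-empty bin: (bin-centre distance, semivariance, pair count)
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_bins) if counts[b] > 0]

# Hypothetical stations on a line with made-up values (illustration only).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [10.0, 12.0, 15.0, 19.0]
gamma = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4)
for lag, semivariance, pairs in gamma:
    print(f"lag ~{lag:.1f}: gamma = {semivariance:.3f} from {pairs} pairs")
```

A variogram model (for example spherical or exponential) would then be fitted to these points; that fitting step is not shown here.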
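To predict the value $Z(x_0)$ from the known values $Z(x_i)$, ordinary kriging uses the weighted sum

$$\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \qquad \sum_{i=1}^{n} \lambda_i = 1,$$

where $\hat{Z}(x_0)$ is the predicted value of Z (rainfall in this study) at the target position $x_0$, $\lambda_i$ is the kriging weight associated with the sample location $x_i$, and n is the number of neighboring points used to determine the rainfall at $x_0$. The $\lambda_i$ are the interpolation weights, and a constant mean value is assumed.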
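The ordinary kriging predictor can be sketched as a small linear system in which the weights are constrained to sum to one through a Lagrange multiplier. The sketch below is illustrative only: the exponential variogram, its parameters, the helper names and the station data are assumptions, not the model fitted in this study, and plain Python is used to keep the example dependency-free.

```python
import math

def exp_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential variogram model (assumed; parameters are placeholders)."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, variogram=exp_variogram):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(coords)
    # Variogram matrix bordered by ones: the last row enforces sum(weights) = 1.
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]            # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

# Hypothetical stations at the corners of a unit square, made-up rainfall (mm).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [600.0, 800.0, 700.0, 900.0]
estimate, variance, weights = ordinary_kriging(coords, values, (0.5, 0.5))
```

At the centre of the square all four stations are equidistant, so the weights are equal and the estimate is the plain average of the four values; at a station itself the predictor returns the observed value.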
Inverse Distance Weighting (IDW):
IDW is a deterministic interpolation method in which the values at unknown locations are estimated from known locations using a weight function within a search neighborhood (Collins, 1995; Tomczak, 1998). IDW interpolation explicitly assumes that things that are close to one another are more alike than those that are farther apart (Johnston, Ver Hoef, Krivoruchko, & Lucas, 2001; Anderson, 2003). To predict the value at any unknown location, IDW uses the known values surrounding the target location. Known values closer to the location to be predicted have more influence than those farther away:

$$\hat{Z}(x_0) = \frac{\sum_{i=1}^{n} d_i^{-p}\, Z(x_i)}{\sum_{i=1}^{n} d_i^{-p}},$$

where $d_i$ is the distance between the sample location $x_i$ and the target location $x_0$, and p, with $0 \le p \le 2$, is the power parameter (also known as the weight-reduction factor). A power of 2 is the most commonly used in IDW.
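The IDW predictor described above can be sketched in a few lines of plain Python. The function name and the two-station example are hypothetical, all stations are used rather than a search neighborhood, and the default power of 2 follows the common choice mentioned in the text.

```python
import math

def idw(coords, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` using all known points."""
    num = 0.0
    den = 0.0
    for c, z in zip(coords, values):
        d = math.dist(c, target)
        if d == 0.0:
            return z              # target coincides with a station: return its value
        w = d ** -power           # weight decays with distance raised to `power`
        num += w * z
        den += w
    return num / den

# Hypothetical stations with made-up rainfall values (mm).
coords = [(0.0, 0.0), (2.0, 0.0)]
values = [600.0, 900.0]
midpoint = idw(coords, values, (1.0, 0.0))   # equal distances, so the plain mean
nearer = idw(coords, values, (0.5, 0.0))     # pulled toward the closer station
```

Raising `power` makes the estimate follow the nearest station more closely, while a power of 0 reduces IDW to the unweighted mean of the known values.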

Geographically weighted regression (GWR):
GWR is an extension of traditional regression in which the rates of change are allowed to vary, meaning that the regression coefficients are specific to a location rather than being global estimates. It is designed to deal with spatial non-stationarity by measuring local relationships between the target variable and explanatory variables, which differ from location to location (Fotheringham et al., 2002). Unlike regression kriging (RK), which depends on a single set of variogram and regression parameters to summarize global relationships, GWR runs a regression for each location instead of a single regression for the entire study area. In GWR each location in the study area has its own coefficients, which allows the model to generate separate R² values showing how the relationship between the dependent and independent variables varies throughout the study area. GWR also provides an overall R² value that can be compared with the R² values obtained from other models. Geographically weighted regression allows the parameter estimates to be a function of location. The local estimation of the parameters with GWR is expressed by the following equation:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{m} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i,$$

where $y_i$ is the i-th observation of the dependent variable y, $x_{ik}$ is the value of the k-th of m explanatory variables at the i-th point, $\beta_k(u_i, v_i)$ are the local regression coefficients, and $\varepsilon_i$ is the regression residual at the i-th point.
$(u_i, v_i)$ are the coordinates of the i-th point, defined by latitude and longitude.
X and Y are the vectors of explanatory and dependent variables.
The parameters of the GWR model can be calibrated using the weighted least squares approach. In matrix form, the parameters of the GWR model at each location i are estimated by

$$\hat{\beta}(u_i, v_i) = \left[X^{T} W(u_i, v_i)\, X\right]^{-1} X^{T} W(u_i, v_i)\, Y,$$

where $\hat{\beta}(u_i, v_i)$ represents the local coefficients to be estimated at location $(u_i, v_i)$, $W(u_i, v_i)$ is the weight matrix, and $\left[X^{T} W(u_i, v_i)\, X\right]$ is the geographically weighted variance-covariance matrix. The Gaussian kernel weights decrease gradually from the center of the kernel but never reach zero. The bi-square kernel function has a clear-cut range within which the weighting is non-zero. In this study, the adaptive bi-square function is used to derive the weight matrix:

$$w_{ij} = \begin{cases} \left[1 - \left(d_{ij}/h\right)^{2}\right]^{2}, & d_{ij} < h, \\ 0, & \text{otherwise}, \end{cases}$$

where $d_{ij}$ is the Euclidean distance between point i and neighboring observation j, and h is the kernel bandwidth.

The accuracy and performance of the models are evaluated using RMSE and R². RMSE is used to check the estimation accuracy between the observed and estimated data; an RMSE closer to zero indicates higher accuracy in estimation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(O_i - P_i\right)^{2}},$$

where $O_i$ are the observed values, $P_i$ are the predicted values, and n is the number of observations.
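To make the local weighted least squares step concrete, the sketch below fits GWR coefficients at a single target location with an adaptive bi-square kernel. It rests on assumptions that the source does not confirm: the adaptive bandwidth is taken as the distance to the (k+1)-th nearest observation so that k observations receive non-zero weight, bandwidth selection is not shown, and the helper names, the single explanatory variable (labelled humidity) and all data values are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def bisquare(d, h):
    """Bi-square kernel: positive inside the bandwidth h, zero at or beyond it."""
    return (1.0 - (d / h) ** 2) ** 2 if d < h else 0.0

def gwr_local_fit(coords, X, y, target, k):
    """Local coefficients [intercept, b1, ...] at `target` from the k nearest points."""
    dists = [math.dist(c, target) for c in coords]
    h = sorted(dists)[k]                     # adaptive bandwidth (requires k < len(y))
    w = [bisquare(d, h) for d in dists]
    rows = [[1.0] + list(x) for x in X]      # prepend the intercept column
    p, n = len(rows[0]), len(y)
    xtwx = [[sum(w[i] * rows[i][r] * rows[i][c] for i in range(n)) for c in range(p)]
            for r in range(p)]
    xtwy = [sum(w[i] * rows[i][r] * y[i] for i in range(n)) for r in range(p)]
    return solve(xtwx, xtwy)                 # (X'WX)^-1 X'Wy

# Hypothetical stations; y is generated exactly as 2 + 3 * humidity for checking.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
humidity = [[40.0], [55.0], [47.0], [62.0], [51.0], [70.0]]
rainfall = [2.0 + 3.0 * x[0] for x in humidity]
beta = gwr_local_fit(coords, humidity, rainfall, target=(0.8, 0.3), k=4)
predicted = beta[0] + beta[1] * 50.0         # prediction for humidity = 50 at the target
```

Because the synthetic response is an exact linear function of the explanatory variable, the local fit recovers the same coefficients at every target; with real data the coefficients would differ from location to location, which is the point of GWR.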

R² is a measure of the proportion of the variance in a dependent variable that is explained by the independent variables. It is also called the coefficient of determination. The most common interpretation of R² is how well the regression model fits the observed data.
A smaller RMSE together with a higher R² value indicates the best-performing interpolator.
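The two validation measures can be computed as in the following sketch. R² is written here as one minus the ratio of the residual to the total sum of squares, which is an assumption; the study may instead have used the squared correlation between observed and predicted values. The observed and predicted values are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted annual rainfall (mm) at four stations.
observed = [600.0, 700.0, 800.0, 900.0]
predicted = [610.0, 690.0, 810.0, 890.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
print(f"RMSE = {error:.1f} mm, R2 = {fit:.3f}")
```

Applying both functions to the predictions of each interpolator on the same set of stations gives the comparison described above: the preferred model is the one with the smaller RMSE and the larger R².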

RESULTS AND DISCUSSION
In this section, we present the diagnostic part of the research, in which the findings are used to evaluate the optimal model on the basis of the validation methods discussed earlier. Table 1 shows the summary statistics for the average data variables (climate factors) from the 30 selected towns in the study area, highlighting the nature of the data used in the research. LG shows the highest kriging prediction error: the farther a predicted location is from the sample locations, the higher the prediction error, because the correlation between the points is low, and this causes a slight violation of the nearest-neighborhood assumption. The results in Table 3 show that temperature, pressure and humidity have predictive (explanatory) power over rainfall, as all their p-values are below the 0.05 significance level; wind speed is the exception, showing low predictive power, meaning that it is not statistically significant. The overall p-value for the whole GWR model (1.027e-10) indicates that the auxiliary variables used for prediction are jointly associated with rainfall in the study area, although temperature, pressure and humidity contribute more to annual rainfall prediction than wind speed. Overall, the R² for GWR indicates that 88% of the total variation in the dependent variable (rainfall) is explained by the combination of the independent variables.

CONCLUSION
Three spatial models were used to analyze the average rainfall dataset for a period of ten years from thirty known and unknown locations, in order to determine the best spatial model on the basis of the validation methods used in the research. For GWR, temperature, pressure, humidity and wind speed were used as independent variables for predicting rainfall. The findings show that the regression model outperformed the geostatistical and deterministic models, although the univariate models produced correlated results. The outcomes of GWR are best in terms of consistency of prediction, followed by OK and lastly IDW. GWR shows that using supplementary variables for rainfall estimation at every location improves prediction accuracy. The study therefore recommends the use of GWR for rainfall estimation in the North West region. The research also presented and discussed earlier studies related to rainfall, and the findings are consistent with the theory of the ITCZ regarding rainfall oscillation across locations of the study region in all the models. Lastly, GWR emerged as the optimal model based on RMSE and R², and the mean rainfall of the sample locations is very close to the predicted mean of GWR.

Further Research
Statistical data generally change over time, and spatial data change owing to ENSO events. Rainfall estimation should therefore become more precise when models incorporate environmental factors as independent variables in the estimation process, rather than relying on univariate models that depend only on the variable of study and on the weights used to estimate neighboring locations. Comparing spatial models that incorporate auxiliary variables in rainfall estimation would therefore be suitable for obtaining good results in the study area.