Geographically Weighted Regression for Community Health Index (Part 4)

The GWR Model

A model can be either global or local. Classical linear regression is a global model: a single set of coefficients applies to all observations. A local model is more flexible. In a spatial context, each region or location can have its own coefficients.
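
The distinction can be written out explicitly. The notation here is an assumption made for illustration and does not come from the analysis itself: [latex]p[/latex] explanatory variables, and [latex](u_i, v_i)[/latex] as the coordinates of location [latex]i[/latex]. A global model has one coefficient per variable:

[latex]y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i[/latex]

A local model lets every coefficient depend on the location:

[latex]y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i[/latex]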

Geographically Weighted Regression (GWR) is a local model. Some of the advantages of using this model include:

  1. Estimating local standard errors
  2. Calculating local leverage measures
  3. Testing the significance of spatial variation in local parameter estimates
  4. Testing whether a local model is superior to a global model

Determining Optimal Bandwidth

The bandwidth is the radius r of a circle centered on a location. It serves as the basis for determining the weight of each observation in the regression model for that location.

Choosing the bandwidth well is crucial for accurate kernel estimation. A very small bandwidth increases the variance of the estimates, while a very large bandwidth increases their bias. Bandwidth selection must therefore be carried out carefully so that the model is as accurate as possible. One method for selecting the optimum bandwidth is cross-validation (CV), used here together with the bi-square kernel weighting function.
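
To make the role of the bandwidth concrete, here is a minimal base-R sketch of the bi-square weight as a function of distance. The function name `bisquare_weight` and the distances are made up for illustration. Note that it takes plain distances, whereas `gwr.bisquare` in spgwr is documented as taking squared distances.

```r
# Bi-square kernel: the weight falls smoothly from 1 at distance 0
# to 0 at the bandwidth h, and is exactly 0 beyond it.
# (Hypothetical helper, not part of spgwr.)
bisquare_weight <- function(d, h) {
  ifelse(d < h, (1 - (d / h)^2)^2, 0)
}

h <- 1.875859                   # optimum bandwidth reported in this post
d <- c(0, 0.5, 1, 1.5, h, 3)    # illustrative distances (made up)
round(bisquare_weight(d, h), 4)
```

Observations at or beyond the bandwidth get zero weight, so a smaller bandwidth means each local regression is fitted on fewer points.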

library(spgwr)  # provides gwr.sel(), gwr() and gwr.bisquare
b.gwr <- gwr.sel(Y ~ X1 + X2 + X3, data = dfx, coords = centroid_coords, gweight = gwr.bisquare)

#result:
Bandwidth: 3.467319 CV score: 0.06295686 
Bandwidth: 5.604639 CV score: 0.07338961 
Bandwidth: 2.146383 CV score: 0.05566974 
Bandwidth: 1.329999 CV score: 0.0832352 
Bandwidth: 2.650935 CV score: 0.05856289 
Bandwidth: 1.834552 CV score: 0.05448587 
Bandwidth: 1.64183 CV score: 0.0563147 
Bandwidth: 1.918378 CV score: 0.05443389 
Bandwidth: 1.892471 CV score: 0.05436471 
Bandwidth: 1.881922 CV score: 0.05435183 
Bandwidth: 1.871842 CV score: 0.05435071 
Bandwidth: 1.875859 CV score: 0.0543497 
Bandwidth: 1.875963 CV score: 0.0543497 
Bandwidth: 1.8759 CV score: 0.0543497 
Bandwidth: 1.875818 CV score: 0.0543497 
Bandwidth: 1.875859 CV score: 0.0543497 

b.gwr

#result:
[1] 1.875859

Based on the output above, the optimum bandwidth is 1.875859, the value at which the CV score reaches its minimum of 0.0543497.
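
The idea behind the CV score can be sketched in base R. This is an illustration under stated assumptions, not the code spgwr runs internally: the score is taken to be the sum of squared leave-one-out prediction errors, the helper names `bisquare_weight` and `cv_score` are hypothetical, and the data are simulated on a made-up 7 x 7 grid.

```r
# Hypothetical helpers for illustration only (not part of spgwr).
bisquare_weight <- function(d, h) ifelse(d < h, (1 - (d / h)^2)^2, 0)

# Leave-one-out CV score for one fixed bandwidth h.
# X: design matrix with an intercept column, y: response,
# coords: n x 2 matrix of point coordinates.
cv_score <- function(h, X, y, coords) {
  D <- as.matrix(dist(coords))      # pairwise Euclidean distances
  n <- length(y)
  err <- numeric(n)
  for (i in seq_len(n)) {
    w <- bisquare_weight(D[i, ], h)
    w[i] <- 0                       # leave observation i out of its own fit
    fit <- lm.wfit(X, y, w)         # weighted least squares at location i
    err[i] <- y[i] - sum(X[i, ] * fit$coefficients)
  }
  sum(err^2)
}

# Simulated data: the slope of x1 drifts from west to east.
set.seed(1)
coords <- as.matrix(expand.grid(x = 1:7, y = 1:7))
x1 <- rnorm(49)
y  <- 1 + (0.5 + 0.1 * coords[, 1]) * x1 + rnorm(49, sd = 0.1)
X  <- cbind(1, x1)

# Score a few candidate bandwidths, then refine with a 1-D search.
bandwidths <- c(2.5, 3, 4, 6, 10)
scores <- sapply(bandwidths, cv_score, X = X, y = y, coords = coords)
data.frame(bandwidth = bandwidths, cv = scores)

best <- optimize(cv_score, interval = c(2.5, 10), X = X, y = y, coords = coords)
best$minimum
```

Each printed line of the `gwr.sel` output above corresponds to one such evaluation: a candidate bandwidth and its score, with the search narrowing toward the minimum.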

GWR Modeling with Optimum Bandwidth

set.seed(777)
rtg.model <- gwr(Y ~ X1 + X2 + X3, data = dfx1, coords = centroid_coords,
                 gweight = gwr.bisquare, bandwidth = b.gwr, hatmatrix = TRUE)
rtg.model

#result:
Call:
gwr(formula = Y ~ X1 + X2 + X3, data = dfx1, coords = centroid_coords, 
    bandwidth = b.gwr, gweight = gwr.bisquare, hatmatrix = TRUE)
Kernel function: gwr.bisquare 
Fixed bandwidth: 1.875859 
Summary of GWR coefficient estimates at data points:
                    Min.     1st Qu.      Median     3rd Qu.        Max.  Global
X.Intercept.  0.33264887  0.51019690  0.58587648  0.60154910  0.62750248  0.6103
X1           -0.00093176  0.00014080  0.00024606  0.00072156  0.00125197 -0.0002
X2            0.00204668  0.00297894  0.00345955  0.00394030  0.00906528  0.0038
X3           -0.00081886 -0.00057252 -0.00048241 -0.00018790  0.00128358 -0.0006
Number of data points: 119 
Effective number of parameters (residual: 2traceS - traceS'S): 24.24101 
Effective degrees of freedom (residual: 2traceS - traceS'S): 94.75899 
Sigma (residual: 2traceS - traceS'S): 0.01918757 
Effective number of parameters (model: traceS): 19.38272 
Effective degrees of freedom (model: traceS): 99.61728 
Sigma (model: traceS): 0.01871383 
Sigma (ML): 0.01712208 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -580.6356 
AIC (GWR p. 96, eq. 4.22): -610.9478 
Residual sum of squares: 0.03488672 
Quasi-global R2: 0.752573 

Based on the GWR results above, the local parameter estimates for explanatory variable X2 have positive minimum and maximum values (0.00204668 and 0.00906528), so X2 contributes positively to the response variable Y at all observation locations. For explanatory variables X1 and X3, the minimum is negative and the maximum is positive, so the direction of their contribution to Y varies by location: it is negative at some observation locations and positive at others.
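
The sign pattern of each coefficient can be read directly from the Min. and Max. columns of the summary table. A small sketch that does this mechanically, using the values copied from that table (the helper name `sign_pattern` is hypothetical):

```r
# Classify a coefficient by the signs of its smallest and largest
# local estimates. (Hypothetical helper for illustration.)
sign_pattern <- function(lo, hi) {
  ifelse(lo > 0, "positive everywhere",
         ifelse(hi < 0, "negative everywhere", "sign varies by location"))
}

# Min. and Max. of the local estimates, copied from the summary table.
coef_range <- data.frame(
  term = c("X1", "X2", "X3"),
  min  = c(-0.00093176, 0.00204668, -0.00081886),
  max  = c( 0.00125197, 0.00906528,  0.00128358)
)
coef_range$pattern <- sign_pattern(coef_range$min, coef_range$max)
coef_range
```

With the fitted object at hand, the same ranges should be obtainable from the local estimates stored in `rtg.model$SDF`, for example `range(rtg.model$SDF$X1)`, assuming the columns are named as in the printed summary.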

Calculating [latex]k_{GWR}[/latex] for the F-Test

k_GWR <- rtg.model$results$edf
k_GWR

#result:
[1] 94.75899
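
This value agrees with the model summary printed earlier, where the effective degrees of freedom (residual) equal the number of data points minus the effective number of parameters (residual). Writing [latex]n[/latex] for the number of data points and [latex]S[/latex] for the hat matrix (symbols assumed here from the labels in that output):

[latex]k_{GWR} = n - \left(2\,\mathrm{tr}(S) - \mathrm{tr}(S^{\top}S)\right) = 119 - 24.24101 = 94.75899[/latex]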

These GWR results are used to calculate the Test of Spatial Heterogeneity (F-Test).