Geographically Weighted Regression for Community Health Index (Part 3)

Posted on

We were modelling the Public Health Development Index (PHDI) using the Geographically Weighted Regression which generate distinct regression models for each specific location. In the previous article, we have done step 2 of Geographically Weighted Regression (GWR) for Community Health Index which was Test of Spatial Dependence (Global Moran’s I) on the Dependent Variable: To see if there are clusters of values on the variable you want to explain. Now we’re going to Step 3, step 4, and step 5 which are:

  1. OLS Model (Classical Regression): Run an OLS model as a baseline.
  2. Test of Spatial Dependence (Global Moran’s I) on OLS Residuals: This is a crucial step. If the OLS residuals show significant spatial autocorrelation, it indicates the presence of unmodeled spatial effects.
  3. Test of Spatial Heterogeneity (F-Test for Non-Stationarity): If the OLS residuals show autocorrelation, the next step is to test whether spatial heterogeneity is the cause. This will tell you whether GWR is the right choice.

OLS Model (Classical Regression)

reg.klasik = lm(Y~X1+X2+X3, data = dfx)
summary(reg.klasik)

result:

Call:
lm(formula = Y ~ X1 + X2 + X3, data = dfx)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.078580 -0.013753  0.002806  0.018838  0.046118 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.6102508  0.0224565  27.175  < 2e-16 ***
X1          -0.0001549  0.0002572  -0.602    0.548    
X2           0.0038138  0.0004331   8.806 1.56e-14 ***
X3          -0.0006290  0.0001428  -4.406 2.37e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02596 on 115 degrees of freedom
Multiple R-squared:  0.4505,	Adjusted R-squared:  0.4362 
F-statistic: 31.43 on 3 and 115 DF,  p-value: 6.506e-15

calculate the Sum Square of Residual (SSR) or [latex]SSR_{global}[/latex] of the model:

# Extract residuals
residu_model <- residuals(reg.klasik)
# Calculate RSS
RSS <- sum(residu_model^2)
print(RSS)

#result:
[1] 0.07747458

We can also do the multicolinear test to find out if there are correlation beween predictors

car::vif(reg.klasik)

#result:
      X1       X2       X3 
1.127747 1.125477 1.039452

The VIF values for x1, x2, and x3 are less than 5, so it can be concluded that there is no multicollinearity between the explanatory variables.

We can also do the normality test of the residuals by hypothesis.
H0: The model residuals are normally distributed
H1: The model residuals are not normally distributed

err.regklasik<-reg.klasik$residuals
ad.test(err.regklasik)

#result:
	Anderson-Darling normality test

data:  err.regklasik
A = 0.88636, p-value = 0.02272

We can also plot the residual

hist(err.regklasik)

Plot the residual using the qq-plot

qqnorm(err.regklasik,datax=T)
qqline(rnorm(length(err.regklasik),mean(err.regklasik),sd(err.regklasik)),datax=T, col="red")

The QQ-plot shows many points around the line, indicating that normality is met. Furthermore, the Anderson-Darling test yields a p-value >0.05, indicating that the normality assumption is met (the model residuals are normally distributed).

Test of Spatial Dependence (Global Moran’s I) on OLS Residuals

Spatial Autocorrelation Test Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals

moran.test(reg.klasik$residuals, listw=Woptimum, alternative="two.sided", zero.policy = TRUE)

#result:
	Moran I test under randomisation

data:  reg.klasik$residuals  
weights: Woptimum    

Moran I statistic standard deviate = 6.1911, p-value = 5.976e-10
alternative hypothesis: two.sided
sample estimates:
Moran I statistic       Expectation          Variance 
      0.414423867      -0.008474576       0.004665964 

Based on the test results above, the p-value <0.05 was obtained, which means H0 is rejected, so there is autocorrelation in the residuals of the classical regression model at a real level of 5%.

Test of Spatial Heterogeneity (F test)

H0: There is no spatial heterogeneity, meaning the global regression model is sufficient (the regression coefficients are the same across all locations).
H1: There is spatial heterogeneity, meaning the GWR model is better (the regression coefficients vary across locations).

The SSR of GWR model ([latex]SSR_{GWR}[/latex])is calculated in modelling the data by GWR.

Calculate the F-Test Statistic:
The F-statistic formula is calculated by comparing the SSE of the global model and the GWR model.

[latex]
\ F=\frac{\ (SSR_{global} – SSR_{gwr})/(k_{gwr}-k_{global})}{SSR_{gwr}/(n-k_{gwr})}
[/latex]
Where, [latex]SSR_{global}[/latex] : Sum of Squared Residual of the global regression model. [latex]SSR_{gwr}[/latex]: Sum of Squared Residual of the GWR model.
[latex]k_{global}=p + 1[/latex]: Degrees of freedom of the global model (p slopes + 1 intercept). [latex]k_{gwr}=tr(S)[/latex]: Effective degrees of freedom of the GWR model. [latex]S[/latex] is the weight matrix used in GWR, whose value depends on the optimal bandwidth.

Or using alternative formula:

[latex]
\ F=\frac{\ (SSR_{global} - SSR_{gwr})/(df_{global}-df_{gwr})}{SSR_{gwr}/(df_{gwr})}  
[/latex]

[latex]SSR_{global}[/latex] = 0.07747458
[latex]SSR_{gwr}[/latex] = 0.03488672
[latex]k_{global}[/latex] = length(coef(ols.model)) = 4
[latex]k_{gwr}[/latex] = rtg.model$results$edf = 94.75899

# k_global
k_global <- length(coef(reg.klasik)) 
k_global

# k_lokal dari gwr
k_lokal <- rtg.model$results$edf 
k_lokal 

# jumlah observasi
n <- nrow(dfx1)
n

# SSR global & lokal
SSR_global <- sum(resid(reg.klasik)^2)
SSR_global
SSR_gwr <- sum(rtg.model$SDF$gwr.e^2)
SSR_gwr

# F-test
F_stat <- ((SSR_global - SSR_gwr) / (k_lokal - k_global)) / 
          (SSR_gwr / (n - k_lokal))
F_stat

# p-value
pval <- 1 - pf(F_stat, df1 = (k_lokal - k_global), df2 = (n - k_lokal))
pval

#result:
[1] 0.3260518
[1] 0.9999385

p-value = 0.9999385 (≈ 1)

Hypothesis test:
H0: The GWR model does not provide a significant improvement over the global model (no spatial heterogeneity).
H1: The GWR model is better (there is significant spatial heterogeneity).

Interpretation
Because the p-value is ≫ 0.05 (almost 1), H0 is rejected.
This means there is no strong evidence of spatial heterogeneity in the mass data → the global model (OLS) adequately explains the variation in the data; GWR does not provide a significant improvement.

Conclution:

The global model (OLS) adequately explains the variation in the data; GWR does not provide a significant improvement

Leave a Reply

Your email address will not be published. Required fields are marked *