We were modelling the Public Health Development Index (PHDI) using the Geographically Weighted Regression which generate distinct regression models for each specific location. In the previous article, we have done step 2 of Geographically Weighted Regression (GWR) for Community Health Index which was Test of Spatial Dependence (Global Moran’s I) on the Dependent Variable: To see if there are clusters of values on the variable you want to explain. Now we’re going to Step 3, step 4, and step 5 which are:
- OLS Model (Classical Regression): Run an OLS model as a baseline.
- Test of Spatial Dependence (Global Moran’s I) on OLS Residuals: This is a crucial step. If the OLS residuals show significant spatial autocorrelation, it indicates the presence of unmodeled spatial effects.
- Test of Spatial Heterogeneity (F-Test for Non-Stationarity): If the OLS residuals show autocorrelation, the next step is to test whether spatial heterogeneity is the cause. This will tell you whether GWR is the right choice.
OLS Model (Classical Regression)
reg.klasik = lm(Y~X1+X2+X3, data = dfx)
summary(reg.klasik)
result:
Call:
lm(formula = Y ~ X1 + X2 + X3, data = dfx)
Residuals:
Min 1Q Median 3Q Max
-0.078580 -0.013753 0.002806 0.018838 0.046118
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6102508 0.0224565 27.175 < 2e-16 ***
X1 -0.0001549 0.0002572 -0.602 0.548
X2 0.0038138 0.0004331 8.806 1.56e-14 ***
X3 -0.0006290 0.0001428 -4.406 2.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02596 on 115 degrees of freedom
Multiple R-squared: 0.4505, Adjusted R-squared: 0.4362
F-statistic: 31.43 on 3 and 115 DF, p-value: 6.506e-15
calculate the Sum Square of Residual (SSR) or [latex]SSR_{global}[/latex] of the model:
# Extract residuals
residu_model <- residuals(reg.klasik)
# Calculate RSS
RSS <- sum(residu_model^2)
print(RSS)
#result:
[1] 0.07747458
We can also do the multicolinear test to find out if there are correlation beween predictors
car::vif(reg.klasik)
#result:
X1 X2 X3
1.127747 1.125477 1.039452
The VIF values for x1, x2, and x3 are less than 5, so it can be concluded that there is no multicollinearity between the explanatory variables.
We can also do the normality test of the residuals by hypothesis.
H0: The model residuals are normally distributed
H1: The model residuals are not normally distributed
err.regklasik<-reg.klasik$residuals
ad.test(err.regklasik)
#result:
Anderson-Darling normality test
data: err.regklasik
A = 0.88636, p-value = 0.02272
We can also plot the residual
hist(err.regklasik)

Plot the residual using the qq-plot
qqnorm(err.regklasik,datax=T)
qqline(rnorm(length(err.regklasik),mean(err.regklasik),sd(err.regklasik)),datax=T, col="red")
The QQ-plot shows many points around the line, indicating that normality is met. Furthermore, the Anderson-Darling test yields a p-value >0.05, indicating that the normality assumption is met (the model residuals are normally distributed).
Test of Spatial Dependence (Global Moran’s I) on OLS Residuals
Spatial Autocorrelation Test Hypothesis
H0: There is no autocorrelation in the residuals
H1: There is autocorrelation in the residuals
moran.test(reg.klasik$residuals, listw=Woptimum, alternative="two.sided", zero.policy = TRUE)
#result:
Moran I test under randomisation
data: reg.klasik$residuals
weights: Woptimum
Moran I statistic standard deviate = 6.1911, p-value = 5.976e-10
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
0.414423867 -0.008474576 0.004665964
Based on the test results above, the p-value <0.05 was obtained, which means H0 is rejected, so there is autocorrelation in the residuals of the classical regression model at a real level of 5%.
Test of Spatial Heterogeneity (F test)
H0: There is no spatial heterogeneity, meaning the global regression model is sufficient (the regression coefficients are the same across all locations).
H1: There is spatial heterogeneity, meaning the GWR model is better (the regression coefficients vary across locations).
The SSR of GWR model ([latex]SSR_{GWR}[/latex])is calculated in modelling the data by GWR.
Calculate the F-Test Statistic:
The F-statistic formula is calculated by comparing the SSE of the global model and the GWR model.
[latex]
\ F=\frac{\ (SSR_{global} – SSR_{gwr})/(k_{gwr}-k_{global})}{SSR_{gwr}/(n-k_{gwr})}
[/latex]
Where, [latex]SSR_{global}[/latex] : Sum of Squared Residual of the global regression model. [latex]SSR_{gwr}[/latex]: Sum of Squared Residual of the GWR model.
[latex]k_{global}=p + 1[/latex]: Degrees of freedom of the global model (p slopes + 1 intercept). [latex]k_{gwr}=tr(S)[/latex]: Effective degrees of freedom of the GWR model. [latex]S[/latex] is the weight matrix used in GWR, whose value depends on the optimal bandwidth.
Or using alternative formula:
[latex]
\ F=\frac{\ (SSR_{global} - SSR_{gwr})/(df_{global}-df_{gwr})}{SSR_{gwr}/(df_{gwr})}
[/latex]
[latex]SSR_{global}[/latex] = 0.07747458
[latex]SSR_{gwr}[/latex] = 0.03488672
[latex]k_{global}[/latex] = length(coef(ols.model)) = 4
[latex]k_{gwr}[/latex] = rtg.model$results$edf = 94.75899
# k_global
k_global <- length(coef(reg.klasik))
k_global
# k_lokal dari gwr
k_lokal <- rtg.model$results$edf
k_lokal
# jumlah observasi
n <- nrow(dfx1)
n
# SSR global & lokal
SSR_global <- sum(resid(reg.klasik)^2)
SSR_global
SSR_gwr <- sum(rtg.model$SDF$gwr.e^2)
SSR_gwr
# F-test
F_stat <- ((SSR_global - SSR_gwr) / (k_lokal - k_global)) /
(SSR_gwr / (n - k_lokal))
F_stat
# p-value
pval <- 1 - pf(F_stat, df1 = (k_lokal - k_global), df2 = (n - k_lokal))
pval
#result:
[1] 0.3260518
[1] 0.9999385
p-value = 0.9999385 (≈ 1)
Hypothesis test:
H0: The GWR model does not provide a significant improvement over the global model (no spatial heterogeneity).
H1: The GWR model is better (there is significant spatial heterogeneity).
Interpretation
Because the p-value is ≫ 0.05 (almost 1), H0 is rejected.
This means there is no strong evidence of spatial heterogeneity in the mass data → the global model (OLS) adequately explains the variation in the data; GWR does not provide a significant improvement.
Conclution:
The global model (OLS) adequately explains the variation in the data; GWR does not provide a significant improvement