Geographically Weighted Regression (GWR) for Community Health Index – Part 2

Posted on

We were modelling the Public Health Development Index (PHDI) using the Geographically Weighted Regression which generate distinct regression models for each specific location. In the previous article, we have done step 1 of Geographically Weighted Regression (GWR) for Community Health Index which was Initial Exploration: Visualize your data on a map to look for clear patterns. Now we’re going to Step 2 which is Test of Spatial Dependence (Global Moran’s I) on the Dependent Variable: To see if there are clusters of values on the variable you want to explain.

We calculate the Moran’s index using various weight matrices, both distance-based and neighborhood-based. The most popular distance weight matrices are Key Nearest Neighbor (KNN), Radial Distance Weight, Power Distance Weight, and Exponential Distance Weight. Neighborhood weight matrices include Rook Contiguity and Queen Contiguity. The weight matrix that produces the highest Moran’s index value will be selected for testing.

To calculate the weight matrix based on distance, we need to determine the coordinates of each location. The longitude and latitude coordinates can be extracted from the geometry attributes in the .shp file, which can be downloaded in the previous article.

#Extract centroid for each MULTIPOLYGON
centroids_sf <- st_centroid(mapx)
centroid_coords <- st_coordinates(centroids_sf)
head(centroid_coords)

#result:
            X         Y
[1,] 106.5794 -5.700150
[2,] 106.8100 -6.272549
[3,] 106.9003 -6.255373
[4,] 106.8351 -6.181230
[5,] 106.7483 -6.165469
[6,] 106.8695 -6.129185

next, we can merge the coordinates into the variable mapx.

mapx$long<-centroid_coords[,1]
mapx$lat<-centroid_coords[,2]
mapx

#result:
Simple feature collection with 119 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 105.0998 ymin: -8.78036 xmax: 116.2702 ymax: -5.048857
Geodetic CRS:  WGS 84
First 10 features:
   PROVNO KABKOTNO    PROVINSI           KABKOT ID2013                       geometry      Y    X1     X2  X3     long       lat
1      31       01 DKI JAKARTA KEPULAUAN SERIBU   3101 MULTIPOLYGON (((106.3842 -5... 0.6385 85.00 17.971   6 106.5794 -5.700150
2      31       71 DKI JAKARTA  JAKARTA SELATAN   3171 MULTIPOLYGON (((106.8491 -6... 0.6737 90.00 30.138  72 106.8100 -6.272549
3      31       72 DKI JAKARTA    JAKARTA TIMUR   3172 MULTIPOLYGON (((106.9719 -6... 0.6804 91.00 31.450  84 106.9003 -6.255373
4      31       73 DKI JAKARTA    JAKARTA PUSAT   3173 MULTIPOLYGON (((106.8788 -6... 0.6503 90.00 20.456  35 106.8351 -6.181230
5      31       74 DKI JAKARTA    JAKARTA BARAT   3174 MULTIPOLYGON (((106.711 -6.... 0.6494 90.00 24.460  73 106.7483 -6.165469
6      31       75 DKI JAKARTA    JAKARTA UTARA   3175 MULTIPOLYGON (((106.969 -6.... 0.6365 90.00 28.309  45 106.8695 -6.129185
7      32       01  JAWA BARAT            BOGOR   3201 MULTIPOLYGON (((106.994 -6.... 0.6098 81.06 16.918 101 106.7675 -6.559979
8      32       02  JAWA BARAT         SUKABUMI   3202 MULTIPOLYGON (((106.9652 -6... 0.6057 74.58 13.788  58 106.7076 -7.076208
9      32       03  JAWA BARAT          CIANJUR   3203 MULTIPOLYGON (((107.2843 -6... 0.6016 93.62 10.065  45 107.1578 -7.133713
10     32       04  JAWA BARAT          BANDUNG   3204 MULTIPOLYGON (((107.75 -6.8... 0.6065 77.81 15.152  62 107.6108 -7.099969

we van also save the centroids as a new shapefile (optional):

st_write(centroids_sf, "D:/javamap/jawawithcentroids.shp")

Plot the centroid and calculate the distance.

plot(centroid_coords)
djarak<-dist(longlat) 
m.djarak<-as.matrix(djarak)

K-Nearest Neighbor Weight

#k=3 
W.knn<-knn2nb(knearneigh(longlat,k=3,longlat=TRUE)) 
W.knn

#result:
Neighbour list object:
Number of regions: 119 
Number of nonzero links: 357 
Percentage nonzero weights: 2.521008 
Average number of links: 3 
Non-symmetric neighbours list
W.knn1 <- nb2listw(W.knn,style='W') 
W.knn1
#result:

Radial Distance Weight

#d=110, The dmax threshold value is set at 110 km
W.dmax<-dnearneigh(longlat,0,110,longlat=TRUE)
W.dmax.s <- nb2listw(W.dmax,style='W') 
#W is row standardised (sums over all links to n). 
W.dmax.s

Power Distance Weight / Invers Distance Weight

alpha1=1 
W.idw <-1/(m.djarak^alpha1)

#normalization
#dinormalisasi 
diag(W.idw)<-0 
rtot<-rowSums(W.idw,na.rm=TRUE)
W.idw.sd<-W.idw/rtot #row-normalized 
rowSums(W.idw.sd,na.rm=TRUE)

W.idw.1 = mat2listw(W.idw.sd,style='W') 
summary(W.idw.1)
alpha1=2 
W.idw2 <-2/(m.djarak^alpha1) 
diag(W.idw2)<-0 
rtot<-rowSums(W.idw2,na.rm=TRUE)
W.idw.sd2<-W.idw2/rtot #row-normalized 
rowSums(W.idw.sd2,na.rm=TRUE)
W.idw.22 = mat2listw(W.idw.sd2,style='W') 
summary(W.idw.22)

Exponential Distance Weight

alpha=1 
W.e<-exp((-alpha)*m.djarak)
diag(W.e)<-0 
rtot<-rowSums(W.e,na.rm=TRUE) 
W.e.sd<-W.e/rtot #row-normalized 
rowSums(W.e.sd,na.rm=TRUE)

W.ed1 = mat2listw(W.e.sd,style='W') 
summary(W.ed1)
#for alpha=2
alpha=2
W.e2<-exp((-alpha)*m.djarak) 
diag(W.e2)<-0 
rtot<-rowSums(W.e2,na.rm=TRUE) 
W.e.sd2<-W.e2/rtot #row-normalized 
rowSums(W.e.sd2,na.rm=TRUE)

W.ed2 = mat2listw(W.e.sd2,style='W') 
summary(W.ed2)

Matriks Bobot berdasarkan ketetanggaan (Spatial Contiguity Weight)

Rook Contiguity

sf::sf_use_s2(FALSE)
# Spherical geometry (s2) switched off
W.rook<-poly2nb(mapx,queen=FALSE) 
W.rook
#result:
Neighbour list object:
Number of regions: 119 
Number of nonzero links: 518 
Percentage nonzero weights: 3.657934 
Average number of links: 4.352941 
1 region with no links:
1
3 disjoint connected subgraphs
W.rook.s <- nb2listw(W.rook,style='W', zero.policy = TRUE) #W is row standardised (sums over all links to n). Standardisasi Baris 
print(W.rook.s, zero.policy=TRUE)

Queen Contiguity

W.queen<-poly2nb(peta,queen=TRUE)
W.queen
#result:
Neighbour list object:
Number of regions: 119 
Number of nonzero links: 522 
Percentage nonzero weights: 3.68618 
Average number of links: 4.386555 
1 region with no links:
1
3 disjoint connected subgraphs
W.queen.s <- nb2listw(W.queen,style='W', zero.policy = TRUE) #W is row standardised (sums over all links to n). 
print(W.queen.s, zero.policy=TRUE)

Now we choose weight matrices

#KNN
MI.knn <- moran(dfx$Y, W.knn1, n=length(W.knn1$neighbours), S0=Szero(W.knn1))
MI.knn

#result:
$I
[1] 0.480363
$K
[1] 3.678483
#Radial Distance Weight
MI.radial <- moran(dfx$Y, W.dmax.s, n=length(W.dmax.s$neighbours), S0=Szero(W.dmax.s))
MI.radial

#result:
$I
[1] 0.2453089
$K
[1] 3.678483
#Power Distance Weight
MI.power1 <- moran(dfx$Y, W.idw.1, n=length(W.idw.1$neighbours), S0=Szero(W.idw.1))
MI.power1

#result:
$I
[1] 0.1323094
$K
[1] 3.678483
MI.power2 <- moran(dfx$Y, W.idw.22, n=length(W.idw.22$neighbours), S0=Szero(W.idw.22))
MI.power2

#result:
$I
[1] 0.3299451
$K
[1] 3.678483
#Exponential Distance Weight
MI.exp1 <- moran(dfx$Y, W.ed1, n=length(W.ed1$neighbours), S0=Szero(W.ed1))
MI.exp1

#result:
$I
[1] 0.1238541
$K
[1] 3.678483
MI.exp2 <- moran(dfx$Y, W.ed2, n=length(W.ed2$neighbours), S0=Szero(W.ed2))
MI.exp2
#result:
$I
[1] 0.2328439
$K
[1] 3.678483
#Rook Contiguity
MI.rook <- moran.test(dfx$Y, W.rook.s, randomisation = TRUE, zero.policy = TRUE)
MI.rook$estimate
#result:
Moran I statistic       Expectation          Variance 
      0.505966592      -0.008547009       0.004569424 
#Queen Contiguity
MI.queen <- moran.test(dfx$Y, W.queen.s, randomisation = TRUE, zero.policy = TRUE)
MI.queen$estimate
#result:
Moran I statistic       Expectation          Variance 
      0.504911494      -0.008547009       0.004549995 

Comparison of Moran’s Index Values

moranindeks<-data.frame(
  "Matriks Bobot"=c("KNN (k=5)", "Radial distance weight", "Power distance weight (alpha=1)", "Power distance weight (alpha=2)", "Exponential distance weight (alpha=1)", "Exponential distance weight (alpha=2)", "Rook Contiguity", "Queen Contiguity"),
  "Nilai Indeks Moran"=c(MI.knn$I, MI.radial$I, MI.power1$I, MI.power2$I, MI.exp1$I, MI.exp2$I, -0.008530337, -0.010844530))

moranindeks

#result:
Matriks.Bobot
<chr>
Nilai.Indeks.Moran
<dbl>
KNN (k=5)	0.480363044			
Radial distance weight	0.245308916			
Power distance weight (alpha=1)	0.132309418			
Power distance weight (alpha=2)	0.329945074			
Exponential distance weight (alpha=1)	0.123854065			
Exponential distance weight (alpha=2)	0.232843892			
Rook Contiguity	-0.008530337			
Queen Contiguity	-0.010844530			
8 rows

Conclusion:
The weight matrix that produces the largest Moran’s Index value is the knn weight matrix k=3 (W.knn1).
Next, the Moran’s Index will be tested using the knn matrix:
H0: I=0 (no spatial autocorrelation between locations)
H1: I>0 (there is spatial autocorrelation between locations)

Woptimum <- W.knn1
moran.test(dfx$Y, Woptimum, randomisation = TRUE, zero.policy = TRUE)

#result:
	Moran I test under randomisation

data:  dfx$Y  
weights: Woptimum    

Moran I statistic standard deviate = 7.176, p-value = 3.59e-13
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.480363044      -0.008474576       0.004640560 

Conclusion:
Based on the results above, a p-value <0.05 is obtained, meaning H0 is rejected. The alternative hypothesis is greater, meaning that at the 5% significance level, there is positive spatial autocorrelation.

Moran Plot

moran.plot(dfx$Y, Woptimum, labels=dfx$'city')

Horizontal and vertical lines intersect the average X and Y values, dividing the graph into four quadrants: Quadrant I (top right, High-High / HH)
Areas with high values, surrounded by high-value areas. Example: Magelang, Sleman.
This means there is a positive cluster → high values tend to be close to other high values.

Quadrant II (top left, Low-High / LH)
Areas with low values, but surrounded by high-value areas. This means these areas are negative outliers amidst high-value areas.

Quadrant III (bottom left, Low-Low / LL)
Areas with low values, surrounded by low-value areas. Example: Bangkalan, Tasikmalaya, Garut, Lebak, Pandeglang, Sumenep.
This means there is a positive cluster but for low values.

Quadrant IV (bottom right, High-Low / HL)
Areas with high scores, but surrounded by low-value areas. Example: Banyuwangi, Cilegon City
This means these areas are positive outliers amidst low-value areas.

A positively sloping regression line → there is a global positive spatial autocorrelation, meaning that areas with high values tend to be adjacent to areas with high values, and areas with low values to areas with low values. However, points in quadrants II and IV indicate spatial outliers (areas with different patterns from their surroundings).

Next, OLS Model (Classical Regression): Run an OLS model as a baseline.