Introduction
Health development is a planned and sustainable effort to improve the health status of society through various strategies, policies and interventions. The main goal of health development is to achieve optimal levels of health for the entire population, including increasing access to health services, disease prevention, health promotion, and improving the general quality of life. Therefore, the Indonesian Ministry of Health’s Health Research and Development Agency (BALITBANGKES) compiled the Public Health Development Index (PHDI). PHDI is a collection of health indicators that can be easily and directly measured to describe health problems.
We will modelling the PHDI using the Geographically Weighted Regression which generate distinct regression models for each specific location. The case study chosen is the PHDI on the island of Java using 2018 data collected from the Central Statistics Agency (BPS).
Data
The data used is 2018 PHDI data sourced from the Central Statistics Agency of Indonesia collected from 119 regencies/cities on the island of Java with average PHDI was 0.6087. The variables are:
Y : Community Health Development Index
X1: Percentage of Population Having Access to Clean Water
X2: Egg and Milk Consumption (Per Capita Per Week)/1000
X3: number of health centers
PHDI data source can be download here.
shp map for java island can be download here.
Summary of Logical Sequence in Analysis:
- Initial Exploration: Visualize your data on a map to look for clear patterns.
- Test of Spatial Dependence (Global Moran’s I) on the Dependent Variable: To see if there are clusters of values on the variable you want to explain.
- OLS Model (Classical Regression): Run an OLS model as a baseline.
- Test of Spatial Dependence (Global Moran’s I) on OLS Residuals: This is a crucial step. If the OLS residuals show significant spatial autocorrelation, it indicates the presence of unmodeled spatial effects.
- Test of Spatial Heterogeneity (F-Test for Non-Stationarity): If the OLS residuals show autocorrelation, the next step is to test whether spatial heterogeneity is the cause. This will tell you whether GWR is the right choice.
- GWR Model: If the F-test is significant, run GWR.
- GWR Residual Analysis: Test Moran’s I on the GWR residuals. Ideally, the spatial autocorrelation in the GWR residuals should be significantly reduced or insignificant, indicating that the GWR has successfully captured most of the spatial pattern.
- Interpreting the Local GWR Coefficients: Analyze and plot the local GWR coefficients to understand how the relationship varies across space.
The analysist will be done by R. By following these steps, we can systematically identify the presence and nature of spatial dependencies and heterogeneity in your data.
First we load the R packages:
library(readxl)
library(openxlsx)
library(spdep)
library(raster)
library(corrplot)
library(DescTools)
library(nortest)
library(car)
library(spatialreg)
library(sf)
library(nortest)
library(DescTools)
library(lmtest)
library(tidyverse)
library(spgwr)
library(ggplot2)
Next, read the file PHDI data:
dfx <- read_excel("D:/dataset-chdi.xlsx")
head(dfx)
str(dfx)
Select columns/attributes/variables from data. We only need city_code, Y, X1, X2, X3
dfx1<-dfx%>% dplyr::select(city_code,Y,X1,X2,X3)
str(dfx1)
tibble [119 × 5] (S3: tbl_df/tbl/data.frame)
$ city_code: num [1:119] 3101 3171 3172 3173 3174 ...
$ Y : num [1:119] 0.638 0.674 0.68 0.65 0.649 ...
$ X1 : num [1:119] 85 90 91 90 90 ...
$ X2 : num [1:119] 18 30.1 31.4 20.5 24.5 ...
$ X3 : num [1:119] 6 72 84 35 73 45 101 58 45 62 ...
show summary data
summary(dfx1)
city_code Y X1 X2 X3
Min. :3101 Min. :0.5345 Min. : 61.91 Min. : 7.191 Min. : 3.0
1st Qu.:3276 1st Qu.:0.6253 1st Qu.: 79.40 1st Qu.:12.475 1st Qu.: 21.0
Median :3327 Median :0.6437 Median : 90.72 Median :15.104 Median : 27.0
Mean :3386 Mean :0.6406 Mean : 87.71 Mean :16.534 Mean : 30.4
3rd Qu.:3516 3rd Qu.:0.6633 3rd Qu.: 96.06 3rd Qu.:19.150 3rd Qu.: 36.5
Max. :3674 Max. :0.7319 Max. :100.00 Max. :33.651 Max. :101.0
Step 1: Initial Exploration: Visualize your data on plots and map to look for clear patterns and relationship.
The Boxplot
par(mfrow = c(1, 4)) # set layout for 3 plots in 1 row
boxplot(dfx$Y,main="Y distribution")
boxplot(dfx$X1,main="X1 distribution")
boxplot(dfx$X2,main="X2 distribution")
boxplot(dfx$X3,main="X3 distribution")

Based on the Boxplot above, it can be seen that there are no outlier values in variable X1, whereas in Y, X2 and X3 there are outliers.
Relationship of the data
We use Scatter Plot to show linear relationship of data X1 -> Y, X2 -> Y, and X3 -> Y
par(mfrow = c(1, 3)) # set layout for 3 plots in 1 row
plot(dfx1$X1, dfx1$Y,
xlab="X1",
ylab="Y",
main="Scatter Plot of X1 and Y",
pch=20, col="orange", cex=2)
reg.klasik = lm(Y~X1, data=dfx1)
x_num <-sapply(reg.klasik,is.numeric)
abline(reg.klasik)
plot(dfx$X2, dfx$Y,
xlab="X2",
ylab="Y",
main="Scatter Plot of X2 and Y",
pch=20, col="orange", cex=2)
reg.klasik = lm(Y~X2, data=dfx)
abline(reg.klasik)
plot(dfx$X3, dfx$Y,
xlab="X3",
ylab="Y",
main="Scatter Plot of X3 and Y",
pch=20, col="orange", cex=2)
reg.klasik = lm(Y~X3, data=dfx)
abline(reg.klasik)
Use Correlation matrix to show the correlation between Y and variables X1, X2, X3
corrplot(cor(dfx1[,2:5]), method = “number”)
Based on the correlation matrix above, it can be seen that Y has a positive correlation with X1 and X2, and has a negative correlation with X3.
We can do also correlation tests to find out numerical correaltion between Y to X1, X2, and X3.
For X1 vs Y, we use hypotesist:
H0: There is no correlation between variables X1 and Y
H1: There is a correlation between variables X1 and Y
cor.test(dfx$Y, dfx1$X1)
result:
Pearson's product-moment correlation
data: dfx$Y and dfx1$X1
t = 2.0507, df = 117, p-value = 0.04253
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.006492699 0.354384355
sample estimates:
cor
0.1862706
conclution: p-value < 0.05, it can be concluded that H0 is rejected, which means there is a correlation between variables X1 and Y.
For X2 vs Y
H0: There is no correlation between variables X2 and Y
H1: There is a correlation between variables X2 and Y
cor.test(dfx1$Y, dfx1$X2)
result:
Pearson's product-moment correlation
data: dfx1$Y and dfx1$X2
t = 8.072, df = 117, p-value = 6.864e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4685205 0.7024543
sample estimates:
cor
0.5980791
conclution: p-value < 0.05, it can be concluded that H0 is rejected, which means there is a correlation between variables X2 and Y.
For X3 vs Y
H0: There is no correlation between variables X3 and Y
H1: There is a correlation between variables X3 and Y
cor.test(dfx$Y, dfx$X3)
result:
Pearson's product-moment correlation
data: dfx$Y and dfx$X3
t = -2.5987, df = 117, p-value = 0.01056
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.39690734 -0.05595789
sample estimates:
cor
-0.2336006
Kesimpulan : p-value < 0.05, it can be concluded that H0 is rejected, which means there is a correlation between variables X3 and Y.
Data Spatial Distribution
First of all we load packages from R
library(tmap)
library(tmaptools)
library(RColorBrewer)
#Input map shp
mapx <- st_read(file.path("D:/javamap/jawa.shp"))
result:
Reading layer `jawa' from data source `D:\javamap\jawa.shp' using driver `ESRI Shapefile'
Simple feature collection with 119 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 105.0998 ymin: -8.78036 xmax: 116.2702 ymax: -5.048857
Geodetic CRS: WGS 84
We set tmap mode to plotting and switch off spherical geometry (s2) in order to run the tm_shape()
tmap_mode("plot")
Colours<- rev(brewer.pal(8, "RdYlBu"))
sf_use_s2(FALSE)
next, we merge column Y, X1, X2, X3 from dfx to mapx
mapx$Y<- dfx$Y
mapx$X1<- dfx$X1
mapx$X2<- dfx$X2
mapx$X3<- dfx$X3
mapx
result:
Simple feature collection with 119 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 105.0998 ymin: -8.78036 xmax: 116.2702 ymax: -5.048857
Geodetic CRS: WGS 84
First 10 features:
PROVNO KABKOTNO PROVINSI KABKOT ID2013 geometry Y X1 X2 X3
1 31 01 DKI JAKARTA KEPULAUAN SERIBU 3101 MULTIPOLYGON (((106.3842 -5... 0.6385 85.00 17.971 6
2 31 71 DKI JAKARTA JAKARTA SELATAN 3171 MULTIPOLYGON (((106.8491 -6... 0.6737 90.00 30.138 72
3 31 72 DKI JAKARTA JAKARTA TIMUR 3172 MULTIPOLYGON (((106.9719 -6... 0.6804 91.00 31.450 84
4 31 73 DKI JAKARTA JAKARTA PUSAT 3173 MULTIPOLYGON (((106.8788 -6... 0.6503 90.00 20.456 35
5 31 74 DKI JAKARTA JAKARTA BARAT 3174 MULTIPOLYGON (((106.711 -6.... 0.6494 90.00 24.460 73
6 31 75 DKI JAKARTA JAKARTA UTARA 3175 MULTIPOLYGON (((106.969 -6.... 0.6365 90.00 28.309 45
7 32 01 JAWA BARAT BOGOR 3201 MULTIPOLYGON (((106.994 -6.... 0.6098 81.06 16.918 101
8 32 02 JAWA BARAT SUKABUMI 3202 MULTIPOLYGON (((106.9652 -6... 0.6057 74.58 13.788 58
9 32 03 JAWA BARAT CIANJUR 3203 MULTIPOLYGON (((107.2843 -6... 0.6016 93.62 10.065 45
10 32 04 JAWA BARAT BANDUNG 3204 MULTIPOLYGON (((107.75 -6.8... 0.6065 77.81 15.152 62
Now we can plot the mapx data to a map based on the response variable Y
tm_shape(mapx) +
tm_compass(position = c("left", "bottom"))+
tm_polygons("Y", title='legend note', style='quantile',
#n=7, palette=c("red","pink","orange","yellow","yellowgreen","springgreen3","royalblue1"), title='X2',
n=5, palette=c("mistyrose1","pink","tomato1","red3","darkred"),
border.col='grey27', alpha=.9)+
tm_text(
# The key argument: specify the column for the text labels
text = "ID2013",
# Optional arguments to customize the text appearance
size = 0.5,
fontface = "bold",
col = "black",
shadow = FALSE
) +
tm_layout(title = "response Y",
title.position = c("center", "top"),
legend.position = c("left", "bottom"),
legend.text.size = 0.8,
frame = FALSE,
inner.margins = c(0, 0, 0, 0),
outer.margins = c(0, 0, 0, 0)
)
Based on 2018 PHDI data on the island of Java, several districts still have low PHDI levels spread across 4 provinces, namely West Java Province, Central Java Province, East Java Province and Banten Province. In West Java Province there are Sukabumi Regency, Cianjur Regency, Bandung Regency, Garut Regency, Tasikmalaya Regency. In Central Java Province there are Banjarnegara Regency and Jepara Regency. For East Java Province there are Bondowoso Regency, Situbondo Regency, Probolinggo Regency, Bangkalan Regency, Sampang Regency and Pamekasan Regency. Meanwhile, in Banten Province there are Pandeglang Regency, Lebak Regency, Serang Regency and Serang City.