The smerc package is focused on Statistical Methods for Regional Counts. It particularly focuses on the spatial scan method (Kulldorff, 1997) and many of its extensions.
In what follows, we demonstrate some of the basic functionality of the smerc package using 281 observations related to leukeumia cases in an 8 county area of the state of New York. The data were made available in Waller and Gotway (2005) and details are provided there.
The leukemia data are available in the nydf
dataframe. Each row of the dataframe contains information regarding a different region. Each row of the dataframe includes longitude
and latitude
information for the centroid of each region, transformed x
and y
coordinates available in the original dataset provided by Waller and Gotway (2005), as well as the population
of each region and the number of leukemia cases
.
## # This research was partially supported under NSF grant 1463642
## 'data.frame': 281 obs. of 6 variables:
## $ longitude : num -75.9 -75.9 -75.9 -75.9 -75.9 ...
## $ latitude : num 42.1 42.1 42.1 42.1 42.1 ...
## $ population: num 3540 3560 3739 2784 2571 ...
## $ cases : num 3.08 4.08 1.09 1.07 3.06 ...
## $ x : num 4.07 4.64 5.71 7.61 7.32 ...
## $ y : num -67.4 -66.9 -67 -66 -67.3 ...
A SpatialPolygonsDataFrame
associated with the regions is included in the nypoly
data set, which we plot below.
## Warning: package 'sp' was built under R version 4.0.3
## Warning in wkt(obj): CRS object has no comment
The spatial scan test can be perfomed using the scan.test
function. The user must supply the coordinates of the region centroids, the number of cases in each region, and the population (i.e., the number persons at risk). There are many additional optional arguments described in the documentation. In the following example, we reduce the number of Monte Carlo simulations used to estimate the p-value.
coords = nydf[,c("x", "y")] # extract coordinates
cases = nydf$cases # extract cases
pop = nydf$population # extract population
scan_out = scan.test(coords, cases, pop, nsim = 99) # perform scan test
## computing statistics for simulated data:
The scan.test
function (and most of the other *.test
functions in the smerc package) will return an object of class smerc_cluster
.
## [1] "smerc_cluster"
A smerc_cluster
object has a default print option that summarizes relevant details about the object produced by the function.
In this case, we receive the following:
## method: circular scan
## statistic: poisson
## simulation: multinomial
## realizations: 99
## population upperbound: 50%
## minimum cases: 2
In this particular case, we see that the scan_out
object summarizes the results of the circular scan
method using a poisson
statistic (the other choice is binomial
). The Monte Carlo p-value was estimated using 99
realizations from a multinomial
distribution (poisson
and binomial
are the other possibilities). The candidate zones were limited to having no more than 50%
of the total population and at least 2
cases had to be in a candidate zone to be considered a cluster. Other methods will provide some of the same information, but some of the information will be different based on the relevant parameters of the models.
A smerc_cluster
also has a summary
function that summarizes the significant (or most likely) clusters returned by the method.
## regions max_dist cases ex rr stat p
## 1 24 12.3 95.33108 55.8 1.8 13.1 0.01
## 2 11 26.5 49.71990 27.1 1.9 8.0 0.02
The summary
of scan_out
reveals that there were two significant clusters. The first cluster was comprised of 24 regions with a maximum distance of 12.3 between centroids. The number of cases observed in the cluster is 95.33, while 55.8 cases were expected. The relative risk of the cluster is 1.8, with an associated test statistic of 13.1, and a Monte Carlo p-value of 0.01. A second cluster is also summarized with similar information.
A very basic plot
mechanism is provided for smerc_cluster
objects. The plot will show the locations of every centroid, with significant clusters shown with different colors and symbols. An attempt is also made to show the connectivity of the regions. In the plot below, we see the two significant clusters shown in yellow (middle) and purple (bottom).
If a polygon-like structure is associated with each region, then the color.clusters
can be used to easily produce nicer plots. E.g., we can use the nypoly
object to create a nicer map of our results.
## Warning in wkt(obj): CRS object has no comment
Other scan methods available include:
elliptic.test
}.flex.test
}.rflex.test
}.uls.test
}.dmst.test
}.edmst.test
}.dc.test
}.mlink.test
}.fast.test
}.mlf.test
}.Other non-scan method for the detection of clusters and/or clustering of cases for regional data are provided.
bn.test
performs the Besag-Newell test (Besag and Newell, 1991) and returns a smerc_cluster
. The print
, summary
, and plot
methods are available, as before.
bn_out = bn.test(coords = coords, cases = cases, pop = pop, cstar = 20,
alpha = 0.01) # perform besag-newell test
bn_out # print results
## method: Besag-Newell
## case radius: 20
## modified p-value: FALSE
## regions max_dist cases ex rr stat p
## 1 5 3.0 20.42516 10.2 2.0 5 0.004132413
## 2 5 13.4 20.15705 10.5 1.9 5 0.006017272
## 3 3 2.0 20.39443 10.8 1.9 3 0.008039295
## 4 5 2.7 25.45904 11.0 2.4 5 0.009113555
## 5 5 2.4 20.29863 11.1 1.9 5 0.009962904
The Cluster Evaluation Permutation Procedure (CEPP) proposed by Turnbull et al. (1990) can be performed using
cepp.test
. The function returns a smerc_cluster
object with print
, summary
, and plot
functions.
# perform CEPP test
cepp_out = cepp.test(coords = coords, cases = cases, pop = pop,
nstar = 5000, nsim = 99, alpha = 0.1)
cepp_out # print results
## method: CEPP
## simulation: multinomial
## realizations: 99
## population radius: 5000
## regions max_dist cases ex rr stat p
## 1 2 3.6 10.37872 3.4 3 9.6 0.06
Tango’s clustering detection test (Tango, 1995) can be performed via the
tango.test
function. The dweights
function can be used to construct the weights for the test. The tango.test
function produces and object of class tango
, which has a native print
function and a plot
function that compares the goodness-of-fit and spatial autocorrelation components of Tango’s statistic for the observed data and the simulated data sets.
w = dweights(coords, kappa = 1) # construct weights matrix
tango_out = tango.test(cases, pop, w, nsim = 49) # perform tango's test
tango_out # print results
## method: Tango's index
## index: 0.0029
## goodness-of-fit component: 0.0026
## spatial autocorrelation component: 0.00033
## chi-square statistic: 210
## chi-square df: 110
## chi-square p-value: 1.4e-08
## Monte Carlo p-value: 0.02
## Additional details about cluster detection methods
Nearly all cluster detection methods have both a *.zones
function that returns all the candidate zones for the method and a *.sim
function that is used to produce results for data simulated under the null hypothesis. These are unlikely to interest most users, but may be useful for individuals wanting to better understand how these methods work or are interested in developing new methods based off existing ones. The *.sim
functions are not meant for general use, as they are written to take very specific arguments that the user could easily misuse.
As an example, the following code produces all elliptical-shaped zones considered by the elliptic scan method (Kulldorff et al., 2006) using a population upperbound of no more than 10%. The function returns a list with the 248,213 candidate zones, as well as the associated shape and angle.
# obtain zones for elliptical scan method
ezones = elliptic.zones(coords, pop, ubpop = 0.1)
# view structure of ezones
str(ezones)
## List of 3
## $ zones:List of 239228
## ..$ : int 1
## ..$ : int [1:2] 1 2
## ..$ : int [1:3] 1 2 15
## ..$ : int [1:4] 1 2 15 49
## ..$ : int [1:5] 1 2 15 49 50
## ..$ : int [1:6] 1 2 15 49 50 13
## ..$ : int [1:7] 1 2 15 49 50 13 3
## ..$ : int [1:8] 1 2 15 49 50 13 3 14
## ..$ : int [1:9] 1 2 15 49 50 13 3 14 48
## ..$ : int [1:10] 1 2 15 49 50 13 3 14 48 47
## ..$ : int [1:11] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:12] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:13] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:14] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:15] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:16] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:17] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:18] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:19] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:20] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:21] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:22] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:23] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:24] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:25] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:26] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:27] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:28] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int [1:29] 1 2 15 49 50 13 3 14 48 47 ...
## ..$ : int 2
## ..$ : int [1:3] 2 1 3
## ..$ : int [1:4] 2 1 3 13
## ..$ : int [1:5] 2 1 3 13 15
## ..$ : int [1:6] 2 1 3 13 15 14
## ..$ : int [1:7] 2 1 3 13 15 14 49
## ..$ : int [1:8] 2 1 3 13 15 14 49 47
## ..$ : int [1:9] 2 1 3 13 15 14 49 47 12
## ..$ : int [1:10] 2 1 3 13 15 14 49 47 12 50
## ..$ : int [1:11] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:12] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:13] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:14] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:15] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:16] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:17] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:18] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:20] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:21] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:22] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:23] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:24] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:25] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:27] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int [1:28] 2 1 3 13 15 14 49 47 12 50 ...
## ..$ : int 3
## ..$ : int [1:2] 3 13
## ..$ : int [1:3] 3 13 2
## ..$ : int [1:4] 3 13 2 12
## ..$ : int [1:5] 3 13 2 12 14
## ..$ : int [1:6] 3 13 2 12 14 5
## ..$ : int [1:7] 3 13 2 12 14 5 1
## ..$ : int [1:8] 3 13 2 12 14 5 1 10
## ..$ : int [1:9] 3 13 2 12 14 5 1 10 11
## ..$ : int [1:10] 3 13 2 12 14 5 1 10 11 4
## ..$ : int [1:11] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:12] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:13] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:14] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:15] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:16] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:17] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:18] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:19] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:20] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:21] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:22] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:23] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:24] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:25] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:26] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:27] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int [1:29] 3 13 2 12 14 5 1 10 11 4 ...
## ..$ : int 4
## ..$ : int [1:2] 4 6
## ..$ : int [1:3] 4 6 5
## ..$ : int [1:4] 4 6 5 12
## ..$ : int [1:5] 4 6 5 12 35
## ..$ : int [1:6] 4 6 5 12 35 7
## ..$ : int [1:7] 4 6 5 12 35 7 10
## ..$ : int [1:8] 4 6 5 12 35 7 10 3
## ..$ : int [1:9] 4 6 5 12 35 7 10 3 11
## ..$ : int [1:10] 4 6 5 12 35 7 10 3 11 9
## ..$ : int [1:11] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:12] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:13] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:14] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:15] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:16] 4 6 5 12 35 7 10 3 11 9 ...
## ..$ : int [1:17] 4 6 5 12 35 7 10 3 11 9 ...
## .. [list output truncated]
## $ shape: num [1:239228] 1 1 1 1 1 1 1 1 1 1 ...
## $ angle: num [1:239228] 90 90 90 90 90 90 90 90 90 90 ...
Assuncao, R.M., Costa, M.A., Tavares, A. and Neto, S.J.F. (2006). Fast detection of arbitrarily shaped disease clusters, Statistics in Medicine, 25, 723-742.
Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases, Journal of the Royal Statistical Society, Series A, 154, 327-333.
Costa, M.A. and Assuncao, R.M. and Kulldorff, M. (2012) Constrained spanning tree algorithms for irregularly-shaped spatial clustering, Computational Statistics & Data Analysis, 56(6), 1771-1783.
Kulldorff, M. (1997) A spatial scan statistic. Communications in Statistics - Theory and Methods, 26(6): 1481-1496,
Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006) An elliptic spatial scan statistic. Statististics in Medicine, 25:3929-3943.
Neill, D. B. (2012), Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74: 337-360.
Patil, G.P. & Taillie, C. Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics (2004) 11(2):183-197.
Tango, T. (1995) A class of tests for detecting “general” and “focused” clustering of rare diseases. Statistics in Medicine. 14, 2323-2334.
Tango, T., & Takahashi, K. (2005). A flexibly shaped spatial scan statistic for detecting clusters. International journal of health geographics, 4(1), 11. Kulldorff, M. (1997) A spatial scan statistic. Communications in Statistics – Theory and Methods 26, 1481-1496.
Tango, T. and Takahashi, K. (2012), A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters. Statist. Med., 31: 4207-4218.
Turnbull, Bruce W., Iwano, Eric J., Burnett, William S., Howe, Holly L., Clark, Larry C. (1990). Monitoring for Clusters of Disease: Application to Leukemia Incidence in Upstate New York, American Journal of Epidemiology, 132(supp1):136-143.
Waller, L.A. and Gotway, C.A. (2005). Applied Spatial Statistics for Public Health Data. Hoboken, NJ: Wiley.
Yao, Z., Tang, J., & Zhan, F. B. (2011). Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: A case study on murine typhus in South Texas. International journal of health geographics, 10(1), 1.