2022 – 2023 Acad. Year

Tuesday, 10/04/2022, Time: 11:00am – 12:15pm PT
The Role of Preferential Sampling in Spatial and Spatio-temporal Geostatistical Modeling

Alan E. Gelfand, Distinguished Emeritus Professor
Department of Statistical Science, Duke University

Location: Geology Building, Room 4660


The notion of preferential sampling was introduced into the literature in the seminal paper of Diggle et al. (2010), and considerable follow-up research has appeared since. A standard illustration arises in geostatistical modeling. Consider the objective of making inference about environmental exposures. If environmental monitors are placed only in locations where environmental levels tend to be high, then interpolation based on observations from these locations will produce predictions that are biased high across the region. A remedy lies in suitable spatial design of the locations; e.g., a random or space-filling design for locations over the region of interest is expected to preclude such bias. However, in practice, sampling may be deliberately designed to learn about areas of high exposure.

While the set of sampling locations may not have been chosen randomly, we study it as if it were a realization of a spatial point process over the study region D. That is, the locations may be designed or specified in some fashion, but not necessarily with the intention of being roughly uniformly distributed over D. The question then becomes a stochastic one: is the realization of the responses independent of the realization of the locations? If not, we have what is called preferential sampling. Importantly, the dependence at issue is stochastic dependence, not the trivial notational or functional association whereby each response is indexed by the location at which it is observed.
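To make the bias argument concrete, here is a minimal simulation sketch. It assumes a one-dimensional region D = [0, 1], a hypothetical random-phase cosine field S(x) standing in for the latent environmental surface, and sampling locations drawn with density proportional to exp(beta * S(x)), in the spirit of the shared-latent-process construction of Diggle et al. (2010). It is not the speaker's model, and all names and parameters (latent_field, sample_locations, naive_mean_response, beta, noise_sd, the seed) are illustrative assumptions. With beta = 0 the locations are uniform and the naive average of the responses should sit near the true spatial mean of 0; with beta > 0 the locations cluster where S is high, so the locations and responses are stochastically dependent and the naive average is pushed upward.

```python
import math
import random

# Hypothetical 1-D stand-in for a spatial field on D = [0, 1]: a sum of
# cosines with fixed amplitudes 1/k and random phases (not the talk's model).
AMPLITUDES = [1.0 / k for k in range(1, 6)]


def latent_field(x, phases):
    """S(x) = sum_k (1/k) cos(2*pi*k*x + phase_k); its average over [0, 1] is exactly 0."""
    return sum(
        a * math.cos(2 * math.pi * k * x + p)
        for k, (a, p) in enumerate(zip(AMPLITUDES, phases), start=1)
    )


def sample_locations(n, phases, beta, rng):
    """Draw n locations with density proportional to exp(beta * S(x)) by rejection.

    beta = 0 gives uniform (non-preferential) sampling; beta > 0 favours high S.
    """
    bound = abs(beta) * sum(AMPLITUDES)  # upper bound on beta * S(x) over [0, 1]
    locations = []
    while len(locations) < n:
        x = rng.random()
        # Accept with probability exp(beta * S(x) - bound), which is <= 1.
        if rng.random() < math.exp(beta * latent_field(x, phases) - bound):
            locations.append(x)
    return locations


def naive_mean_response(beta, n=500, noise_sd=0.1, seed=2022):
    """Average of noisy responses Y = S(x) + eps at the sampled locations."""
    rng = random.Random(seed)
    # Same seed -> same phases, so every beta sees the same realised field.
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in AMPLITUDES]
    locations = sample_locations(n, phases, beta, rng)
    responses = [latent_field(x, phases) + rng.gauss(0.0, noise_sd) for x in locations]
    return sum(responses) / len(responses)


if __name__ == "__main__":
    for beta in (0.0, 2.0):
        print(f"beta={beta}: naive mean = {naive_mean_response(beta):+.3f} (true mean of S is 0)")
```

Running it prints one line per value of beta; the exact numbers depend on the seed, so none are quoted here. The gap between the two averages is the upward bias described above; simply averaging or interpolating the observed responses cannot remove it without modeling the locations jointly with the responses.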

Another setting is species distribution modeling with a binary response, presence or absence, recorded at locations. Here, bias can arise because ecologists tend to sample where they expect to find individuals. This setting extends to data fusion, where we have both presence/absence data and presence-only data. Other potential applications include missing data settings and hedonic modeling of property sale prices. Very recent work explores preferential sampling in the context of multivariate geostatistical modeling.

The fundamental issues are: (i) can we identify the occurrence of a preferential sampling effect, (ii) can we adjust inference in the presence of preferential sampling, and (iii) when can such adjustment improve predictive performance over a customary geostatistical model? We consider these issues in a modeling context and illustrate with applications to presence/absence data, to property sales, and to tree data where we observe mean diameter at breast height (MDBH) and trees per hectare (TPH). (This is joint work with Shinichiro Shirota and Lucia Paci.)


Alan E. Gelfand is the James B. Duke Professor Emeritus of Statistical Science at Duke University. He is former chair of the Department of Statistical Science (DSS) and holds a secondary appointment as Professor of Environmental Science and Policy in the Nicholas School. Author of more than 320 papers (more than 260 since 1990), Gelfand is internationally known for his contributions to applied statistics, Bayesian computation, and Bayesian inference. (An article in Science Watch found him to be the tenth most cited mathematical scientist in the world over the period 1991-2001.) Gelfand is an Elected Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Society for Bayesian Analysis. He is an Elected Member of the International Statistical Institute. He is a former President of the International Society for Bayesian Analysis, and in 2006 he received the Parzen Prize for a lifetime of research contribution to Statistics. In 2012, he was chosen to give the distinguished Mahalanobis Lectures. In 2013, he received a Distinguished Achievement Medal from the ASA Section on Statistics in the Environment. In 2019, he received the S.S. Wilks Memorial Award from the American Statistical Association.

Gelfand’s primary research focus for the past twenty-five years has been statistical modeling for spatial and space-time data. Through a collection of more than 150 papers he has advanced methodology, using the Bayesian paradigm, to associate fully model-based inference with spatial and space-time data displays. His chief areas of application include environmental exposure, spatio-temporal ecological processes, and climate dynamics. He has four books in this area: the successful “Hierarchical Modeling and Analysis for Spatial Data” with Sudipto Banerjee and Brad Carlin (now in its second edition), “Hierarchical Modeling for Environmental Data; Some Applications and Perspectives” with James Clark, the “Handbook of Spatial Statistics” with Peter Diggle, Montserrat Fuentes, and Peter Guttorp, and the “Handbook of Environmental and Ecological Statistics” with Montserrat Fuentes, Jennifer Hoeting, and Richard Smith. In addition, he has an NSF-CBMS monograph with Erin Schliep entitled “Bayesian Analysis and Computation for Spatial Point Patterns.”