Large-scale Epidemiological modeling: Scanning for Mosquito-Borne Diseases Spatio-temporal Patterns in Brazil

Authors: Eduardo C. Araujo, Claudia T. Codeço, Sandro Loch, Luã B. Vacaro, Laís P. Freitas, Raquel M. Lana, Leonardo S. Bastos, Iasmim F. de Almeida, Fernanda Valente, Luiz M. Carvalho, Flávio C. Coelho

Abstract: The influence of climate on mosquito-borne diseases like dengue and chikungunya is well-established, but comprehensively tracking long-term spatial and temporal trends across large areas has been hindered by fragmented data and limited analysis tools. This study presents an unprecedented analysis, in terms of breadth, estimating the SIR transmission parameters from incidence data in all 5,570 municipalities in Brazil over 14 years (2010-2023) for both dengue and chikungunya. We describe the Episcanner computational pipeline, developed to estimate these parameters, producing a reusable dataset describing all dengue and chikungunya epidemics that have taken place in this period, in Brazil. The analysis reveals new insights into the climate-epidemic nexus: We identify distinct geographical and temporal patterns of arbovirus disease incidence across Brazil, highlighting how climatic factors like temperature and precipitation influence the timing and intensity of dengue and chikungunya epidemics. The innovative Episcanner tool empowers researchers and public health officials to explore these patterns in detail, facilitating targeted interventions and risk assessments. This research offers a new perspective on the long-term dynamics of climate-driven mosquito-borne diseases and their geographical specificities linked to the effects of global temperature fluctuations such as those captured by the ENSO index.

Submitted 30 July, 2024; originally announced July 2024.

Comments: Submitted for peer review

arXiv:2112.12101 [pdf, other]

Faster indicators of dengue fever case counts using Google and Twitter

Authors: Giovanni Mizzi, Tobias Preis, Leonardo Soares Bastos, Marcelo Ferreira da Costa Gomes, Claudia Torres Codeço, Helen Susannah Moat

Abstract: Dengue is a major threat to public health in Brazil, the world's sixth biggest country by population, with over 1.5 million cases recorded in 2019 alone. Official data on dengue case counts is delivered incrementally and, for many reasons, often subject to delays of weeks. In contrast, data on dengue-related Google searches and Twitter messages is available in full with no delay. Here, we describe a model which uses online data to deliver improved weekly estimates of dengue incidence in Rio de Janeiro. We address a key shortcoming of previous online data disease surveillance models by explicitly accounting for the incremental delivery of case count data, to ensure that our approach can be used in practice. We also draw on data from Google Trends and Twitter in tandem, and demonstrate that this leads to slightly better estimates than a model using only one of these data streams alone. Our results provide evidence that online data can be used to improve both the accuracy and precision of rapid estimates of disease incidence, even where the underlying case count data is subject to long and varied delays.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 25 pages, 7 figures (3 in supplementary information)

arXiv:2101.04253 [pdf]

doi 10.1080/13645579.2022.2031153

Evaluation of Logistic Regression Applied to Respondent-Driven Samples: Simulated and Real Data

Authors: Sandro Sperandei, Leonardo S. Bastos, Marcelo Ribeiro-Alves, Arianne Reis, Francisco I. Bastos

Abstract: Objective: To investigate the impact of different logistic regression estimators applied to RDS samples obtained by simulation and real data. Methods: Four simulated populations were created combining different connectivity models, levels of clusterization and infection processes. Each subject in the population received two attributes, only one of them related to the infection process. From each population, RDS samples with different sizes were obtained. Similarly, RDS samples were obtained from a real-world dataset. Three logistic regression estimators were applied to assess the association between the attributes and the infection status, and subsequently the observed coverage of each was measured. Results: The type of connectivity had more impact on estimator performance than the clusterization level. In simulated datasets, unweighted logistic regression estimators emerged as the best option, although all estimators showed a fairly good performance. In the real dataset, the performance of weighted estimators presented some instabilities, making them a risky option. Conclusion: An unweighted logistic regression estimator is a reliable option to be applied to RDS samples, with similar performance to random samples and, therefore, should be the preferred option.

Submitted 11 January, 2021; originally announced January 2021.

Comments: 24 pages, 8 figures, 1 table

arXiv:1804.04678 [pdf, other]

Fast approaches for Bayesian estimation of size of hard-to-reach populations using Network Scale-up

Authors: Leonardo S Bastos, Natalia S Paiva, Francisco I Bastos, Daniel A M Villela

Abstract: The Network scale-up method is commonly used to overcome difficulties in estimating the size of hard-to-reach populations. The method uses indirect information based on the social network of each participant taken from the general population, but in some applications a fast computational approach would be highly recommended. We propose a Gibbs sampling method and a Monte Carlo approach to sample from the random degree model. We applied the abovementioned analytical strategies to previous data on heavy drug users from Curitiba, Brazil.

Submitted 12 April, 2018; originally announced April 2018.

Comments: 13 pages, 1 figure

arXiv:1711.00162 [pdf, other]

Dynamic quantile linear models: a Bayesian approach

Authors: Kelly C. M. Gonçalves, Helio S. Migon, Leonardo S. Bastos

Abstract: A new class of models, named dynamic quantile linear models, is presented. It combines dynamic linear models with distribution-free quantile regression, producing a robust statistical method. Bayesian inference for dynamic quantile linear models can be performed using an efficient Markov chain Monte Carlo algorithm. A fast sequential procedure suited for high-dimensional predictive modeling applications with massive data, in which the generating process is itself changing over time, is also proposed. The proposed model is evaluated using synthetic and well-known time series data. The model is also applied to predict annual incidence of tuberculosis in Rio de Janeiro state for future years and compared with global strategy targets set by the World Health Organization.

Submitted 18 February, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

Comments: 38 pages, 13 figures

arXiv:1502.04206 [pdf, other]

doi 10.1214/22-BA1311

Combining probability distributions: Extending the logarithmic pooling approach

Authors: Luiz Max de Carvalho, Daniel A. M. Villela, Flavio Codeco Coelho, Leonardo Soares Bastos

Abstract: Combining distributions is an important issue in decision theory and Bayesian inference. Logarithmic pooling is a popular method to aggregate expert opinions by using a set of weights that reflect the reliability of each information source. However, the resulting pooled distribution depends heavily on the set of weights given to each opinion/prior and thus careful consideration must be given to the choice of weights. In this paper we review and extend the statistical theory of logarithmic pooling, focusing on the assignment of the weights using a hierarchical prior distribution. We explore several statistical applications, such as the estimation of survival probabilities, meta-analysis and Bayesian melding of deterministic models of population growth and epidemics. We show that it is possible to learn the weights from data, although identifiability issues may arise for some configurations of priors and data. Furthermore, we show how the hierarchical approach leads to posterior distributions that are able to accommodate prior-data conflict in complex models.

Submitted 30 December, 2020; v1 submitted 14 February, 2015; originally announced February 2015.

Comments: Massively updated manuscript; submitted for publication

arXiv:1409.6239 [pdf, ps, other]

doi 10.1590/0102-311X00175413

Obtaining adjusted prevalence ratios from logistic regression model in cross-sectional studies

Authors: Leonardo Soares Bastos, Raquel de Vasconcellos Carvalhaes de Oliveira, Luciane de Souza Velasque

Abstract: In recent decades, the use of the epidemiological prevalence ratio (PR) rather than the odds ratio as a measure of association to be estimated in cross-sectional studies has been discussed. The main difficulties in the use of statistical models for the calculation of PR are convergence problems, availability of adequate tools and strong assumptions. The goal of this study is to illustrate how to estimate PR and its confidence interval directly from logistic regression estimates. We present three examples and compare the adjusted estimates of PR with the estimates obtained by use of log-binomial, robust Poisson regression and adjusted prevalence odds ratio (POR). The marginal and conditional prevalence ratios estimated from logistic regression showed the following advantages: no numerical instability; simple implementation in statistical software; and an adequate probability distribution assumed for the outcome.

Submitted 22 September, 2014; originally announced September 2014.

arXiv:1206.5681 [pdf, other]

Binary regression analysis with network structure of respondent-driven sampling data

Authors: Leonardo S. Bastos, Adriana A. Pinho, Claudia Codeço, Francisco I. Bastos

Abstract: Respondent-driven sampling (RDS) is a procedure to sample from hard-to-reach populations. It has been widely used in several countries, especially in the monitoring of HIV/AIDS and other sexually transmitted infections. Hard-to-reach populations have had a key role in the dynamics of such epidemics and must inform evidence-based initiatives aiming to curb their spread. In this paper, we present a simple test for network dependence for a binary response variable. We estimate the prevalence of the response variable. We also propose a binary regression model taking into account the RDS structure which is included in the model through a latent random effect with a correlation structure. The proposed model is illustrated in an RDS study for HIV and Syphilis in men who have sex with men implemented in Campinas (Brazil).

Submitted 25 June, 2012; originally announced June 2012.

Showing 1–8 of 8 results for author: Bastos, L S