Estimating the household secondary attack rate with the Incomplete Chain Binomial model

Authors: Jonas Christoffer Lindstrøm, Terese Bekkevold, Cathinka Halle Julin, Anna Hayman Robertson, Lisbeth Meyer Næss

Abstract: The Secondary Attack Rate (SAR) is a measure of how infectious a communicable disease is, and is often estimated based on studies of disease transmission in households. The Chain Binomial model is a simple model for disease outbreaks, and the final size distribution derived from it can be used to estimate the SAR using simple summary statistics. The final size distribution of the Chain Binomial model assumes that the outbreaks have concluded, which in some instances may require long follow-up time. We develop a way to compute the probability distribution of the number of infected before the outbreak has concluded, which we call the Incomplete Chain Binomial distribution. We study a few theoretical properties of the model. We develop Maximum Likelihood estimation routines for inference on the SAR and explore the model by analyzing two real world data sets.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 16 pages

arXiv:2401.11735 [pdf, other]

doi 10.1145/3658644.3690356

zkLogin: Privacy-Preserving Blockchain Authentication with Existing Credentials

Authors: Foteini Baldimtsi, Konstantinos Kryptos Chalkias, Yan Ji, Jonas Lindstrøm, Deepak Maram, Ben Riva, Arnab Roy, Mahdi Sedaghat, Joy Wang

Abstract: For many users, a private key based wallet serves as the primary entry point to blockchains. Commonly recommended wallet authentication methods, such as mnemonics or hardware wallets, can be cumbersome. This difficulty in user onboarding has significantly hindered the adoption of blockchain-based applications. We develop zkLogin, a novel technique that leverages identity tokens issued by popular platforms (any OpenID Connect enabled platform e.g., Google, Facebook, etc.) to authenticate transactions. At the heart of zkLogin lies a signature scheme allowing the signer to sign using their existing OpenID accounts and nothing else. This improves the user experience significantly as users do not need to remember a new secret and can reuse their existing accounts. zkLogin provides strong security and privacy guarantees. Unlike prior works, zkLogin's security relies solely on the underlying platform's authentication mechanism without the need for any additional trusted parties (e.g., trusted hardware or oracles). As the name suggests, zkLogin leverages zero-knowledge proofs (ZKP) to ensure that the sensitive link between a user's off-chain and on-chain identities is hidden, even from the platform itself. zkLogin enables a number of important applications outside blockchains. It allows billions of users to produce verifiable digital content leveraging their existing digital identities, e.g., email address. For example, a journalist can use zkLogin to sign a news article with their email address, allowing verification of the article's authorship by any party. We have implemented and deployed zkLogin on the Sui blockchain as an additional alternative to traditional digital signature-based addresses.

Submitted 27 September, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: Full version of the CCS paper

arXiv:2206.08728 [pdf, other]

doi 10.1016/j.csda.2022.107558

Iterative importance sampling with Markov chain Monte Carlo sampling in robust Bayesian analysis

Authors: Ivette Raices Cruz, Johan Lindström, Matthias C. M. Troffaes, Ullrika Sahlin

Abstract: Bayesian inference under a set of priors, called robust Bayesian analysis, allows for estimation of parameters within a model and quantification of epistemic uncertainty in quantities of interest by bounded (or imprecise) probability. Iterative importance sampling can be used to estimate bounds on the quantity of interest by optimizing over the set of priors. A method for iterative importance sampling when the robust Bayesian inference relies on Markov chain Monte Carlo (MCMC) sampling is proposed. To accommodate the MCMC sampling in iterative importance sampling, a new expression for the effective sample size of the importance sampling is derived, which accounts for the correlation in the MCMC samples. To illustrate the proposed method for robust Bayesian analysis, iterative importance sampling with MCMC sampling is applied to estimate the lower bound of the overall effect in a previously published meta-analysis with a random effects model. The performance of the method compared to a grid search method and under different degrees of prior-data conflict is also explored.

Submitted 17 June, 2022; originally announced June 2022.

Comments: 19 pages, 3 figures, 4 tables

MSC Class: 62L20 (Primary); 62P12 (Secondary) ACM Class: G.3

Journal ref: Computational Statistics & Data Analysis 176 (2022) 107558

arXiv:2204.10645 [pdf, other]

doi 10.1002/sim.9422

A robust Bayesian bias-adjusted random effects model for consideration of uncertainty about bias terms in evidence synthesis

Authors: Ivette Raices Cruz, Matthias C. M. Troffaes, Johan Lindström, Ullrika Sahlin

Abstract: Meta-analysis is a statistical method used in evidence synthesis for combining, analyzing and summarizing studies that have the same target endpoint and aims to derive a pooled quantitative estimate using fixed and random effects models or network models. Differences among included studies depend on variations in target populations (i.e. heterogeneity) and variations in study quality due to study design and execution (i.e. bias). The risk of bias is usually assessed qualitatively using critical appraisal, and quantitative bias analysis can be used to evaluate the influence of bias on the quantity of interest. We propose a way to consider ignorance or ambiguity in how to quantify bias terms in a bias analysis by characterizing bias with imprecision (as bounds on probability) and use robust Bayesian analysis to estimate the overall effect. Robust Bayesian analysis is here seen as Bayesian updating performed over a set of coherent probability distributions, where the set emerges from a set of bias terms. We show how the set of bias terms can be specified based on judgments on the relative magnitude of biases (i.e., low, unclear and high risk of bias) in one or several domains of the Cochrane's risk of bias table. For illustration, we apply a robust Bayesian bias-adjusted random effects model to an already published meta-analysis on the effect of Rituximab for rheumatoid arthritis from the Cochrane Database of Systematic Reviews.

Submitted 22 April, 2022; originally announced April 2022.

Comments: 21 pages, 6 figures

MSC Class: 62P10 ACM Class: G.3

arXiv:2202.10305 [pdf, other]

doi 10.1109/TASC.2022.3154334

Design of a Canted-cosine-theta orbit corrector for the High Luminosity LHC

Authors: K. Pepitone, G. Kirby, R. Ruber, A. Ahl, M. Canale, I. Dugic, L. Gentini, M. Johansson, G. Karlsson, J. Kovacikova, J. Lindström, A. Olsson, M. Olvegård

Abstract: The High Luminosity LHC requires dipole orbit correctors grouped in double aperture magnet assemblies. They provide a field of 3.1 T at 100 A in an aperture of 70 mm. The current standard design is a classical cosine-theta layout made with ribbon cable. However, the electric insulation of the ribbon cable is not radiation-resistant enough to withstand the radiation load expected in the coming years of LHC operation. A new design, based on a radiation-resistant cable with polyimide insulator, that can replace the existing orbit correctors at their end-of-life, is needed. The challenge is to design a magnet that fits directly into the existing positions and that can operate with the same busbars, passive quench protection, and power supplies as existing magnets. We propose a self-protected canted-cosine-theta (CCT) design. We take the opportunity to explore new concepts for the CCT design to produce a cost-effective and high-quality design with a more sustainable use of resources. The new orbit corrector design meets high requirements on the field quality while keeping within the same mechanical volume and maximum excitation current. A collaboration of Swedish universities and Swedish industry has been formed for the development and production of a prototype magnet following a concurrent engineering (CE) methodology to reduce the time needed to produce a functional CCT magnet. The magnet has a 1 m long CCT dipole layout consisting of two coils. The superconductor is a commercially available 0.33 mm wire with polyimide insulation in a 6-around-1 cable. The channels in the coil formers, which determine the CCT layout, allow for 2 x 5 cable layers. A total of 70 windings allows the coil current to be kept below 100 A. We will present the detailed design and preliminary quench simulations.

Submitted 21 February, 2022; originally announced February 2022.

Report number: vol32 no 6

Journal ref: IEEE Transactions on Applied Superconductivity - 2022

arXiv:1911.10038 [pdf, ps, other]

Multilingual Culture-Independent Word Analogy Datasets

Authors: Matej Ulčar, Kristiina Vaik, Jessica Lindström, Milda Dailidėnaitė, Marko Robnik-Šikonja

Abstract: In text processing, deep neural networks mostly use word embeddings as an input. Embeddings have to ensure that relations between words are reflected through distances in a high-dimensional numeric space. To compare the quality of different text embeddings, typically, we use benchmark datasets. We present a collection of such datasets for the word analogy task in nine languages: Croatian, English, Estonian, Finnish, Latvian, Lithuanian, Russian, Slovenian, and Swedish. We redesigned the original monolingual analogy task to be much more culturally independent and also constructed cross-lingual analogy datasets for the involved languages. We present basic statistics of the created datasets and their initial evaluation using fastText embeddings.

Submitted 27 March, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: 7 pages, LREC2020 conference

ACM Class: J.5

Journal ref: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4074-4080

arXiv:1910.10993 [pdf, other]

Reconstruction of Past Human land-use from Pollen Data and Anthropogenic land-cover Changes Scenarios

Authors: Behnaz Pirzamanbein, Johan Lindström

Abstract: Accurate maps of past land cover and human land-use are necessary when studying the impact of anthropogenic land-cover changes on climate. Ideally the maps of past land cover would be separated into naturally occurring vegetation and human induced changes, allowing us to quantify the effect of human land-use on past climate. Here we investigate the possibility of combining regional, fossil pollen based, land-cover reconstructions with population based estimates of past human land-use. By merging these two datasets and interpolating the pollen based land-cover reconstructions we aim at obtaining maps that provide both past natural land-cover and the anthropogenic land-cover changes. We develop a Bayesian hierarchical model to handle the complex data, using a latent Gaussian Markov random field (GMRF) for the interpolation. Estimation of the model is based on a block updated Markov chain Monte Carlo (MCMC) algorithm. The sparse precision matrix of the GMRF together with an adaptive Metropolis adjusted Langevin step allows for fast inference. Uncertainties in the land-use predictions are computed from the MCMC posterior samples. The model uses the pollen based observations to reconstruct a three-part composition of land cover: Coniferous forest, Broadleaved forest and Unforested/Open land. The unforested land is then further decomposed into natural and human induced openness by inclusion of the estimates of past human land-use. The model is applied to five time periods over Europe, centred around 1900 CE, 1725 CE, 1425 CE, 1000 BCE, and 4000 BCE. The results suggest pollen based observations can be used to recover past human land-use by adjusting the population based anthropogenic land-cover changes estimates.

Submitted 24 October, 2019; originally announced October 2019.

Comments: 5 Human land-use maps of Europe (1900 CE, 1725 CE, 1425 CE, 1000 BCE, and 4000 BCE)

arXiv:1812.02771 [pdf, other]

Neural Word Search in Historical Manuscript Collections

Authors: Tomas Wilkinson, Jonas Lindström, Anders Brun

Abstract: We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting". To this end, we first propose an end-to-end trainable model based on deep neural networks that we dub Ctrl-F-Net. The model simultaneously generates region proposals and embeds them into a word embedding space, wherein a search is performed. We further introduce a simplified version called Ctrl-F-Mini. It is faster with similar performance, though it is limited to more easily segmented manuscripts. We evaluate both models on common benchmark datasets and surpass the previous state of the art. Finally, in collaboration with historians, we employ the Ctrl-F-Net to search within a large manuscript collection of over 100 thousand pages, written across two centuries. With only 11 training pages, we enable large scale data collection in manuscript-based historical research. This speeds up data collection and increases the number of manuscripts processed by orders of magnitude. Given the time consuming manual work required to study old manuscripts in the humanities, quick and robust tools for word spotting have the potential to revolutionise domains like history, religion and language.

Submitted 31 March, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: Extension of arXiv:1703.07645. This version adds results on two additional benchmark datasets (Botany and Konzilsprotokolle) and improves the experiment done in section 5.3.1

arXiv:1703.07645 [pdf, other]

Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections

Authors: Tomas Wilkinson, Jonas Lindström, Anders Brun

Abstract: In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals, and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is to help historians to find and count specific words in court records that are related to women's sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task.

Submitted 17 August, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

Comments: To appear in ICCV 2017

arXiv:1703.06719 [pdf, other]

Bayesian reconstruction of past land-cover from pollen data: model robustness and sensitivity to auxiliary variables

Authors: Behnaz Pirzamanbein, Anneli Poska, Johan Lindström

Abstract: Realistic depictions of past land cover are needed to investigate prehistoric environmental changes, effects of anthropogenic deforestation, and long term land cover-climate feedbacks. Observation based reconstructions of past land cover are rare and commonly used model based reconstructions exhibit considerable differences. Recently Pirzamanbein et al. (Spatial Statistics, 24:14-31, 2018) developed a statistical interpolation method that produces spatially complete reconstructions of past land cover from pollen assemblage. These reconstructions incorporate a number of auxiliary datasets raising questions regarding the method's sensitivity to different auxiliary datasets. Here the sensitivity of the method is examined by performing spatial reconstructions for northern Europe during three time periods (1900 CE, 1725 CE and 4000 BCE). The auxiliary datasets considered include the most commonly utilized sources of past land-cover data, e.g. estimates produced by a dynamic vegetation model (DVM) and anthropogenic land-cover change (ALCC) models. Five different auxiliary datasets were considered, including different climate data driving the DVM and different ALCC models. The resulting reconstructions were also evaluated using cross-validation for all the time periods. For the recent time period, 1900 CE, the different land-cover reconstructions were compared against a present day forest map. The validation confirms that the statistical model provides a robust spatial interpolation tool with low sensitivity to differences in auxiliary data and high capacity to capture information in the pollen based proxy data. Further auxiliary data with high spatial detail improves model performance for areas with complex topography or few observations.

Submitted 3 November, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

arXiv:1511.06417 [pdf, other]

doi 10.1016/j.spasta.2018.03.005

Modelling Spatial Compositional Data: Reconstructions of past land cover and uncertainties

Authors: Behnaz Pirzamanbein, Johan Lindström, Anneli Poska, Marie-José Gaillard

Abstract: In this paper, we construct a hierarchical model for spatial compositional data, which is used to reconstruct past land-cover compositions (in terms of coniferous forest, broadleaved forest, and unforested/open land) for five time periods during the past 6000 years over Europe. The model consists of a Gaussian Markov Random Field (GMRF) with Dirichlet observations. A block updated Markov chain Monte Carlo (MCMC), including an adaptive Metropolis adjusted Langevin step, is used to estimate model parameters. The sparse precision matrix in the GMRF provides computational advantages leading to a fast MCMC algorithm. Reconstructions are obtained by combining pollen-based estimates of vegetation cover at a limited number of locations with scenarios of past deforestation and output from a dynamic vegetation model. To evaluate uncertainties in the predictions a novel way of constructing joint confidence regions for the entire composition at each prediction location is proposed. The hierarchical model's ability to reconstruct past land cover is evaluated through cross validation for all time periods, and by comparing reconstructions for the recent past to a present day European forest map. The evaluation results are promising and the model is able to capture known structures in past land-cover compositions.

Submitted 24 February, 2017; v1 submitted 19 November, 2015; originally announced November 2015.

arXiv:1502.00871 [pdf, ps, other]

doi 10.1214/14-AOAS786

Reduced-rank spatio-temporal modeling of air pollution concentrations in the Multi-Ethnic Study of Atherosclerosis and Air Pollution

Authors: Casey Olives, Lianne Sheppard, Johan Lindström, Paul D. Sampson, Joel D. Kaufman, Adam A. Szpiro

Abstract: There is growing evidence in the epidemiologic literature of the relationship between air pollution and adverse health outcomes. Prediction of individual air pollution exposure in the Environmental Protection Agency (EPA) funded Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study relies on a flexible spatio-temporal prediction model that integrates land-use regression with kriging to account for spatial dependence in pollutant concentrations. Temporal variability is captured using temporal trends estimated via modified singular value decomposition and temporally varying spatial residuals. This model utilizes monitoring data from existing regulatory networks and supplementary MESA Air monitoring data to predict concentrations for individual cohort members. In general, spatio-temporal models are limited in their efficacy for large data sets due to computational intractability. We develop reduced-rank versions of the MESA Air spatio-temporal model. To do so, we apply low-rank kriging to account for spatial variation in the mean process and discuss the limitations of this approach. As an alternative, we represent spatial variation using thin plate regression splines. We compare the performance of the outlined models using EPA and MESA Air monitoring data for predicting concentrations of oxides of nitrogen (NO$_x$), a pollutant of primary interest in MESA Air, in the Los Angeles metropolitan area via cross-validated $R^2$. Our findings suggest that use of reduced-rank models can improve computational efficiency in certain cases. Low-rank kriging and thin plate regression splines were competitive across the formulations considered, although TPRS appeared to be more robust in some settings.

Submitted 3 February, 2015; originally announced February 2015.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS786 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS786

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 4, 2509-2537

Showing 1–12 of 12 results for author: Lindström, J