
Foundation Inference Models for Markov Jump Processes

Authors: David Berghaus, Kostadin Cvejoski, Patrick Seifner, Cesar Ojeda, Ramses J. Sanchez

Abstract: Markov jump processes are continuous-time stochastic processes which describe dynamical systems evolving in discrete state spaces. These processes find wide application in the natural sciences and machine learning, but their inference is known to be far from trivial. In this work we introduce a methodology for zero-shot inference of Markov jump processes (MJPs), on bounded state spaces, from noisy and sparse observations, which consists of two components. First, a broad probability distribution over families of MJPs, as well as over possible observation times and noise mechanisms, with which we simulate a synthetic dataset of hidden MJPs and their noisy observation process. Second, a neural network model that processes subsets of the simulated observations, and that is trained to output the initial condition and rate matrix of the target MJP in a supervised way. We empirically demonstrate that one and the same (pretrained) model can infer, in a zero-shot fashion, hidden MJPs evolving in state spaces of different dimensionalities. Specifically, we infer MJPs which describe (i) discrete flashing ratchet systems, which are a type of Brownian motor, and the conformational dynamics in (ii) molecular simulations, (iii) experimental ion channel data and (iv) simple protein folding models. What is more, we show that our model performs on par with state-of-the-art models which are fine-tuned to the target datasets.

Submitted 4 October, 2024; v1 submitted 10 June, 2024; originally announced June 2024.
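
A minimal sketch of the first component above (simulating a hidden MJP and its noisy, sparse observations), for readers who want a concrete picture. The uniform prior over off-diagonal rates, the Gillespie sampler, the uniform observation times and the symmetric state-flip noise are all assumptions chosen for illustration, not the distribution used in the paper; the function names are hypothetical and at least two states are assumed.

```python
import bisect
import random


def random_rate_matrix(n_states, max_rate=1.0, rng=random):
    """Hypothetical prior: i.i.d. uniform off-diagonal rates, rows summing to zero."""
    Q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            if i != j:
                Q[i][j] = rng.uniform(0.0, max_rate)
        Q[i][i] = -sum(Q[i][j] for j in range(n_states) if j != i)
    return Q


def gillespie(Q, x0, t_end, rng=random):
    """Sample one MJP path; returns jump times and the states entered at those times."""
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        exit_rate = -Q[x][x]
        if exit_rate <= 0.0:
            break  # absorbing state: no further jumps
        t += rng.expovariate(exit_rate)  # exponential holding time
        if t >= t_end:
            break
        # jump to another state with probability proportional to its rate
        targets = [j for j in range(len(Q)) if j != x]
        x = rng.choices(targets, weights=[Q[x][j] for j in targets])[0]
        times.append(t)
        states.append(x)
    return times, states


def observe(times, states, n_obs, t_end, n_states, flip_prob=0.1, rng=random):
    """Read the path at sorted uniform random times, flipping each reading with prob. flip_prob."""
    obs_times = sorted(rng.uniform(0.0, t_end) for _ in range(n_obs))
    obs = []
    for s in obs_times:
        x = states[bisect.bisect_right(times, s) - 1]  # state held at time s
        if rng.random() < flip_prob:
            x = rng.choice([k for k in range(n_states) if k != x])
        obs.append(x)
    return obs_times, obs


rng = random.Random(0)
Q = random_rate_matrix(3, rng=rng)
times, states = gillespie(Q, x0=0, t_end=10.0, rng=rng)
obs_times, obs = observe(times, states, n_obs=20, t_end=10.0, n_states=3, rng=rng)
```

Repeating this over many sampled rate matrices yields (observations, rate matrix) pairs of the kind a supervised model could be trained on.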

arXiv:2402.07594

Foundational Inference Models for Dynamical Systems

Authors: Patrick Seifner, Kostadin Cvejoski, Antonia Körner, Ramsés J. Sánchez

Abstract: Dynamical systems governed by ordinary differential equations (ODEs) serve as models for a vast number of natural and social phenomena. In this work, we offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs. Specifically, we revisit ideas from amortized inference and neural operators, and propose a novel supervised learning framework for zero-shot time series imputation, through parametric functions satisfying some (hidden) ODEs. Our proposal consists of two components. First, a broad probability distribution over the space of ODE solutions, observation times and noise mechanisms, with which we generate a large, synthetic dataset of (hidden) ODE solutions, along with their noisy and sparse observations. Second, a neural recognition model that is trained offline, to map the generated time series onto the spaces of initial conditions and time derivatives of the (hidden) ODE solutions, which we then integrate to impute the missing data. We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values, each sampled from widely different dynamical systems. Likewise, we demonstrate that it can perform zero-shot imputation of missing high-dimensional data in 10 vastly different settings, spanning human motion, air quality, traffic and electricity studies, as well as Navier-Stokes simulations, without requiring any fine-tuning.
What is more, our proposal often outperforms state-of-the-art methods, which are trained on the target datasets. Our pretrained model will be available online soon.

Submitted 4 October, 2024; v1 submitted 12 February, 2024; originally announced February 2024.
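
The final step in the abstract above, integrating a predicted initial condition and time derivative to impute missing values, can be sketched as follows. Here a known vector field `f` stands in for the recognition model's output and fixed-step fourth-order Runge-Kutta is an assumed solver choice; `rk4_step` and `impute` are hypothetical names, and the state is scalar for brevity.

```python
import math


def rk4_step(f, t, x, h):
    """One classical Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def impute(f, x0, t0, query_times, dt=1e-2):
    """Integrate from (t0, x0) and return x at each query time (>= t0), in sorted time order."""
    values = []
    t, x = t0, x0
    for tq in sorted(query_times):
        while t < tq:
            h = min(dt, tq - t)  # shorten the last step so we land exactly on tq
            x = rk4_step(f, t, x, h)
            t = tq if h == tq - t else t + h
        values.append(x)
    return values


# Stand-in "predicted" dynamics dx/dt = -x with x(0) = 1; the exact solution is exp(-t).
approx = impute(lambda t, x: -x, 1.0, 0.0, [0.5, 1.0, 2.0])
exact = [math.exp(-t) for t in (0.5, 1.0, 2.0)]
```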

arXiv:2305.19744

Neural Markov Jump Processes

Authors: Patrick Seifner, Ramses J. Sanchez

Abstract: Markov jump processes are continuous-time stochastic processes with a wide range of applications in both natural and social sciences. Despite their widespread use, inference in these models is highly non-trivial and typically proceeds via either Monte Carlo or expectation-maximization methods. In this work we introduce an alternative, variational inference algorithm for Markov jump processes which relies on neural ordinary differential equations, and is trainable via back-propagation. Our methodology learns neural, continuous-time representations of the observed data, which are used to approximate the initial distribution and time-dependent transition probability rates of the posterior Markov jump process. The time-independent rates of the prior process are, in contrast, trained akin to generative adversarial networks. We test our approach on synthetic data sampled from ground-truth Markov jump processes, experimental switching ion channel data and molecular dynamics simulations. Source code to reproduce our experiments is available online.

Submitted 31 May, 2023; originally announced May 2023.
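
For an MJP with (possibly time-dependent) rate matrix $Q(t)$, the marginal state distribution $p(t)$, taken as a row vector, obeys the master equation $dp/dt = p(t)\,Q(t)$. The sketch below integrates it with forward Euler; the hand-written two-state rates stand in for the neural, time-dependent posterior rates described above, and all function names are hypothetical.

```python
import math


def propagate(p0, rate_fn, t_end, n_steps=10_000):
    """Forward-Euler integration of the master equation dp/dt = p(t) Q(t).

    p0: initial distribution (list of probabilities)
    rate_fn: t -> rate matrix Q(t), non-negative off-diagonals and zero row sums
    """
    dt = t_end / n_steps
    p = list(p0)
    n = len(p)
    for k in range(n_steps):
        Q = rate_fn(k * dt)
        # row vector times matrix: (pQ)_j = sum_i p_i Q_ij
        dp = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p


def two_state_rates(t):
    """Hypothetical rates: 0 -> 1 at rate a, 1 -> 0 at rate b (constant in t here)."""
    a, b = 2.0, 1.0
    return [[-a, a], [b, -b]]


def exact_two_state(t, a=2.0, b=1.0):
    """Closed form of P(state 1 at time t | state 0 at time 0) for constant rates."""
    return a / (a + b) * (1.0 - math.exp(-(a + b) * t))


p = propagate([1.0, 0.0], two_state_rates, t_end=1.0)
```

Because every row of $Q(t)$ sums to zero, each Euler step preserves total probability up to rounding, which is a cheap sanity check on any learned rate function.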

arXiv:2301.10988

Neural Dynamic Focused Topic Model

Authors: Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

Abstract: Topic models and all their variants analyse text by learning meaningful representations through word co-occurrences. As pointed out by Williamson et al. (2010), such models implicitly assume that the probability of a topic being active and its proportion within each document are positively correlated. This correlation can be strongly detrimental in the case of documents created over time, simply because recent documents are likely better described by new and hence rare topics. In this work we leverage recent advances in neural variational inference and present an alternative neural approach to the dynamic Focused Topic Model. Indeed, we develop a neural model for topic evolution which exploits sequences of Bernoulli random variables in order to track the appearances of topics, thereby decoupling their activities from their proportions. We evaluate our model on three different datasets (the UN general debates, the collection of NeurIPS papers, and the ACL Anthology dataset) and show that it (i) outperforms state-of-the-art topic models in generalization tasks and (ii) performs comparably to them on prediction tasks, while employing roughly the same number of parameters, and converging about two times faster. Source code to reproduce our experiments is available online.

Submitted 26 January, 2023; originally announced January 2023.

Comments: Accepted at the AAAI Conference on Artificial Intelligence (AAAI 2023)
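
The decoupling of topic activity from topic proportion can be illustrated with a single generative step in the spirit of the focused topic model: Bernoulli variables decide which topics are active, and proportions are drawn only over the active ones. This is a simplified sketch assuming fixed activation probabilities and a Dirichlet drawn via normalized Gamma variates; in the paper the activations evolve over time through a neural model, and `sample_document_topics` is a hypothetical name.

```python
import random


def sample_document_topics(activation_probs, concentrations, rng=random):
    """Sample topic proportions for one document, focused-topic-model style.

    Topic k is switched on by an independent Bernoulli(activation_probs[k]);
    proportions are then drawn from a Dirichlet restricted to the active topics,
    so how often a topic appears is decoupled from how much of a document it covers.
    """
    active = [rng.random() < pk for pk in activation_probs]
    if not any(active):
        # hypothetical fallback: force at least one active topic
        active[rng.randrange(len(active))] = True
    # Dirichlet sample via normalized Gamma draws, over active topics only
    gammas = [rng.gammavariate(c, 1.0) if on else 0.0
              for on, c in zip(active, concentrations)]
    total = sum(gammas)
    return active, [g / total for g in gammas]


rng = random.Random(1)
active, theta = sample_document_topics([0.9, 0.5, 0.1], [1.0, 1.0, 1.0], rng=rng)
```

Letting `activation_probs` drift over time (for instance, raising the probability of a newly emerging topic from one period to the next) mimics the dynamic setting without tying that topic's share within a document to how often it appears.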

arXiv:2211.00384

The future is different: Large pre-trained language models fail in prediction tasks

Authors: Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

Abstract: Large pre-trained language models (LPLM) have shown spectacular success when fine-tuned on downstream supervised tasks. Yet, it is known that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper we focus on data distributions that naturally change over time and introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display average performance drops of about 88% (in the best case!) when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time. We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our models display performance drops of only about 40% in the worst cases (2% in the best ones) when predicting the popularity of future posts, while using only about 7% of the total number of parameters of LPLM and providing interpretable representations that offer insight into real-world events, like the GameStop short squeeze of 2021.

Submitted 2 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2207.03777

Hidden Schema Networks

Authors: Ramsés J. Sánchez, Lukas Conrads, Pascal Welke, Kostadin Cvejoski, César Ojeda

Abstract: Large, pretrained language models infer powerful representations that encode rich semantic and syntactic content, albeit implicitly. In this work we introduce a novel neural language model that enforces, via inductive biases, explicit relational structures which allow for compositionality onto the output representations of pretrained language models. Specifically, the model encodes sentences into sequences of symbols (composed representations), which correspond to the nodes visited by biased random walkers on a global latent graph, and infers the posterior distribution of the latter. We first demonstrate that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences. Next, we leverage pretrained BERT and GPT-2 language models as encoder and decoder, respectively, to infer networks of symbols (schemata) from natural language datasets. Our experiments show that (i) the inferred symbols can be interpreted as encoding different aspects of language, such as topics or sentiments, and that (ii) GPT-like models can effectively be conditioned on symbolic representations. Finally, we explore training autoregressive, random walk "reasoning" models on schema networks inferred from commonsense knowledge databases, and using the sampled paths to enhance the performance of pretrained language models on commonsense If-Then reasoning tasks.

Submitted 26 May, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: accepted at ACL 2023
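
To make "symbols as nodes visited by biased random walkers on a latent graph" concrete, here is a minimal sketch of such a walk. In the paper the graph and the walker's biases are inferred; here the bias is simply assumed to be given by fixed edge weights, the dict-of-dicts graph format and `biased_random_walk` are hypothetical, and the visited node ids stand in for the emitted symbol sequence.

```python
import random


def biased_random_walk(adjacency, start, length, rng=random):
    """Walk on a weighted graph, moving to each neighbour with probability
    proportional to its edge weight; stops early at a node with no out-edges."""
    path = [start]
    node = start
    for _ in range(length - 1):
        neighbours = list(adjacency[node])
        if not neighbours:
            break  # dead end: no outgoing edges
        weights = [adjacency[node][v] for v in neighbours]
        node = rng.choices(neighbours, weights=weights)[0]
        path.append(node)
    return path


# Toy graph: "b" prefers moving to "c" (weight 3) over returning to "a" (weight 1).
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 3.0}, "c": {}}
symbols = biased_random_walk(graph, "a", length=10, rng=random.Random(0))
```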

arXiv:2110.14747

Dynamic Review-based Recommenders

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Christian Bauckhage, Cesar Ojeda

Abstract: Just as user preferences change with time, item reviews also reflect those same preference changes. In a nutshell, if one is to sequentially incorporate review content knowledge into recommender systems, one is naturally led to dynamical models of text. In the present work we leverage the known power of reviews to enhance rating predictions in a way that (i) respects the causality of review generation and (ii) includes, in a bidirectional fashion, the ability of ratings to inform language review models and, vice versa, the ability of language representations to help predict ratings end-to-end. Moreover, our representations are time-interval aware and thus yield a continuous-time representation of the dynamics. We provide experiments on real-world datasets and show that our methodology is able to outperform several state-of-the-art models. Source code for all models can be found at [1].

Submitted 22 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 6 pages, Published at International Data Science Conference 2021 (iDSC21)

arXiv:2012.05684

doi 10.1109/IJCNN48605.2020.9206768

Recurrent Point Review Models

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Bogdan Georgiev, Christian Bauckhage, Cesar Ojeda

Abstract: Deep neural network models represent the state-of-the-art methodologies for natural language processing. Here we build on top of these methodologies to incorporate temporal information and model how review data changes with time. Specifically, we use the dynamic representations of recurrent point process models, which encode the history of how business or service reviews are received in time, to generate instantaneous language models with improved prediction capabilities. Simultaneously, our methodologies enhance the predictive power of our point process models by incorporating summarized review content representations. We provide recurrent network and temporal convolution solutions for modeling the review content. We deploy our methodologies in the context of recommender systems, effectively characterizing the change in preference and taste of users as time evolves. Source code is available at [1].

Submitted 10 December, 2020; originally announced December 2020.

Comments: 8 pages, 6 figures, Published in: 2020 International Joint Conference on Neural Networks (IJCNN)

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020, pp. 1-8

arXiv:1912.04132

Recurrent Point Processes for Dynamic Review Models

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Bogdan Georgiev, Jannis Schuecker, Christian Bauckhage, Cesar Ojeda

Abstract: Recent progress in recommender system research has shown the importance of including temporal representations to improve interpretability and performance. Here, we incorporate temporal representations in continuous time via recurrent point processes for a dynamical model of reviews. Our goal is to characterize how changes in perception, user interest and seasonal effects affect review text.

Submitted 15 January, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: Presented at the AAAI 2020 Workshop on Interactive and Conversational Recommendation Systems

arXiv:1906.09808

Recurrent Adversarial Service Times

Authors: César Ojeda, Kostadin Cvejosky, Ramsés J. Sánchez, Jannis Schuecker, Bogdan Georgiev, Christian Bauckhage

Abstract: Service system dynamics occur at the interplay between customer behaviour and a service provider's response. This kind of dynamics can effectively be modeled within the framework of queuing theory where customers' arrivals are described by point process models. However, these approaches are limited by parametric assumptions as to, for example, inter-event time distributions. In this paper, we address these limitations and propose a novel, deep neural network solution to the queuing problem. Our solution combines a recurrent neural network that models the arrival process with a recurrent generative adversarial network which models the service time distribution. We evaluate our methodology on various empirical datasets ranging from internet services (Blockchain, GitHub, Stackoverflow) to mobility service systems (New York taxi cab).

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1711.11214

doi 10.1103/PhysRevB.98.054415

Anomalous and regular transport in spin 1/2 chains: AC conductivity

Authors: Ramsés J. Sánchez, Vipin Kerala Varma, Vadim Oganesyan

Abstract: We study magnetization transport in anisotropic spin-$1/2$ chains governed by the integrable XXZ model with and without integrability-breaking perturbations at high temperatures ($T \to \infty$) using a hybrid approach that combines exact sum rules with judiciously chosen Ansätze. In the integrable XXZ model we find (i) super-diffusion at the isotropic (Heisenberg) point, with frequency-dependent conductivity $\sigma'(\omega \to 0) \sim |\omega|^{\alpha}$, where $\alpha = -3/7$ in close numerical agreement with recent $t$-DMRG computations; (ii) a continuously drifting exponent from $\alpha = -1^+$ in the XY limit of the model to $\alpha > 0$ within the Ising regime; and (iii) a diffusion constant saturating in the XY coupling deep in the Ising limit. We consider two kinds of integrability-breaking perturbations: a simple next-nearest-neighbor spin-flip term ($J_2$) and a three-spin assisted variant ($t_2$), natural in the fermion particle representation of the spin chain. In the first case we discover a remarkable sensitivity of $\sigma'(\omega)$ to the sign of $J_2$, with enhanced low-frequency spectral weight and a pronounced upward shift in the magnitude of $\alpha$ for $J_2 > 0$. Perhaps even more surprising, we find sub-diffusion ($\alpha > 0$) over a range of $J_2 < 0$. By contrast, the effects of the "fermionic" three-spin perturbation are sign symmetric; this perturbation produces a clearly observable hydrodynamic relaxation.
At large strength of the integrability-breaking term, $J_2 \to \pm\infty$, the problem is effectively non-interacting (fermions hopping on odd and even sublattices) and we find $\alpha \to -1$ behavior reminiscent of the XY limit of the XXZ chain. Exact diagonalization studies largely corroborate these findings at mid-frequencies.

Submitted 5 March, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

Comments: 2-column format, 17 pages, 58 references, 8 figures

Journal ref: Phys. Rev. B 98, 054415 (2018)
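
The low-frequency exponent $\alpha$ in $\sigma'(\omega \to 0) \sim |\omega|^{\alpha}$ is commonly read off as a slope on a log-log plot. A minimal least-squares sketch, assuming $\sigma'$ is available as positive samples on a grid of small frequencies; the data below are a synthetic pure power law, not results from the paper, and `fit_power_law_exponent` is a hypothetical name.

```python
import math


def fit_power_law_exponent(omegas, sigmas):
    """Least-squares slope of log(sigma) vs log|omega|, i.e. alpha in sigma ~ |omega|^alpha."""
    xs = [math.log(abs(w)) for w in omegas]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: a pure power law with alpha = -3/7 (the exponent quoted above)
omegas = [10 ** (-k / 4) for k in range(4, 13)]
sigmas = [2.5 * w ** (-3 / 7) for w in omegas]
alpha = fit_power_law_exponent(omegas, sigmas)
```

With real spectra the fitted slope depends on the chosen frequency window, so in practice one would check that the estimate is stable as the window is moved toward $\omega \to 0$.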

arXiv:1704.04273

doi 10.1103/PhysRevB.96.245117

Finite-size anomalies of the Drude weight: role of symmetries and ensembles

Authors: Ramsés J. Sánchez, Vipin Kerala Varma

Abstract: We revisit the subtleties of computing the high-temperature spin stiffness $D$ of the spin-$1/2$ XXZ chain using exact diagonalization to analyze its dependence on system symmetries and ensemble. Within the canonical ensemble and for states with zero magnetization, we find $D$ vanishes exactly due to spin-inversion symmetry for all but the anisotropies $\tilde\Delta_{MN} = \cos(\pi M/N)$ with $N > M$ and coprime, provided system sizes $L \ge 2N$, for which states with different spin-inversion signature become degenerate due to the underlying $sl_2$ loop algebra symmetry. All these loop-algebra degenerate states carry finite currents which we conjecture [based on $L$ and anisotropies $\tilde\Delta_{MN}$ (with $N < L/2$) available to us] to dominate the grand-canonical ensemble evaluation of $D$ in the thermodynamic limit. Including a magnetic flux not only breaks spin-inversion in the zero magnetization sector but also lifts the loop-algebra degeneracies in all symmetry sectors; this effect is more pertinent at smaller $\Delta$ due to the larger contributions to $D$ coming from the low-magnetization sectors, which are more sensitive to the system's symmetries. Thus we generically find a finite $D$ for fluxed rings and arbitrary $0 < \Delta < 1$ in both ensembles. In contrast, at the isotropic point and in the gapped phase ($\Delta \ge 1$) $D$ is found to vanish in the thermodynamic limit, independent of symmetry or ensemble.
Our analysis demonstrates how convergence to the thermodynamic limit within the gapless phase ($\Delta < 1$) may be accelerated and the finite-size anomalies overcome: $D$ extrapolates nicely in the thermodynamic limit to either the recently computed lower bound or the Thermodynamic Bethe Ansatz result, provided both spin-inversion is broken and the additional degeneracies at the $\tilde\Delta_{MN}$ anisotropies are lifted.

Submitted 2 December, 2017; v1 submitted 13 April, 2017; originally announced April 2017.

Comments: 9 + epsilon pages, 6 figures, 2 tables; v2 provides stronger support for origin of these anomalies (version accepted in Phys. Rev. B)

Journal ref: Phys. Rev. B 96, 245117 (2017)
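
The special anisotropies $\tilde\Delta_{MN} = \cos(\pi M/N)$, with $N > M$ coprime and $L \ge 2N$, can be enumerated for a given chain length $L$. A minimal sketch; the lower bound $M \ge 1$ is an assumption not stated explicitly above, both signs of $\Delta$ are listed even though the abstract concentrates on $0 < \Delta < 1$, and `loop_algebra_anisotropies` is a hypothetical name.

```python
import math


def loop_algebra_anisotropies(L):
    """List (M, N, Delta) with Delta = cos(pi*M/N), 1 <= M < N, gcd(M, N) == 1,
    keeping only N with L >= 2N (chain long enough for the degeneracies to appear)."""
    out = []
    for N in range(2, L // 2 + 1):
        for M in range(1, N):
            if math.gcd(M, N) == 1:
                out.append((M, N, math.cos(math.pi * M / N)))
    return out


# For L = 8 this yields (M, N) = (1, 2), (1, 3), (2, 3), (1, 4), (3, 4),
# i.e. Delta close to 0, 0.5, -0.5, 0.707 and -0.707.
specials = loop_algebra_anisotropies(8)
```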

arXiv:cs/0111031

Large-Scale CORBA-Distributed Software Framework for NIF Controls

Authors: Robert W. Carey, Kirby W. Fong, Randy J. Sanchez, Joseph D. Tappero, John P. Woodruff

Abstract: The Integrated Computer Control System (ICCS) is based on a scalable software framework that is distributed over some 325 computers throughout the NIF facility. The framework provides templates and services at multiple levels of abstraction for the construction of software applications that communicate via CORBA (Common Object Request Broker Architecture). Various forms of object-oriented software design patterns are implemented as templates to be extended by application software. Developers extend the framework base classes to model the numerous physical control points, thereby sharing the functionality defined by the base classes. About 56,000 software objects, each individually addressed through CORBA, are to be created in the complete ICCS. Most objects have a persistent state that is initialized at system start-up and stored in a database. Additional framework services are provided by centralized server programs that implement events, alerts, reservations, message logging, database/file persistence, name services, and process management. The ICCS software framework approach allows for efficient construction of a software system that supports a large number of distributed control points representing a complex control application.

Submitted 9 November, 2001; originally announced November 2001.

Comments: 5 pages, 0 figures, ICALEPCS '01

Report number: THAI001

ACM Class: C.2.4

Journal ref: eConf C011127 (2001) THAI001

Showing 1–13 of 13 results for author: Sánchez, R J