The XLZD Design Book: Towards the Next-Generation Liquid Xenon Observatory for Dark Matter and Neutrino Physics

Authors: XLZD Collaboration, J. Aalbers, K. Abe, M. Adrover, S. Ahmed Maouloud, D. S. Akerib, A. K. Al Musalhi, F. Alder, L. Althueser, D. W. P. Amaral, C. S. Amarasinghe, A. Ames, B. Andrieu, N. Angelides, E. Angelino, B. Antunovic, E. Aprile, H. M. Araújo, J. E. Armstrong, M. Arthurs, M. Babicz, D. Bajpai, A. Baker, M. Balzer, J. Bang, et al. (419 additional authors not shown)

Abstract: This report describes the experimental strategy and technologies for a next-generation xenon observatory sensitive to dark matter and neutrino physics. The detector will have an active liquid xenon target mass of 60-80 tonnes and is proposed by the XENON-LUX-ZEPLIN-DARWIN (XLZD) collaboration. The design is based on the mature liquid xenon time projection chamber technology of the current-generation experiments, LZ and XENONnT. A baseline design and opportunities for further optimization of the individual detector components are discussed. The experiment envisaged here has the capability to explore parameter space for Weakly Interacting Massive Particle (WIMP) dark matter down to the neutrino fog, with a 3$σ$ evidence potential for the spin-independent WIMP-nucleon cross sections as low as $3\times10^{-49}\rm cm^2$ (at 40 GeV/c$^2$ WIMP mass). The observatory is also projected to have a 3$σ$ observation potential of neutrinoless double-beta decay of $^{136}$Xe at a half-life of up to $5.7\times 10^{27}$ years. Additionally, it is sensitive to astrophysical neutrinos from the atmosphere, sun, and galactic supernovae.

Submitted 22 October, 2024; originally announced October 2024.

Comments: 32 pages, 14 figures

arXiv:2410.17036 [pdf, other]

Dark Matter Search Results from 4.2 Tonne-Years of Exposure of the LUX-ZEPLIN (LZ) Experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, E. E. Barillier, D. Bauer, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, et al. (193 additional authors not shown)

Abstract: We report results of a search for nuclear recoils induced by weakly interacting massive particle (WIMP) dark matter using the LUX-ZEPLIN (LZ) two-phase xenon time projection chamber. This analysis uses a total exposure of $4.2\pm0.1$ tonne-years from 280 live days of LZ operation, of which $3.3\pm0.1$ tonne-years and 220 live days are new. A technique to actively tag background electronic recoils from $^{214}$Pb $β$ decays is featured for the first time. Enhanced electron-ion recombination is observed in two-neutrino double electron capture decays of $^{124}$Xe, representing a noteworthy new background. After removal of artificial signal-like events injected into the data set to mitigate analyzer bias, we find no evidence for an excess over expected backgrounds. World-leading constraints are placed on spin-independent (SI) and spin-dependent WIMP-nucleon cross sections for masses $\geq$9 GeV/$c^2$. The strongest SI exclusion set is $2.1\times10^{-48}$ cm$^{2}$ at the 90% confidence level at a mass of 36 GeV/$c^2$, and the best SI median sensitivity achieved is $5.0\times10^{-48}$ cm$^{2}$ for a mass of 40 GeV/$c^2$.

Submitted 22 October, 2024; originally announced October 2024.

arXiv:2410.07765 [pdf, other]

GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Authors: Muhammad Umair Nasir, Steven James, Julian Togelius

Abstract: Large language models (LLMs) have recently demonstrated great success in generating and understanding natural language. While they have also shown potential beyond the domain of natural language, it remains an open question as to what extent and in which way these LLMs can plan. We investigate their planning capabilities by proposing GameTraversalBenchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps. An LLM succeeds if it can traverse through given objectives, with a minimum number of steps and a minimum number of generation errors. We evaluate a number of LLMs on GTB and found that GPT-4-Turbo achieved the highest score of 44.97% on GTB\_Score (GTBS), a composite score that combines the three above criteria. Furthermore, we preliminarily test large reasoning models, namely o1, which scores $67.84\%$ on GTBS, indicating that the benchmark remains challenging for current models. Code, data, and documentation are available at https://github.com/umair-nasir14/Game-Traversal-Benchmark.

Submitted 10 October, 2024; originally announced October 2024.

Comments: Accepted at 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

arXiv:2410.00755 [pdf, other]

Model-independent searches of new physics in DARWIN with a semi-supervised deep learning pipeline

Authors: J. Aalbers, K. Abe, M. Adrover, S. Ahmed Maouloud, L. Althueser, D. W. P. Amaral, B. Andrieu, E. Angelino, D. Antón Martin, B. Antunovic, E. Aprile, M. Babicz, D. Bajpai, M. Balzer, E. Barberio, L. Baudis, M. Bazyk, N. F. Bell, L. Bellagamba, R. Biondi, Y. Biondi, A. Bismark, C. Boehm, K. Boese, R. Braun, et al. (209 additional authors not shown)

Abstract: We present a novel deep learning pipeline to perform a model-independent, likelihood-free search for anomalous (i.e., non-background) events in the proposed next generation multi-ton scale liquid Xenon-based direct detection experiment, DARWIN. We train an anomaly detector comprising a variational autoencoder and a classifier on extensive, high-dimensional simulated detector response data and construct a one-dimensional anomaly score optimised to reject the background only hypothesis in the presence of an excess of non-background-like events. We benchmark the procedure with a sensitivity study that determines its power to reject the background-only hypothesis in the presence of an injected WIMP dark matter signal, outperforming the classical, likelihood-based background rejection test. We show that our neural networks learn relevant energy features of the events from low-level, high-dimensional detector outputs, without the need to compress this data into lower-dimensional observables, thus reducing computational effort and information loss. For the future, our approach lays the foundation for an efficient end-to-end pipeline that eliminates the need for many of the corrections and cuts that are traditionally part of the analysis chain, with the potential of achieving higher accuracy and significant reduction of analysis time.

Submitted 1 October, 2024; originally announced October 2024.

Comments: 10 Figures, 3 Tables, 23 Pages (incl. references)

arXiv:2408.17391 [pdf, other]

Two-neutrino double electron capture of $^{124}$Xe in the first LUX-ZEPLIN exposure

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, E. E. Barillier, K. Beattie, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, C. A. J. Brew, et al. (180 additional authors not shown)

Abstract: The broad physics reach of the LUX-ZEPLIN (LZ) experiment covers rare phenomena beyond the direct detection of dark matter. We report precise measurements of the extremely rare decay of $^{124}$Xe through the process of two-neutrino double electron capture (2$ν$2EC), utilizing a $1.39\,\mathrm{kg} \times \mathrm{yr}$ isotopic exposure from the first LZ science run. A half-life of $T_{1/2}^{2\nu2\mathrm{EC}} = (1.09 \pm 0.14_{\text{stat}} \pm 0.05_{\text{sys}}) \times 10^{22}\,\mathrm{yr}$ is observed with a statistical significance of $8.3\,σ$, in agreement with literature. First empirical measurements of the KK capture fraction relative to other K-shell modes were conducted, and demonstrate consistency with respect to recent signal models at the $1.4\,σ$ level.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 15 pages, 3 figures

arXiv:2408.08697 [pdf, other]

The DAMA/LIBRA signal: an induced modulation effect?

Authors: R. S. James, K. Rule, E. Barberio, V. U. Bashu, L. J. Bignell, I. Bolognino, G. Brooks, S. S. Chhun, F. Dastgiri, A. R. Duffy, M. Froehlich, T. M. A. Fruth, G. Fu, G. C. Hill, K. Janssens, S. Kapoor, G. J. Lane, K. T. Leaver, P. McGee, L. J. McKie, P. C. McNamara, J. McKenzie, W. J. D. Melbourne, M. Mews, L. J. Milligan, et al. (14 additional authors not shown)

Abstract: The persistence of the DAMA/LIBRA (DAMA) modulation over the past two decades has been a source of great contention within the dark matter community. The DAMA collaboration reports a persistent, modulating event rate within their setup of NaI(Tl) scintillating crystals at the INFN Laboratori Nazionali del Gran Sasso (LNGS) underground laboratory. A recent work alluded that this signal could have arisen due to an analysis artefact, caused by DAMA not accounting for time variation of decaying background radioisotopes in their analysis procedure. In this work, we examine in detail this 'induced modulation' effect, arguing that a number of aspects of the DAMA signal are incompatible with an induced modulation arising from decays of background isotopes over the lifetime of the experiment. Using a toy model of the DAMA/LIBRA experiment, we explore the induced modulation effect under different variations of the activities of the relevant isotopes - namely, $^3$H and $^{210}$Pb - highlighting the various inconsistencies between the resultant toy datasets and the DAMA signal. We stress the importance of the SABRE experiment, whose goal is to unambiguously test for the presence of such a modulating signal in an experiment using the same target material and comparable levels of background.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2407.15484 [pdf, other]

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Authors: Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

Abstract: We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF) that also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each ellipsoid that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of each ellipsoid, which in turn is used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best scoring bundle of rays, which their intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the necessity of an "a priori" pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by 12% and translation accuracy by 22% on real scenes, despite not requiring any initialization pose. At the same time, our method operates near real-time, reaching 15fps on consumer hardware.

Submitted 22 July, 2024; originally announced July 2024.

Comments: Project page: https://mbortolon97.github.io/6dgs/ Accepted to ECCV 2024

arXiv:2407.07875 [pdf, other]

Generative Image as Action Models

Authors: Mohit Shridhar, Yat Long Lo, Stephen James

Abstract: Image-generation diffusion models have been fine-tuned to unlock new capabilities such as image-editing and novel view synthesis. Can we similarly unlock image-generation models for visuomotor control? We present GENIMA, a behavior-cloning agent that fine-tunes Stable Diffusion to 'draw joint-actions' as targets on RGB images. These images are fed into a controller that maps the visual targets into a sequence of joint-positions. We study GENIMA on 25 RLBench and 9 real-world manipulation tasks. We find that, by lifting actions into image-space, internet pre-trained diffusion models can generate policies that outperform state-of-the-art visuomotor approaches, especially in robustness to scene perturbations and generalizing to novel objects. Our method is also competitive with 3D agents, despite lacking priors such as depth, keypoints, or motion-planners.

Submitted 8 October, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: CoRL 2024. Website, code, checkpoints: https://genima-robot.github.io/

arXiv:2407.07868 [pdf, other]

Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

Authors: Eugene Teoh, Sumit Patidar, Xiao Ma, Stephen James

Abstract: Generalising vision-based manipulation policies to novel environments remains a challenging area with limited exploration. Current practices involve collecting data in one location, training imitation learning or reinforcement learning policies with this data, and deploying the policy in the same location. However, this approach lacks scalability as it necessitates data collection in multiple locations for each task. This paper proposes a novel approach where data is collected in a location predominantly featuring green screens. We introduce Green-screen Augmentation (GreenAug), employing a chroma key algorithm to overlay background textures onto a green screen. Through extensive real-world empirical studies with over 850 training demonstrations and 8.2k evaluation episodes, we demonstrate that GreenAug surpasses no augmentation, standard computer vision augmentation, and prior generative augmentation methods in performance. While no algorithmic novelties are claimed, our paper advocates for a fundamental shift in data collection practices. We propose that real-world demonstrations in future research should utilise green screens, followed by the application of GreenAug. We believe GreenAug unlocks policy generalisation to visually distinct novel locations, addressing the current scene generalisation limitations in robot learning.

Submitted 8 September, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Project website: https://greenaug.github.io/

arXiv:2407.07788 [pdf, other]

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

Authors: Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James

Abstract: We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture the real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs such as RGB, and depth from 3 camera views. To validate the usability of BiGym, we thoroughly benchmark the state-of-the-art imitation learning algorithms and demo-driven reinforcement learning algorithms within the environment and discuss the future opportunities.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Project webpage: https://chernyadev.github.io/bigym/

arXiv:2407.07787 [pdf, other]

Continuous Control with Coarse-to-fine Reinforcement Learning

Authors: Younggyo Seo, Jafar Uruç, Stephen James

Abstract: Despite recent advances in improving the sample-efficiency of reinforcement learning (RL) algorithms, designing an RL algorithm that can be practically deployed in real-world environments remains a challenge. In this paper, we present Coarse-to-fine Reinforcement Learning (CRL), a framework that trains RL agents to zoom-into a continuous action space in a coarse-to-fine manner, enabling the use of stable, sample-efficient value-based RL algorithms for fine-grained continuous control tasks. Our key idea is to train agents that output actions by iterating the procedure of (i) discretizing the continuous action space into multiple intervals and (ii) selecting the interval with the highest Q-value to further discretize at the next level. We then introduce a concrete, value-based algorithm within the CRL framework called Coarse-to-fine Q-Network (CQN). Our experiments demonstrate that CQN significantly outperforms RL and behavior cloning baselines on 20 sparsely-rewarded RLBench manipulation tasks with a modest number of environment interactions and expert demonstrations. We also show that CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Project webpage: https://younggyo.me/cqn/
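
The two-step procedure named in the abstract above (discretize the action range into intervals, then zoom into the interval with the highest Q-value) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy scoring function that stands in for a learned Q-network, and the defaults (3 levels, 5 bins, action range [-1, 1]) are all assumptions made for illustration, and only a single action dimension is handled.

```python
from typing import Callable, Tuple


def coarse_to_fine_select(
    q_value: Callable[[int, float, float], float],
    low: float = -1.0,
    high: float = 1.0,
    levels: int = 3,
    bins: int = 5,
) -> Tuple[float, float, float]:
    """Zoom into a 1-D action range and return (action, low, high).

    `q_value(level, lo, hi)` is a hypothetical critic that scores the
    interval [lo, hi] at a given level; a real agent would learn it.
    """
    for level in range(levels):
        width = (high - low) / bins
        # (i) Discretize the current range into `bins` equal intervals
        #     and score each one.
        scores = [
            q_value(level, low + i * width, low + (i + 1) * width)
            for i in range(bins)
        ]
        # (ii) Keep the best-scoring interval and refine it at the next level.
        best = max(range(bins), key=scores.__getitem__)
        low, high = low + best * width, low + (best + 1) * width
    # Use the centre of the final interval as the continuous action.
    return (low + high) / 2.0, low, high


def toy_q(level: int, lo: float, hi: float, target: float = 0.3) -> float:
    """Toy stand-in for a critic: prefers intervals centred near `target`."""
    return -abs((lo + hi) / 2.0 - target)


action, final_low, final_high = coarse_to_fine_select(toy_q)
```

With these assumed defaults, the final interval has width 2 / 5**3 = 0.016, so the selected action lies within 0.008 of the toy target. The resolution grows exponentially with the number of levels while each level only compares a handful of discrete choices.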

arXiv:2406.12874 [pdf, other]

doi 10.1088/1748-0221/19/08/P08027

The Design, Implementation, and Performance of the LZ Calibration Systems

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, et al. (179 additional authors not shown)

Abstract: LUX-ZEPLIN (LZ) is a tonne-scale experiment searching for direct dark matter interactions and other rare events. It is located at the Sanford Underground Research Facility (SURF) in Lead, South Dakota, USA. The core of the LZ detector is a dual-phase xenon time projection chamber (TPC), designed with the primary goal of detecting Weakly Interacting Massive Particles (WIMPs) via their induced low energy nuclear recoils. Surrounding the TPC, two veto detectors immersed in an ultra-pure water tank enable reducing background events to enhance the discovery potential. Intricate calibration systems are purposely designed to precisely understand the responses of these three detector volumes to various types of particle interactions and to demonstrate LZ's ability to discriminate between signals and backgrounds. In this paper, we present a comprehensive discussion of the key features, requirements, and performance of the LZ calibration systems, which play a crucial role in enabling LZ's WIMP-search and its broad science program. The thorough description of these calibration systems, with an emphasis on their novel aspects, is valuable for future calibration efforts in direct dark matter and other rare-event search experiments.

Submitted 5 September, 2024; v1 submitted 2 May, 2024; originally announced June 2024.

Journal ref: JINST 19 P08027 (2024)

arXiv:2406.10916 [pdf, other]

M-SET: Multi-Drone Swarm Intelligence Experimentation with Collision Avoidance Realism

Authors: Chuhao Qin, Alexander Robins, Callum Lillywhite-Roake, Adam Pearce, Hritik Mehta, Scott James, Tsz Ho Wong, Evangelos Pournaras

Abstract: Distributed sensing by cooperative drone swarms is crucial for several Smart City applications, such as traffic monitoring and disaster response. Using an indoor lab with inexpensive drones, a testbed supports complex and ambitious studies on these systems while maintaining low cost, rigor, and external validity. This paper introduces the Multi-drone Sensing Experimentation Testbed (M-SET), a novel platform designed to prototype, develop, test, and evaluate distributed sensing with swarm intelligence. M-SET addresses the limitations of existing testbeds that fail to emulate collisions, thus lacking realism in outdoor environments. By integrating a collision avoidance method based on a potential field algorithm, M-SET ensures collision-free navigation and sensing, further optimized via a multi-agent collective learning algorithm. Extensive evaluation demonstrates accurate energy consumption estimation and a low risk of collisions, providing a robust proof-of-concept. New insights show that M-SET has significant potential to support ambitious research with minimal cost, simplicity, and high sensing quality.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 7 pages, 7 figures. This work has been submitted to the IEEE conference

arXiv:2406.04144 [pdf, other]

Redundancy-aware Action Spaces for Robot Learning

Authors: Pietro Mazzaglia, Nicholas Backshall, Xiao Ma, Stephen James

Abstract: Joint space and task space control are the two dominant action modes for controlling robot arms within the robot learning literature. Actions in joint space provide precise control over the robot's pose, but tend to suffer from inefficient training; actions in task space boast data-efficient training but sacrifice the ability to perform tasks in confined spaces due to limited control over the full joint configuration. This work analyses the criteria for designing action spaces for robot manipulation and introduces ER (End-effector Redundancy), a novel action space formulation that, by addressing the redundancies present in the manipulator, aims to combine the advantages of both joint and task spaces, offering fine-grained comprehensive control with overactuated robot arms whilst achieving highly efficient robot learning. We present two implementations of ER, ERAngle (ERA) and ERJoint (ERJ), and we show that ERJ in particular demonstrates superior performance across multiple settings, especially when precise control over the robot configuration is required. We validate our results both in simulated and real robotic environments.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Published in the RA-L journal

arXiv:2406.02441 [pdf, other]

doi 10.1038/s42005-024-01774-8

Probing the Scalar WIMP-Pion Coupling with the first LUX-ZEPLIN data

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. J. Bishop, G. M. Blockinger, B. Boxer, et al. (178 additional authors not shown)

Abstract: Weakly interacting massive particles (WIMPs) may interact with a virtual pion that is exchanged between nucleons. This interaction channel is important to consider in models where the spin-independent isoscalar channel is suppressed. Using data from the first science run of the LUX-ZEPLIN dark matter experiment, containing 60 live days of data in a 5.5~tonne fiducial mass of liquid xenon, we report the results on a search for WIMP-pion interactions. We observe no significant excess and set an upper limit of $1.5\times10^{-46}$~cm$^2$ at a 90\% confidence level for a WIMP mass of 33~GeV/c$^2$ for this interaction.

Submitted 4 June, 2024; originally announced June 2024.

Journal ref: Commun Phys 7, 292 (2024)

arXiv:2405.18196 [pdf, other]

Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning

Authors: Vitalis Vosylius, Younggyo Seo, Jafar Uruç, Stephen James

Abstract: In the field of Robot Learning, the complex mapping between high-dimensional observations such as RGB images and low-level robotic actions, two inherently very different spaces, constitutes a complex learning problem, especially with limited amounts of data. In this work, we introduce Render and Diffuse (R&D) a method that unifies low-level robot actions and RGB observations within the image space using virtual renders of the 3D model of the robot. Using this joint observation-action representation it computes low-level robot actions using a learnt diffusion process that iteratively updates the virtual renders of the robot. This space unification simplifies the learning problem and introduces inductive biases that are crucial for sample efficiency and spatial generalisation. We thoroughly evaluate several variants of R&D in simulation and showcase their applicability on six everyday tasks in the real world. Our results show that R&D exhibits strong spatial generalisation capabilities and is more sample efficient than more common image-to-action methods.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Robotics: Science and Systems (RSS) 2024. Videos are available on our project webpage at https://vv19.github.io/render-and-diffuse/

arXiv:2405.14732 [pdf, other]

The Data Acquisition System of the LZ Dark Matter Detector: FADR

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, et al. (191 additional authors not shown)

Abstract: The Data Acquisition System (DAQ) for the LUX-ZEPLIN (LZ) dark matter detector is described. The signals from 745 PMTs, distributed across three subsystems, are sampled with 100-MHz 32-channel digitizers (DDC-32s). A basic waveform analysis is carried out on the on-board Field Programmable Gate Arrays (FPGAs) to extract information about the observed scintillation and electroluminescence signals. This information is used to determine if the digitized waveforms should be preserved for offline analysis. The system is designed around the Kintex-7 FPGA. In addition to digitizing the PMT signals and providing basic event selection in real time, the flexibility provided by the use of FPGAs allows us to monitor the performance of the detector and the DAQ in parallel to normal data acquisition. The hardware and software/firmware of this FPGA-based Architecture for Data acquisition and Realtime monitoring (FADR) are discussed and performance measurements are described.

Submitted 16 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 18 pages, 24 figures

arXiv:2405.06686 [pdf, other]

Word2World: Generating Stories and Worlds through Large Language Models

Authors: Muhammad U. Nasir, Steven James, Julian Togelius

Abstract: Large Language Models (LLMs) have proven their worth across a diverse spectrum of disciplines. LLMs have shown great potential in Procedural Content Generation (PCG) as well, but directly generating a level through a pre-trained LLM is still challenging. This work introduces Word2World, a system that enables LLMs to procedurally design playable games through stories, without any task-specific fine-tuning. Word2World leverages the abilities of LLMs to create diverse content and extract information. Combining these abilities, LLMs can create a story for the game, design narrative, and place tiles in appropriate places to create coherent worlds and playable games. We test Word2World with different LLMs and perform a thorough ablation study to validate each step. We open-source the code at https://github.com/umair-nasir14/Word2World.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.17666 [pdf, other]

Constraints On Covariant WIMP-Nucleon Effective Field Theory Interactions from the First Science Run of the LUX-ZEPLIN Experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. J. Bishop, G. M. Blockinger, B. Boxer , et al. (179 additional authors not shown)

Abstract: The first science run of the LUX-ZEPLIN (LZ) experiment, a dual-phase xenon time projection chamber operating in the Sanford Underground Research Facility in South Dakota, USA, has reported leading limits on spin-independent WIMP-nucleon interactions and interactions described by a non-relativistic effective field theory (NREFT). Using the same 5.5 t fiducial mass and 60 live days of exposure, we report on the results of a relativistic extension to the NREFT. We present constraints on couplings from covariant interactions arising from the coupling of vector, axial currents, and electric dipole moments of the nucleon to the magnetic and electric dipole moments of the WIMP which cannot be described by recasting previous results described by an NREFT. Using a profile-likelihood ratio analysis, in an energy region between 0 keV$_\text{nr}$ and 270 keV$_\text{nr}$, we report 90% confidence level exclusion limits on the coupling strength of five interactions in both the isoscalar and isovector bases.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2403.19375 [pdf, other]

Multi-Agent Team Access Monitoring: Environments that Benefit from Target Information Sharing

Authors: Andrew Dudash, Scott James, Ryan Rubel

Abstract: Robotic access monitoring of multiple target areas has applications including checkpoint enforcement, surveillance and containment of fire and flood hazards. Monitoring access for a single target region has been successfully modeled as a minimum-cut problem. We generalize this model to support multiple target areas using two approaches: iterating on individual targets and examining the collections of targets holistically. Through simulation we measure the performance of each approach on different scenarios.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.12682 [pdf, other]

IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model

Authors: Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

Abstract: We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a Least Squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method can improve the angular and translation error accuracy by 80.1% and 67.3%, respectively, compared to iNeRF while performing at 34fps on consumer hardware and not requiring the initial pose guess.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted ICRA 2024, Project page: https://mbortolon97.github.io/iffnerf/

arXiv:2403.09830 [pdf, other]

Towards the Reusability and Compositionality of Causal Representations

Authors: Davide Talon, Phillip Lippe, Stuart James, Alessio Del Bue, Sara Magliacane

Abstract: Causal Representation Learning (CRL) aims at identifying high-level causal factors and their relationships from high-dimensional observations, e.g., images. While most CRL works focus on learning causal representations in a single environment, in this work we instead propose a first step towards learning causal representations from temporal sequences of images that can be adapted in a new environment, or composed across multiple related environments. In particular, we introduce DECAF, a framework that detects which causal factors can be reused and which need to be adapted from previously learned causal representations. Our approach is based on the availability of intervention targets that indicate which variables are perturbed at each time step. Experiments on three benchmark datasets show that integrating our framework with four state-of-the-art CRL approaches leads to accurate representations in a new environment with only a few samples.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted to the 3rd Conference on Causal Learning and Reasoning (CLeaR 2024)

arXiv:2403.08586 [pdf, other]

PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections

Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

Abstract: Robustly estimating camera poses from a set of images is a fundamental task which remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in turn, the absolute pose, in a differentiable manner benefiting from the optimization of a sequence of geometrical tasks. We show how our objectness pose-refinement module in PRAGO is able to refine the inherent ambiguities in pairwise relative pose estimation without removing edges and avoiding making early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes achieving a relative improvement of 21% for rotations while achieving similar translation estimates.

Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03890 [pdf, other]

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

Authors: Xiao Ma, Sumit Patidar, Iain Haughton, Stephen James

Abstract: This paper introduces Hierarchical Diffusion Policy (HDP), a hierarchical agent for multi-task robotic manipulation. HDP factorises a manipulation policy into a hierarchical structure: a high-level task-planning agent which predicts a distant next-best end-effector pose (NBP), and a low-level goal-conditioned diffusion policy which generates optimal motion trajectories. The factorised policy representation allows HDP to tackle long-horizon task planning while generating fine-grained low-level actions. To generate context-aware motion trajectories while satisfying robot kinematics constraints, we present a novel kinematics-aware goal-conditioned control agent, Robot Kinematics Diffuser (RK-Diffuser). Specifically, RK-Diffuser learns to generate both the end-effector pose and joint position trajectories, and distills the accurate but kinematics-unaware end-effector pose diffuser to the kinematics-aware but less accurate joint position diffuser via differentiable kinematics. Empirically, we show that HDP achieves a significantly higher success rate than the state-of-the-art methods in both simulation and the real world.

Submitted 6 March, 2024; originally announced March 2024.

Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2024). Videos and code: https://yusufma03.github.io/projects/hdp/

arXiv:2402.08865 [pdf, other]

doi 10.1103/PhysRevD.109.112010

New constraints on ultraheavy dark matter from the LZ experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, C. A. J. Brew , et al. (174 additional authors not shown)

Abstract: Searches for dark matter with liquid xenon time projection chamber experiments have traditionally focused on the region of the parameter space that is characteristic of weakly interacting massive particles, ranging from a few GeV/$c^2$ to a few TeV/$c^2$. Models of dark matter with a mass much heavier than this are well motivated by early production mechanisms different from the standard thermal freeze-out, but they have generally been less explored experimentally. In this work, we present a re-analysis of the first science run (SR1) of the LZ experiment, with an exposure of $0.9$ tonne$\times$year, to search for ultraheavy particle dark matter. The signal topology consists of multiple energy deposits in the active region of the detector forming a straight line, from which the velocity of the incoming particle can be reconstructed on an event-by-event basis. Zero events with this topology were observed after applying the data selection calibrated on a simulated sample of signal-like events. New experimental constraints are derived, which rule out previously unexplored regions of the dark matter parameter space of spin-independent interactions beyond a mass of 10$^{17}$ GeV/$c^2$.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 9 pages, 7 figures

Journal ref: Phys. Rev. D 109, 112010 (2024)

arXiv:2312.12891 [pdf, other]

MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds

Authors: William Hill, Ireton Liu, Anita De Mello Koch, Damion Harvey, Nishanth Kumar, George Konidaris, Steven James

Abstract: We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall, but also provides support for creating both propositional and numeric instances of new Minecraft tasks automatically. We benchmark numeric and propositional planning systems on these tasks, with results demonstrating that state-of-the-art planners are currently incapable of dealing with many of the challenges advanced by our new benchmark, such as scaling to instances with thousands of objects. Based on these results, we identify areas of improvement for future planners. Our framework is made available at https://github.com/IretonLiu/mine-pddl/.

Submitted 28 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted to the 6th ICAPS Workshop on the International Planning Competition (WIPC 2024)

arXiv:2312.11364 [pdf, other]

Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure

Authors: Tristan Bester, Benjamin Rosman, Steven James, Geraud Nangue Tasse

Abstract: We present counting reward automata, a finite state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by unrestricted grammars. We prove that an agent equipped with such an abstract machine is able to solve a larger set of tasks than those utilising current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. A selection of learning algorithms are presented which exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using large language models. Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.

Submitted 16 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 14 pages, 11 Figures, Published in AAAI W25: Neuro-Symbolic Learning and Reasoning in the era of Large Language Models (NuCLeaR)

ACM Class: I.2; F.4

arXiv:2312.02030 [pdf, other]

doi 10.1103/PhysRevD.109.092003

First Constraints on WIMP-Nucleon Effective Field Theory Couplings in an Extended Energy Region From LUX-ZEPLIN

Authors: LZ Collaboration, J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger , et al. (175 additional authors not shown)

Abstract: Following the first science results of the LUX-ZEPLIN (LZ) experiment, a dual-phase xenon time projection chamber operating from the Sanford Underground Research Facility in Lead, South Dakota, USA, we report the initial limits on a model-independent non-relativistic effective field theory describing the complete set of possible interactions of a weakly interacting massive particle (WIMP) with a nucleon. These results utilize the same 5.5 t fiducial mass and 60 live days of exposure collected for the LZ spin-independent and spin-dependent analyses while extending the upper limit of the energy region of interest by a factor of 7.5 to 270 keVnr. No significant excess in this high energy region is observed. Using a profile-likelihood ratio analysis, we report 90% confidence level exclusion limits on the coupling of each individual non-relativistic WIMP-nucleon operator for both elastic and inelastic interactions in the isoscalar and isovector bases.

Submitted 26 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 17 pages, 11 figures

Journal ref: Phys. Rev. D 109, 092003 (2024)

arXiv:2310.16686 [pdf, other]

Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman

Abstract: While reinforcement learning has achieved remarkable successes in several domains, its real-world application is limited due to many methods failing to generalise to unfamiliar conditions. In this work, we consider the problem of generalising to new transition dynamics, corresponding to cases in which the environment's response to the agent's actions differs. For example, the gravitational force exerted on a robot depends on its mass and changes the robot's mobility. Consequently, in such cases, it is necessary to condition an agent's actions on extrinsic state information and pertinent contextual information reflecting how the environment responds. While the need for context-sensitive policies has been established, the manner in which context is incorporated architecturally has received less attention. Thus, in this work, we present an investigation into how context information should be incorporated into behaviour learning to improve generalisation. To this end, we introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information. We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments. Beyond this, the Decision Adapter is more robust to irrelevant distractor variables than several alternative methods.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2309.13942 [pdf, other]

Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-visual pairs and doubles the size of negative pairs, resulting in a significant enhancement in the learned representations, and (2) it changes the strict correlation between audio-visual pairs but introduces a partial relationship between the augmented pairs, which is modeled by our proposed SoftInfoNCE loss to further boost the performance. Experimental results show that the proposed method significantly improves the learned representations when compared to vanilla audio-visual contrastive learning.

Submitted 25 September, 2023; originally announced September 2023.

Comments: Published at the CVPR 2023 Sight and Sound workshop

arXiv:2308.16893 [pdf, other]

Language-Conditioned Path Planning

Authors: Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Abstract: Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO), a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Conference on Robot Learning, 2023

arXiv:2308.12270 [pdf, other]

Language Reward Modulation for Pretraining Reinforcement Learning

Authors: Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel

Abstract: Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining (LAMP) which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Code available at https://github.com/ademiadeniji/lamp

arXiv:2307.15753 [pdf, other]

doi 10.1103/PhysRevD.108.072006

A search for new physics in low-energy electron recoils from the first LZ exposure

Authors: The LZ Collaboration, J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, P. Beltrame, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, G. M. Blockinger , et al. (178 additional authors not shown)

Abstract: The LUX-ZEPLIN (LZ) experiment is a dark matter detector centered on a dual-phase xenon time projection chamber. We report searches for new physics appearing through few-keV-scale electron recoils, using the experiment's first exposure of 60 live days and a fiducial mass of 5.5 t. The data are found to be consistent with a background-only hypothesis, and limits are set on models for new physics including solar axion electron coupling, solar neutrino magnetic moment and millicharge, and electron couplings to galactic axion-like particles and hidden photons. Similar limits are set on weakly interacting massive particle (WIMP) dark matter producing signals through ionized atomic states from the Migdal effect.

Submitted 9 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 13 pages, 10 figures. See https://tinyurl.com/LZDataReleaseRun1ER for a data release related to this paper

Journal ref: Phys. Rev. D 108, 072006 (2023)

arXiv:2306.01102 [pdf, other]

doi 10.1145/3638529.3654017

LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization

Authors: Muhammad U. Nasir, Sam Earle, Christopher Cleghorn, Steven James, Julian Togelius

Abstract: Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. Their abilities span numerous areas, and one area where they have made a significant impact is in the domain of code generation. Here, we propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks. Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and robust solutions. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce LLMatic, a Neural Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS directly through prompts, LLMatic uses a procedural approach, leveraging QD for prompts and network architecture to create diverse and high-performing networks. We test LLMatic on the CIFAR-10 and NAS-bench-201 benchmarks, demonstrating that it can produce competitive networks while evaluating just 2,000 candidates, even without prior knowledge of the benchmark domain or exposure to any previous top-performing models for the benchmark. The open-sourced code is available at https://github.com/umair-nasir14/LLMatic.

Submitted 12 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to The Genetic and Evolutionary Computation Conference 2024

arXiv:2306.00035 [pdf, other]

ROSARL: Reward-Only Safe Reinforcement Learning

Authors: Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, Benjamin Rosman

Abstract: An important problem in reinforcement learning is designing agents that learn to solve tasks safely in an environment. A common solution is for a human expert to define either a penalty in the reward function or a cost to be minimised when reaching unsafe states. However, this is non-trivial, since too small a penalty may lead to agents that reach unsafe states, while too large a penalty increases the time to convergence. Additionally, the difficulty in designing reward or cost functions can increase with the complexity of the problem. Hence, for a given environment with a given set of unsafe states, we are interested in finding the upper bound of rewards at unsafe states whose optimal policies minimise the probability of reaching those unsafe states, irrespective of task rewards. We refer to this exact upper bound as the "Minmax penalty", and show that it can be obtained by taking into account both the controllability and diameter of an environment. We provide a simple practical model-free algorithm for an agent to learn this Minmax penalty while learning the task policy, and demonstrate that using it leads to agents that learn safe policies in high-dimensional continuous control environments.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2304.06373 [pdf, other]

3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset

Authors: Matteo Toso, Matteo Taiana, Stuart James, Alessio Del Bue

Abstract: Efficient visual localization is crucial to many applications, such as large-scale deployment of autonomous agents and augmented reality. Traditional visual localization, while achieving remarkable accuracy, relies on extensive 3D models of the scene or large collections of geolocalized images, which are often inefficient to store and to scale to novel environments. In contrast, humans orient themselves using very abstract 2D maps, using the location of clearly identifiable landmarks. Drawing on this and on the success of recent works that explored localization on 2D abstract maps, we propose Flatlandia, a novel visual localization challenge. With Flatlandia, we investigate whether it is possible to localize a visual query by comparing the layout of its common objects detected against the known spatial layout of objects in the map. We formalize the challenge as two tasks at different levels of accuracy to investigate the problem and its possible limitations; for each, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods. Code and dataset are publicly available at github.com/IIT-PAVIS/Flatlandia.

Submitted 8 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

arXiv:2303.11120 [pdf, other]

Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

Authors: Francesco Giuliari, Gianluca Scarpellini, Stuart James, Yiming Wang, Alessio Del Bue

Abstract: Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. We conduct extensive experiments with benchmark datasets including two puzzle datasets, three sentence ordering datasets, and one visual storytelling dataset, demonstrating that our method outperforms long-lasting research on puzzle solving with up to +18% compared to the second-best deep learning method, and performs on par against the state-of-the-art methods on sentence ordering and visual storytelling. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. Project website at https://iit-pavis.github.io/Positional_Diffusion/

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.05683 [pdf, other]

doi 10.1016/j.fss.2023.108740

Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions

Authors: Marek Gagolewski, Anna Cena, Simon James, Gleb Beliakov

Abstract: Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.

Submitted 25 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Journal ref: Fuzzy Sets and Systems 473, 108740, 2023

arXiv:2302.02408 [pdf, other]

Multi-View Masked World Models for Visual Robotic Manipulation

Authors: Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel

Abstract: Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints and then learn a world model operating on the representations from the autoencoder. We demonstrate the effectiveness of our method in a range of scenarios, including multi-view control and single-view control with auxiliary cameras for representation learning. We also show that the multi-view masked autoencoder trained with multiple randomized viewpoints enables training a policy with strong viewpoint randomization and transferring the policy to solve real-robot tasks without camera calibration and an adaptation procedure. Video demonstrations are available at: https://sites.google.com/view/mv-mwm.

Submitted 31 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Accepted to ICML 2023. First two authors contributed equally. Project webpage: https://sites.google.com/view/mv-mwm

arXiv:2302.01561 [pdf, other]

Hierarchically Composing Level Generators for the Creation of Complex Structures

Authors: Michael Beukman, Manuel Fokam, Marcel Kruger, Guy Axelrod, Muhammad Nasir, Branden Ingram, Benjamin Rosman, Steven James

Abstract: Procedural content generation (PCG) is a growing field, with numerous applications in the video game industry and great potential to help create better games at a fraction of the cost of manual creation. However, much of the work in PCG is focused on generating relatively straightforward levels in simple games, as it is challenging to design an optimisable objective function for complex settings. This limits the applicability of PCG to more complex and modern titles, hindering its adoption in industry. Our work aims to address this limitation by introducing a compositional level generation method that recursively composes simple low-level generators to construct large and complex creations. This approach allows for easily-optimisable objectives and the ability to design a complex structure in an interpretable way by referencing lower-level components. We empirically demonstrate that our method outperforms a non-compositional baseline by more accurately satisfying a designer's functional requirements in several tasks. Finally, we provide a qualitative showcase (in Minecraft) illustrating the large and complex, but still coherent, structures that were generated using simple base generators.

Submitted 19 July, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: Code is available at https://github.com/Michael-Beukman/MCHAMR. This work has been accepted to IEEE Transactions on Games, with copyright transferred to the IEEE

arXiv:2211.17120 [pdf, other]

doi 10.1103/PhysRevD.108.012010

Background Determination for the LUX-ZEPLIN (LZ) Dark Matter Experiment

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, S. K. Alsum, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, P. Beltrame, E. P. Bernard, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, G. M. Blockinger, B. Boxer , et al. (178 additional authors not shown)

Abstract: The LUX-ZEPLIN experiment recently reported limits on WIMP-nucleus interactions from its initial science run, down to $9.2\times10^{-48}$ cm$^2$ for the spin-independent interaction of a 36 GeV/c$^2$ WIMP at 90% confidence level. In this paper, we present a comprehensive analysis of the backgrounds important for this result and for other upcoming physics analyses, including neutrinoless double-beta decay searches and effective field theory interpretations of LUX-ZEPLIN data. We confirm that the in-situ determinations of bulk and fixed radioactive backgrounds are consistent with expectations from the ex-situ assays. The observed background rate after WIMP search criteria were applied was $(6.3\pm0.5)\times10^{-5}$ events/keV$_{ee}$/kg/day in the low-energy region, approximately 60 times lower than the equivalent rate reported by the LUX experiment.

Submitted 17 July, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: 25 pages, 15 figures

Journal ref: Phys. Rev. D 108, 012010 (2023)

arXiv:2211.01644 [pdf, other]

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

Authors: Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi Dou

Abstract: Most existing methods for category-level pose estimation rely on object point clouds. However, when considering transparent objects, depth cameras are usually not able to capture meaningful data, resulting in point clouds with severe artifacts. Without a high-quality point cloud, existing methods are not applicable to challenging transparent objects. To tackle this problem, we present StereoPose, a novel stereo image framework for category-level object pose estimation, ideally suited for transparent objects. For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement. StereoPose then estimates object pose based on representation in the normalized object coordinate space (NOCS). To address the issue of image content aliasing, we further define a back-view NOCS map for the transparent object. The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation. To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions. Extensive experiments on the public TOD dataset demonstrate the superiority of the proposed StereoPose framework for category-level 6D transparent object pose estimation.

Submitted 3 November, 2022; originally announced November 2022.

Comments: 7 pages, 6 figures, Project homepage: https://appsrv.cse.cuhk.edu.hk/~kaichen/stereopose.html

arXiv:2210.14721 [pdf, other]

Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

Authors: John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James

Abstract: Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, subsequently enabling real-world images to also be translated. This allows us to train an end-to-end RL policy in simulation, and directly deploy in the real-world. Our approach, which can be trained in 48 hours on 1 GPU, can perform equally as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.

Submitted 25 October, 2022; originally announced October 2022.

Comments: CoRL 2022 Paper

arXiv:2210.11442 [pdf, other]

Augmentative Topology Agents For Open-Ended Learning

Authors: Muhammad Umair Nasir, Michael Beukman, Steven James, Christopher Wesley Cleghorn

Abstract: In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP results in general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.

Submitted 11 October, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: Accepted to The Proceedings of Genetic and Evolutionary Computation Conference (GECCO) 2023

arXiv:2210.03109 [pdf, other]

Real-World Robot Learning with Masked Visual Pre-training

Authors: Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Abstract: In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic tasks and embodiments. We find that our encoder consistently outperforms CLIP (up to 75%), supervised ImageNet pre-training (up to 81%), and training from scratch (up to 81%). Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Submitted 6 October, 2022; originally announced October 2022.

Comments: CoRL 2022; Project page: https://tetexiao.com/projects/real-mvp

arXiv:2210.02396 [pdf, other]

Temporally Consistent Transformers for Video Generation

Authors: Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel

Abstract: To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world. Current algorithms enable accurate predictions over short horizons but tend to suffer from temporal inconsistencies. When generated content goes out of view and is later revisited, the model invents different content instead. Despite this severe limitation, no established benchmarks on complex data exist for rigorously evaluating video generation with long temporal dependencies. In this paper, we curate 3 challenging video datasets with long-range dependencies by rendering walks through 3D scenes of procedural mazes, Minecraft worlds, and indoor scans. We perform a comprehensive evaluation of current models and observe their limitations in temporal consistency. Moreover, we introduce the Temporally Consistent Transformer (TECO), a generative model that substantially improves long-term consistency while also reducing sampling time. By compressing its input sequence into fewer embeddings, applying a temporal transformer, and expanding back using a spatial MaskGit, TECO outperforms existing models across many metrics. Videos are available on the website: https://wilson1yan.github.io/teco

Submitted 31 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Project website: https://wilson1yan.github.io/teco

arXiv:2209.07143 [pdf, other]

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Authors: Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel

Abstract: Video prediction is an important yet challenging problem; burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool, by separating the video prediction into two sub-problems: pre-training an image generator model, followed by learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has yet to be seen. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and produce high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and introduce additional techniques of top-k sampling and data augmentation to further improve video prediction quality. Despite the simplicity, the proposed method achieves competitive performance to state-of-the-art approaches on standard video prediction benchmarks with fewer parameters, and enables high-resolution video prediction on complex and large-scale datasets. Videos are available at https://sites.google.com/view/harp-videos/home.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Extended draft of the paper accepted to ICIP 2022 conference

arXiv:2209.03638 [pdf, other]

Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding

Authors: Hebatallah A. Mohamed, Sebastiano Vascon, Feliks Hibraj, Stuart James, Diego Pilutti, Alessio Del Bue, Marcello Pelillo

Abstract: Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2207.14772 [pdf, other]

doi 10.29007/qpkt

Combining Evolutionary Search with Behaviour Cloning for Procedurally Generated Content

Authors: Nicholas Muir, Steven James

Abstract: In this work, we consider the problem of procedural content generation for video game levels. Prior approaches have relied on evolutionary search (ES) methods capable of generating diverse levels, but this generation procedure is slow, which is problematic in real-time settings. Reinforcement learning (RL) has also been proposed to tackle the same problem, and while level generation is fast, training time can be prohibitively expensive. We propose a framework to tackle the procedural content generation problem that combines the best of ES and RL. In particular, our approach first uses ES to generate a sequence of levels evolved over time, and then uses behaviour cloning to distil these levels into a policy, which can then be queried to produce new levels quickly. We apply our approach to a maze game and Super Mario Bros, with our results indicating that our approach does in fact decrease the time required for level generation, especially when an increasing number of valid levels are required.

Submitted 29 July, 2022; originally announced July 2022.

Journal ref: Proceedings of 43rd Conference of the South African Institute of Computer Scientists and Information Technologists, July 2022

arXiv:2207.09445 [pdf, other]

PoserNet: Refining Relative Camera Poses Exploiting Object Detections

Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

Abstract: The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms improving the median error on the rotation by 62 degrees with respect to the initial estimates obtained based on bounding boxes. Code and data are available at https://github.com/IIT-PAVIS/PoserNet.

Submitted 21 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted at ECCV 2022

Showing 1–50 of 132 results for author: James, S