-
Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors
Authors:
Joseph K. Chege,
Arie Yeredor,
Martin Haardt
Abstract:
Obtaining a reliable estimate of the joint probability mass function (PMF) of a set of random variables from observed data is a significant objective in statistical signal processing and machine learning. Modelling the joint PMF as a tensor that admits a low-rank canonical polyadic decomposition (CPD) has enabled the development of efficient PMF estimation algorithms. However, these algorithms require the rank (model order) of the tensor to be specified beforehand. In real-world applications, the true rank is unknown. Therefore, an appropriate rank is usually selected from a candidate set either by observing validation errors or by computing various likelihood-based information criteria, a procedure which is computationally expensive for large datasets. This paper presents a novel Bayesian framework for estimating the joint PMF and automatically inferring its rank from observed data. We specify a Bayesian PMF estimation model and employ appropriate prior distributions for the model parameters, allowing for tuning-free rank inference via a single training run. We then derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI) to arrive at an efficient algorithm whose complexity scales sublinearly with the size of the dataset. Numerical experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
Submitted 8 October, 2024;
originally announced October 2024.
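The low-rank CPD model of a joint PMF mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common latent-variable reading of the model, in which a weight vector over R components and one column-stochastic factor matrix per variable define the tensor. The function name `cpd_pmf` and the example dimensions are hypothetical, and the sketch only builds the tensor; it does not reproduce the paper's VI or SVI estimators.

```python
import numpy as np

def cpd_pmf(weights, factors):
    """Build a joint PMF tensor from a rank-R CPD (illustrative only).

    weights: shape (R,), non-negative, sums to 1 (assumed latent-component PMF).
    factors: list of arrays of shape (I_n, R), each column sums to 1.
    """
    shape = tuple(f.shape[0] for f in factors)
    pmf = np.zeros(shape)
    for r in range(weights.shape[0]):
        component = weights[r]
        # Outer product of the r-th column of every factor matrix.
        for f in factors:
            component = np.multiply.outer(component, f[:, r])
        pmf += component
    return pmf

rng = np.random.default_rng(0)
rank, dims = 3, (4, 5, 6)  # hypothetical rank and alphabet sizes
weights = rng.dirichlet(np.ones(rank))
# dirichlet(..., size=rank) has shape (rank, d); transpose to (d, rank).
factors = [rng.dirichlet(np.ones(d), size=rank).T for d in dims]
pmf = cpd_pmf(weights, factors)
```

Because the weights and every factor column are themselves PMFs, the resulting tensor is non-negative and sums to one, so it is a valid joint PMF whose rank is at most R.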
-
Ionospheric contributions to the excess power in high-redshift 21-cm power-spectrum observations with LOFAR
Authors:
S. A. Brackenhoff,
M. Mevius,
L. V. E. Koopmans,
A. Offringa,
E. Ceccotti,
J. K. Chege,
B. K. Gehlot,
S. Ghosh,
C. Höfer,
F. G. Mertens,
S. Munshi,
S. Zaroubi
Abstract:
The turbulent ionosphere causes phase shifts in incoming radio waves on a broad range of temporal and spatial scales. When an interferometer is not sufficiently calibrated for the direction-dependent ionospheric effects, the time-varying phase shifts can cause the signal to decorrelate. The ionosphere's influence over various spatiotemporal scales introduces a baseline-dependent effect on the interferometric array. We study the impact of baseline-dependent decorrelation on high-redshift observations with the Low Frequency Array (LOFAR). Datasets with a range of ionospheric corruptions are simulated using a thin-screen ionosphere model, and calibrated using the state-of-the-art LOFAR Epoch of Reionisation pipeline. For the first time, we show the ionospheric impact on various stages of the calibration process, including an analysis of the transfer of gain errors from longer to shorter baselines, using realistic end-to-end simulations. We find that direction-dependent calibration for source subtraction leaves excess power of up to two orders of magnitude above the thermal noise at the largest spectral scales in the cylindrically averaged auto-power spectrum under normal ionospheric conditions. However, we demonstrate that this excess power can be removed through Gaussian process regression, leaving no excess power above the ten per cent level for a $5$ km diffractive scale. We conclude that ionospheric errors, in the absence of interactions with other aggravating effects, do not constitute a dominant component in the excess power observed in LOFAR Epoch of Reionisation observations of the North Celestial Pole. Future work should therefore focus on less spectrally smooth effects, such as beam modelling errors.
Submitted 29 July, 2024;
originally announced July 2024.
-
The impact of lossy data compression on the power spectrum of the high redshift 21-cm signal with LOFAR
Authors:
J. K. Chege,
L. V. E. Koopmans,
A. R. Offringa,
B. K. Gehlot,
S. A. Brackenhoff,
E. Ceccotti,
S. Ghosh,
C. Höfer,
F. G. Mertens,
M. Mevius,
S. Munshi
Abstract:
Current radio interferometers output multi-petabyte-scale volumes of data per year, making the storage, transfer, and processing of this data a sizeable challenge. This challenge is expected to grow with next-generation telescopes such as the Square Kilometre Array. Lossy compression of interferometric data post-correlation can abate this challenge. However, since high-redshift 21-cm studies impose strict precision requirements, the impact of such lossy data compression on the 21-cm signal power spectrum statistic should be understood. We apply Dysco visibility compression, a normalization and quantization technique designed specifically for radio interferometric data. We establish the level of the compression noise in the power spectrum in comparison to the thermal noise as well as its coherency behavior. Finally, for optimal compression results, we compare the compression noise obtained from different compression settings to a nominal 21-cm signal power. From a single night of observation, we find that the noise introduced by the compression is more than five orders of magnitude lower than the thermal noise level in the power spectrum. The noise does not affect calibration. The compression noise shows no correlation with the sky signal and has no measurable coherent component. The level of compression error in the power spectrum ultimately depends on the compression settings. Dysco visibility compression is found to be of insignificant concern for 21-cm power spectrum studies. Hence, data volumes can be safely reduced by a factor of $\sim 4$ with insignificant bias to the final power spectrum. Data from SKA-low will likely be compressible by the same factor as LOFAR, owing to the similarities of the two instruments. The same technique can be used to compress data from other telescopes, but a small adjustment of the compression parameters might be required.
Submitted 16 July, 2024;
originally announced July 2024.
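The normalize-and-quantize idea behind the lossy compression described in the abstract above can be sketched generically. This is not Dysco's actual algorithm or API: it is a plain uniform quantiser applied to real-valued Gaussian noise, and the function names, the RMS normalisation, and the `clip_sigma` parameter are all assumptions made for illustration. It shows how the compression noise can be measured against the noise level of the data.

```python
import numpy as np

def compress(values, n_bits, clip_sigma=4.0):
    """Normalise by the RMS, clip, and quantise uniformly to n_bits (n_bits <= 16)."""
    scale = values.std()  # assumed normalisation; requires non-constant data
    levels = 2 ** n_bits
    normalised = np.clip(values / scale, -clip_sigma, clip_sigma)
    step = 2 * clip_sigma / (levels - 1)
    codes = np.round((normalised + clip_sigma) / step).astype(np.uint16)
    return codes, scale, step

def decompress(codes, scale, step, clip_sigma=4.0):
    """Map integer codes back to approximate sample values."""
    return (codes * step - clip_sigma) * scale

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 2.5, size=100_000)  # stand-in for thermal noise
codes, scale, step = compress(noise, n_bits=10)
restored = decompress(codes, scale, step)

# RMS of the error introduced by compression, to compare with noise.std().
compression_noise_rms = np.sqrt(np.mean((restored - noise) ** 2))
```

With 10 bits per sample, the error introduced by this toy quantiser is a small fraction of the input noise RMS; fewer bits trade a higher compression factor for more compression noise.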
-
Beyond the horizon: Quantifying the full sky foreground wedge in the cylindrical power spectrum
Authors:
S. Munshi,
F. G. Mertens,
L. V. E. Koopmans,
A. R. Offringa,
E. Ceccotti,
S. A. Brackenhoff,
J. K. Chege,
B. K. Gehlot,
S. Ghosh,
C. Höfer,
M. Mevius
Abstract:
One of the main obstacles preventing the detection of the redshifted 21-cm signal from neutral hydrogen in the early Universe is the astrophysical foreground emission, which is several orders of magnitude brighter than the signal. The foregrounds, due to their smooth spectra, are expected to predominantly occupy a region in the cylindrical power spectrum known as the foreground wedge. However, the conventional equations describing the extent of the foreground wedge are derived under a flat-sky approximation. This assumption breaks down for tracking wide-field instruments, thus rendering these equations inapplicable in these situations. In this paper, we derive equations for the full sky foreground wedge and show that the foregrounds can potentially extend far beyond what the conventional equations suggest. We also derive the equations that describe a specific bright source in the cylindrical power spectrum space. The validity of both sets of equations is tested against numerical simulations. Many current and upcoming interferometers (e.g., LOFAR, NenuFAR, MWA, SKA) are wide-field phase-tracking instruments. These equations give us new insights into the nature of foreground contamination in the cylindrical power spectra estimated using wide-field instruments. Additionally, they allow us to accurately associate features in the power spectrum to foregrounds or instrumental effects. The equations are also important for correctly selecting the "EoR window" for foreground avoidance analyses, and for planning 21-cm observations. In future analyses, it is recommended to use these updated horizon lines to indicate the foreground wedge in the cylindrical power spectrum accurately. The new equations for generating the updated wedge lines are made available in a Python library, pslines.
Submitted 15 July, 2024;
originally announced July 2024.
-
Optimising MWA EoR data processing for improved 21 cm power spectrum measurements -- fine-tuning ionospheric corrections
Authors:
J. Kariuki Chege,
C. H. Jordan,
C. Lynch,
C. M. Trott,
J. L. B. Line,
B. Pindor,
S. Yoshiura
Abstract:
The redshifted cosmological 21 cm signal emitted by neutral hydrogen during the first billion years of the universe is much fainter than other galactic and extragalactic radio emissions, posing a great challenge to detection of the signal. Therefore, precise instrumental calibration is a vital prerequisite for the success of radio interferometers such as the Murchison Widefield Array (MWA), which aim for a 21 cm detection. Over the previous years, novel calibration techniques targeting the power spectrum paradigm of EoR science have been actively researched and, where possible, implemented. Using recently acquired computation resources for the MWA, we test the full capabilities of the state-of-the-art calibration techniques available for the MWA EoR project, with a focus on both direction-dependent and direction-independent calibration. Specifically, we investigate improvements that can be made in the vital calibration stages of sky modelling, ionospheric correction, and compact source foreground subtraction as applied in the hybrid foreground mitigation approach (one that combines both foreground subtraction and avoidance). Additionally, we investigate a method of ionospheric correction using interpolated ionospheric phase screens and assess its performance in the power spectrum space. Overall, we identify a refined RTS calibration configuration that reduces the EoR window power contamination by at least a factor of 2 at the $0.1 \; h\,\text{Mpc}^{-1}$ scale. The improvement marks a further step towards detecting the 21 cm signal using the MWA and the forthcoming SKA-Low telescope.
Submitted 25 July, 2022;
originally announced July 2022.
-
The MWA Long Baseline Epoch of Reionisation Survey: I. Improved Source Catalogue for the EoR 0 field
Authors:
C. R. Lynch,
T. J. Galvin,
J. L. B. Line,
C. H. Jordan,
C. M. Trott,
J. K. Chege,
B. McKinley,
M. Johnston-Hollitt,
S. J. Tingay
Abstract:
One of the principal systematic constraints on the Epoch of Reionisation (EoR) experiment is the accuracy of the foreground calibration model. Recent results have shown that highly accurate models of extended foreground sources, including models for sources in both the primary beam and its sidelobes, are necessary for reducing foreground power. To improve the accuracy of the source models for the EoR fields observed by the Murchison Widefield Array (MWA), we conducted the MWA Long Baseline Epoch of Reionisation Survey (LoBES). This survey consists of multi-frequency observations of the main MWA EoR fields and their eight neighbouring fields using the MWA Phase II extended array. We present the results of the first half of this survey centred on the MWA EoR0 observing field (centred at RA(J2000) 0 h, Dec(J2000) -27 deg). This half of the survey covers an area of 3069 degrees$^2$, with an average rms of 2.1 mJy beam$^{-1}$. The resulting catalogue contains a total of 80824 sources, with 16 separate spectral measurements between 100 and 230 MHz, and spectral modelling for 78$\%$ of these sources. Over this region we estimate that the catalogue is 90$\%$ complete at 32 mJy, and 70$\%$ complete at 10.5 mJy. The overall normalised source counts are found to be in good agreement with previous low-frequency surveys at similar sensitivities. Testing the performance of the new source models, we measure lower residual rms values for peeled sources, particularly for extended sources, in a set of MWA Phase I data. The 2-dimensional power spectrum of these data residuals also shows improvement on small angular scales, consistent with the better angular resolution of the LoBES catalogue. It is clear that the LoBES sky models improve upon the current sky model used by the Australian MWA EoR group for the EoR0 field.
Submitted 15 October, 2021;
originally announced October 2021.
-
Simulations of ionospheric refraction on radio interferometric data
Authors:
J. Kariuki Chege,
C. H. Jordan,
C. Lynch,
J. L. B. Line,
C. M. Trott
Abstract:
The Epoch of Reionisation (EoR) is the period within which the neutral universe transitioned to an ionised one. This period remains unobserved using low-frequency radio interferometers which target the 21 cm signal of neutral hydrogen emitted in this era. The Murchison Widefield Array (MWA) radio telescope was built with the detection of this signal as one of its major science goals. One of the most significant challenges towards a successful detection is that of calibration, especially in the presence of the Earth's ionosphere. By introducing refractive source shifts, distorting source shapes and scintillating flux densities, the ionosphere is a major nuisance in low-frequency radio astronomy. We introduce SIVIO, a software tool developed for simulating observations of the MWA through different ionospheric conditions estimated using thin screen approximation models and propagated into the visibilities. This enables us to directly assess the impact of the ionosphere on observed EoR data and the resulting power spectra. We show that the simulated data captures the dispersive behaviour of ionospheric effects. We show that the spatial structure of the simulated ionospheric media is accurately reconstructed either from the resultant source positional offsets or from parameters evaluated during the data calibration procedure. In turn, this will inform on the best strategies of identifying and efficiently eliminating ionospheric contamination in EoR data moving into the Square Kilometre Array era.
Submitted 10 May, 2021;
originally announced May 2021.
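The dispersive behaviour of ionospheric effects mentioned in the abstract above can be illustrated with the standard first-order relations: the phase introduced by a differential total electron content (TEC) scales as 1/frequency, and refractive position offsets scale as 1/frequency squared. The sketch below is not taken from SIVIO; the function names and example values are hypothetical, and the constant is the commonly used conversion from TEC units to phase.

```python
import numpy as np

# Commonly used conversion constant: phase [rad] = -C * dTEC [TECU] / freq [Hz].
TEC_PHASE_CONST = 8.4479745e9

def ionospheric_phase(dtec_tecu, freq_hz):
    """First-order ionospheric phase for a differential TEC, in radians."""
    return -TEC_PHASE_CONST * dtec_tecu / freq_hz

def scaled_offset(offset_ref_arcsec, freq_ref_hz, freq_hz):
    """Scale a refractive position offset from a reference frequency (1/freq^2)."""
    return offset_ref_arcsec * (freq_ref_hz / freq_hz) ** 2

freqs = np.array([100e6, 150e6, 200e6])      # Hz, hypothetical channels
phases = ionospheric_phase(0.01, freqs)      # assumed dTEC of 0.01 TECU
offsets = scaled_offset(10.0, 150e6, freqs)  # assumed 10 arcsec shift at 150 MHz
```

Halving the frequency doubles the phase and quadruples the positional offset, which is the kind of frequency dependence a simulation of ionospheric refraction would be expected to reproduce.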