subscribe to arXiv mailings

Spatio-temporal Multivariate Cluster Evolution Analysis for Detecting and Tracking Climate Impacts

Authors: Warren L. Davis IV, Max Carlson, Irina Tezaur, Diana Bull, Kara Peterson, Laura Swiler

Abstract: Recent years have seen a growing concern about climate change and its impacts. While Earth System Models (ESMs) can be invaluable tools for studying the impacts of climate change, the complex coupling processes encoded in ESMs and the large amounts of data produced by these models, together with the high internal variability of the Earth system, can obscure important source-to-impact relationships… ▽ More Recent years have seen a growing concern about climate change and its impacts. While Earth System Models (ESMs) can be invaluable tools for studying the impacts of climate change, the complex coupling processes encoded in ESMs and the large amounts of data produced by these models, together with the high internal variability of the Earth system, can obscure important source-to-impact relationships. This paper presents a novel and efficient unsupervised data-driven approach for detecting statistically-significant impacts and tracing spatio-temporal source-impact pathways in the climate through a unique combination of ideas from anomaly detection, clustering and Natural Language Processing (NLP). Using as an exemplar the 1991 eruption of Mount Pinatubo in the Philippines, we demonstrate that the proposed approach is capable of detecting known post-eruption impacts/events. We additionally describe a methodology for extracting meaningful sequences of post-eruption impacts/events by using NLP to efficiently mine frequent multivariate cluster evolutions, which can be used to confirm or discover the chain of physical processes between a climate source and its impact(s). △ Less

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.06557 [pdf, other]

Observation of disorder-free localization and efficient disorder averaging on a quantum processor

Authors: Gaurav Gyawali, Tyler Cochran, Yuri Lensky, Eliott Rosenberg, Amir H. Karamlou, Kostyantyn Kechedzhi, Julia Berndtsson, Tom Westerhout, Abraham Asfaw, Dmitry Abanin, Rajeev Acharya, Laleh Aghababaie Beni, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Brian Ballard, Joseph C. Bardin, Andreas Bengtsson, Alexander Bilmes, Gina Bortoli, Alexandre Bourassa , et al. (195 additional authors not shown)

Abstract: One of the most challenging problems in the computational study of localization in quantum manybody systems is to capture the effects of rare events, which requires sampling over exponentially many disorder realizations. We implement an efficient procedure on a quantum processor, leveraging quantum parallelism, to efficiently sample over all disorder realizations. We observe localization without d… ▽ More One of the most challenging problems in the computational study of localization in quantum manybody systems is to capture the effects of rare events, which requires sampling over exponentially many disorder realizations. We implement an efficient procedure on a quantum processor, leveraging quantum parallelism, to efficiently sample over all disorder realizations. We observe localization without disorder in quantum many-body dynamics in one and two dimensions: perturbations do not diffuse even though both the generator of evolution and the initial states are fully translationally invariant. The disorder strength as well as its density can be readily tuned using the initial state. Furthermore, we demonstrate the versatility of our platform by measuring Renyi entropies. Our method could also be extended to higher moments of the physical observables and disorder learning. △ Less

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.01517 [pdf, other]

UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction

Authors: Haoran Wang, Nantheera Anantrasirichai, Fan Zhang, David Bull

Abstract: 3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these, we introduce a novel Gaussian Splatting-base… ▽ More 3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these, we introduce a novel Gaussian Splatting-based method, UW-GS, designed specifically for underwater applications. It introduces a color appearance that models distance-dependent color variation, employs a new physics-based density control strategy to enhance clarity for distant objects, and uses a binary motion mask to handle dynamic content. Optimized with a well-designed loss function supporting for scattering media and strengthened by pseudo-depth maps, UW-GS outperforms existing methods with PSNR gains up to 1.26dB. To fully verify the effectiveness of the model, we also developed a new underwater dataset, S-UW, with dynamic object masks. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.18011 [pdf, other]

Entropy-based feature selection for capturing impacts in Earth system models with extreme forcing

Authors: Jerry Watkins, Luca Bertagna, Graham Harper, Andrew Steyer, Irina Tezaur, Diana Bull

Abstract: This paper presents the development of a new entropy-based feature selection method for identifying and quantifying impacts. Here, impacts are defined as statistically significant differences in spatio-temporal fields when comparing datasets with and without an external forcing in Earth system models. Temporal feature selection is performed by first computing the cross-fuzzy entropy to quantify si… ▽ More This paper presents the development of a new entropy-based feature selection method for identifying and quantifying impacts. Here, impacts are defined as statistically significant differences in spatio-temporal fields when comparing datasets with and without an external forcing in Earth system models. Temporal feature selection is performed by first computing the cross-fuzzy entropy to quantify similarity of patterns between two datasets and then applying changepoint detection to identify regions of statistically constant entropy. The method is used to capture temperate north surface cooling from a 9-member simulation ensemble of the Mt. Pinatubo volcanic eruption, which injected 10 Tg of SO2 into the stratosphere. The results estimate a mean difference decrease in near surface air temperature of -0.560 K with a 99% confidence interval between -0.864 K and -0.257 K between April and November of 1992, one year following the eruption. A sensitivity analysis with decreasing SO2 injection revealed that the impact is statistically significant at 5 Tg but not at 3 Tg. Using identified features, a dependency graph model composed of 68 nodes and 229 edges directly connecting initial aerosol optical depth changes in the tropics to solar flux and temperature changes before the temperate north surface cooling is presented. △ Less

Submitted 26 September, 2024; originally announced September 2024.

Report number: SAND2024-12757O

arXiv:2409.16609 [pdf, other]

Random Forest Regression Feature Importance for Climate Impact Pathway Detection

Authors: Meredith G. L. Brown, Matt Peterson, Irina Tezaur, Kara Peterson, Diana Bull

Abstract: Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact path… ▽ More Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact pathway, using Random Forest Regression (RFR) and SHapley Additive exPlanation (SHAP) feature importances. Rather than utilizing RFR for classification or regression tasks (the most common use case for RFR), we propose a fundamentally new RFR-based workflow in which we: (i) train random forest (RF) regressors on a set of spatio-temporal features of interest, (ii) calculate their pair-wise feature importances using the SHAP weights associated with those features, and (iii) translate these feature importances into a weighted pathway network (i.e., a weighted directed graph), which can be used to trace out and rank interdependencies between climate features and/or modalities. We adopt a tiered verification approach to verify our new pathway identification methodology. In this approach, we apply our method to ensembles of data generated by running two increasingly complex benchmarks: (i) a set of synthetic coupled equations, and (ii) a fully coupled simulation of the 1991 eruption of Mount Pinatubo in the Philippines performed using a modified version 2 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv2). We find that our RFR feature importance-based approach can accurately detect known pathways of impact for both test cases. △ Less

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2409.14587 [pdf, other]

Deep Learning Techniques for Atmospheric Turbulence Removal: A Review

Authors: Paul Hill, Nantheera Anantrasirichai, Alin Achim, David Bull

Abstract: The influence of atmospheric turbulence on acquired imagery makes image interpretation and scene analysis extremely difficult and reduces the effectiveness of conventional approaches for classifying and tracking objects of interest in the scene. Restoring a scene distorted by atmospheric turbulence is also a challenging problem. The effect, which is caused by random, spatially varying perturbation… ▽ More The influence of atmospheric turbulence on acquired imagery makes image interpretation and scene analysis extremely difficult and reduces the effectiveness of conventional approaches for classifying and tracking objects of interest in the scene. Restoring a scene distorted by atmospheric turbulence is also a challenging problem. The effect, which is caused by random, spatially varying perturbations, makes conventional model-based approaches difficult and, in most cases, impractical due to complexity and memory requirements. Deep learning approaches offer faster operation and are capable of implementation on small devices. This paper reviews the characteristics of atmospheric turbulence and its impact on acquired imagery. It compares the performance of various state-of-the-art deep neural networks, including Transformers, SWIN and Mamba, when used to mitigate spatio-temporal image distortions. △ Less

Submitted 3 September, 2024; originally announced September 2024.

Comments: 36 Pages, 8 figures

arXiv:2409.07414 [pdf, other]

NVRC: Neural Video Representation Compression

Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

Abstract: Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the… ▽ More Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still out-performed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we have also proposed a new model compression framework for coding all the network, quantization and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec achieving such performance. The implementation of NVRC will be released at www.github.com. △ Less

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.06846 [pdf, other]

Stratospheric aerosol source inversion: Noise, variability, and uncertainty quantification

Authors: J. Hart, I. Manickam, M. Gulian, L. Swiler, D. Bull, T. Ehrmann, H. Brown, B. Wagman, J. Watkins

Abstract: Stratospheric aerosols play an important role in the earth system and can affect the climate on timescales of months to years. However, estimating the characteristics of partially observed aerosol injections, such as those from volcanic eruptions, is fraught with uncertainties. This article presents a framework for stratospheric aerosol source inversion which accounts for background aerosol noise… ▽ More Stratospheric aerosols play an important role in the earth system and can affect the climate on timescales of months to years. However, estimating the characteristics of partially observed aerosol injections, such as those from volcanic eruptions, is fraught with uncertainties. This article presents a framework for stratospheric aerosol source inversion which accounts for background aerosol noise and earth system internal variability via a Bayesian approximation error approach. We leverage specially designed earth system model simulations using the Energy Exascale Earth System Model (E3SM). A comprehensive framework for data generation, data processing, dimension reduction, operator learning, and Bayesian inversion is presented where each component of the framework is designed to address particular challenges in stratospheric modeling on the global scale. We present numerical results using synthesized observational data to rigorously assess the ability of our approach to estimate aerosol sources and associate uncertainty with those estimates. △ Less

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.01396 [pdf, other]

Conditional multi-step attribution for climate forcings

Authors: Christopher R. Wentland, Michael Weylandt, Laura P. Swiler, Thomas S. Ehrmann, Diana Bull

Abstract: Attribution of climate impacts to a source forcing is critical to understanding, communicating, and addressing the effects of human influence on the climate. While standard attribution methods, such as optimal fingerprinting, have been successfully applied to long-term, widespread effects such as global surface temperature warming, they often struggle in low signal-to-noise regimes, typical of sho… ▽ More Attribution of climate impacts to a source forcing is critical to understanding, communicating, and addressing the effects of human influence on the climate. While standard attribution methods, such as optimal fingerprinting, have been successfully applied to long-term, widespread effects such as global surface temperature warming, they often struggle in low signal-to-noise regimes, typical of short-term climate forcings or climate variables which are loosely related to the forcing. Single-step approaches, which directly relate a source forcing and final impact, are unable to utilize additional climate information to improve attribution certainty. To address this shortcoming, this paper presents a novel multi-step attribution approach which is capable of analyzing multiple variables conditionally. A connected series of climate effects are treated as dependent, and relationships found in intermediary steps of a causal pathway are leveraged to better characterize the forcing impact. This enables attribution of the forcing level responsible for the observed impacts, while equivalent single-step approaches fail. Utilizing a scalar feature describing the forcing impact, simple forcing response models, and a conditional Bayesian formulation, this method can incorporate several causal pathways to identify the correct forcing magnitude. As an exemplar of a short-term, high-variance forcing, we demonstrate this method for the 1991 eruption of Mt. Pinatubo. Results indicate that including stratospheric and surface temperature and radiative flux measurements increases attribution certainty compared to analyses derived solely from temperature measurements. This framework has potential to improve climate attribution assessments for both geoengineering projects and long-term climate change, for which standard attribution methods may fail. △ Less

Submitted 2 September, 2024; originally announced September 2024.

Comments: 42 pages, 10 figures

Report number: SAND2024-11250O

arXiv:2409.00953 [pdf, other]

PNVC: Towards Practical INR-based Video Compression

Authors: Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

Abstract: Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from be… ▽ More Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation. △ Less

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2408.13687 [pdf, other]

Quantum error correction below the surface code threshold

Authors: Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, David A. Browne , et al. (224 additional authors not shown)

Abstract: Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this… ▽ More Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of $Λ$ = 2.14 $\pm$ 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% $\pm$ 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 $\pm$ 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 $μ$s at distance-5 up to a million cycles, with a cycle time of 1.1 $μ$s. To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 $\times$ 10$^9$ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms. △ Less

Submitted 24 August, 2024; originally announced August 2024.

Comments: 10 pages, 4 figures, Supplementary Information

arXiv:2408.07171 [pdf, other]

BVI-UGC: A Video Quality Database for User-Generated Content Transcoding

Authors: Zihao Qi, Chen Feng, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

Abstract: In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. Howeve… ▽ More In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. However, full-reference video quality assessment is also important for UGC in the delivery pipeline, particularly associated with the video transcoding process. In this context, we present a new UGC video quality database, BVI-UGC, for user-generated content transcoding, which contains 60 (non-pristine) reference videos and 1,080 test sequences. In this work, we simulated the creation of non-pristine reference sequences (with a wide range of compression distortions), typical of content uploaded to UGC platforms for transcoding. A comprehensive crowdsourced subjective study was then conducted involving more than 3,500 human participants. Based on this collected subjective data, we benchmarked the performance of 10 full-reference and 11 no-reference quality metrics. Our results demonstrate the poor performance (SROCC values are lower than 0.6) of these metrics in predicting the perceptual quality of UGC in two different scenarios (with or without a reference). △ Less

Submitted 13 August, 2024; originally announced August 2024.

Comments: 12 pages, 11 figures

arXiv:2408.05042 [pdf, other]

Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration

Authors: Siyue Teng, Yuxuan Jiang, Ge Gao, Fan Zhang, Thomas Davis, Zoe Liu, David Bull

Abstract: Recent advances in video compression have seen significant coding performance improvements with the development of new standards and learning-based video codecs. However, most of these works focus on application scenarios that allow a certain amount of system delay (e.g., Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper conducts a comparative study o… ▽ More Recent advances in video compression have seen significant coding performance improvements with the development of new standards and learning-based video codecs. However, most of these works focus on application scenarios that allow a certain amount of system delay (e.g., Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper conducts a comparative study of state-of-the-art conventional and learned video coding methods based on a low delay configuration. Specifically, this study includes two MPEG standard codecs (H.266/VVC VTM and JVET ECM), two AOM codecs (AV1 libaom and AVM), and two recent neural video coding models (DCVC-DC and DCVC-FM). To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions in the YCbCr 4:2:0 color space. The evaluation results show that the JVET ECM codecs offer the best overall coding performance among all codecs tested, with a 16.1% (based on PSNR) average BD-rate saving over AOM AVM, and 11.0% over DCVC-FM. We also observed inconsistent performance with the learned video codecs, DCVC-DC and DCVC-FM, for test content with large background motions. △ Less

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04099 [pdf, other]

In-situ data extraction for pathway analysis in an idealized atmosphere configuration of E3SM

Authors: Andrew Steyer, Luca Bertagna, Graham Harper, Jerry Watkins, Irina Tezaur, Diana Bull

Abstract: We propose an approach for characterizing source-impact pathways, the interactions of a set of variables in space-time due to an external forcing, in climate models using in-situ analyses that circumvent computationally expensive read/write operations. This approach makes use of a lightweight open-source software library we developed known as CLDERA-Tools. We describe how CLDERA-Tools is linked wi… ▽ More We propose an approach for characterizing source-impact pathways, the interactions of a set of variables in space-time due to an external forcing, in climate models using in-situ analyses that circumvent computationally expensive read/write operations. This approach makes use of a lightweight open-source software library we developed known as CLDERA-Tools. We describe how CLDERA-Tools is linked with the U.S. Department of Energy's Energy Exascale Earth System Model (E3SM) in a minimally invasive way for in-situ extraction of quantities of interested and associated statistics. Subsequently, these quantities are used to represent source-impact pathways with time-dependent directed acyclic graphs (DAGs). The utility of CLDERA-Tools is demonstrated by using the data it extracts in-situ to compute a spatially resolved DAG from an idealized configuration of the atmosphere with a parameterized representation of a volcanic eruption known as HSW-V. △ Less

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.03265 [pdf, other]

BVI-AOM: A New Training Dataset for Deep Video Compression Optimization

Authors: Jakub Nawała, Yuxuan Jiang, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull

Abstract: Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs. These learning-based methods typically require diverse and representative training material for optimization in order to achieve model generalization and optimal coding performance. However, existing datasets either offer limited content variability or come with restricted licensing ter… ▽ More Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs. These learning-based methods typically require diverse and representative training material for optimization in order to achieve model generalization and optimal coding performance. However, existing datasets either offer limited content variability or come with restricted licensing terms constraining their use to research purposes only. To address these issues, we propose a new training dataset, named BVI-AOM, which contains 956 uncompressed sequences at various resolutions from 270p to 2160p, covering a wide range of content and texture types. The dataset comes with more flexible licensing terms and offers competitive performance when used as a training set for optimizing deep video coding tools. The experimental results demonstrate that when used as a training set to optimize two popular network architectures for two different coding tools, the proposed dataset leads to additional bitrate savings of up to 0.29 and 2.98 percentage points in terms of PSNR-Y and VMAF, respectively, compared to an existing training dataset, BVI-DVC, which has been widely used for deep video coding. The BVI-AOM dataset is available for download under this link: (TBD). △ Less

Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

Comments: 6 pages, 5 figures. Swapped the PSNR-HVS plot in Fig. 3 for a PSNR-YUV plot

arXiv:2407.11496 [pdf, other]

ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

Authors: Xinyi Wang, Angeliki Katsenou, David Bull

Abstract: With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be us… ▽ More With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the diversity of video content and the assessment of its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.03535 [pdf, other]

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Authors: Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, inco… ▽ More Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77. △ Less

Submitted 28 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.01970

arXiv:2407.01230 [pdf, other]

DaBiT: Depth and Blur informed Transformer for Joint Refocusing and Super-Resolution

Authors: Crispian Morris, Nantheera Anantrasirichai, Fan Zhang, David Bull

Abstract: In many real-world scenarios, recorded videos suffer from accidental focus blur, and while video deblurring methods exist, most specifically target motion blur. This paper introduces a framework optimised for the joint task of focal deblurring (refocusing) and video super-resolution (VSR). The proposed method employs novel map guided transformers, in addition to image propagation, to effectively l… ▽ More In many real-world scenarios, recorded videos suffer from accidental focus blur, and while video deblurring methods exist, most specifically target motion blur. This paper introduces a framework optimised for the joint task of focal deblurring (refocusing) and video super-resolution (VSR). The proposed method employs novel map guided transformers, in addition to image propagation, to effectively leverage the continuous spatial variance of focal blur and restore the footage. We also introduce a flow re-focusing module to efficiently align relevant features between the blurry and sharp domains. Additionally, we propose a novel technique for generating synthetic focal blur data, broadening the model's learning capabilities to include a wider array of content. We have made a new benchmark dataset, DAVIS-Blur, available. This dataset, a modified extension of the popular DAVIS video segmentation set, provides realistic out-of-focus blur degradations as well as the corresponding blur maps. Comprehensive experiments on DAVIS-Blur demonstrate the superiority of our approach. We achieve state-of-the-art results with an average PSNR performance over 1.9dB greater than comparable existing video restoration methods. Our source code will be made available at https://github.com/crispianm/DaBiT △ Less

Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.00212 [pdf, other]

MVAD: A Multiple Visual Artifact Detector for Video Streaming

Authors: Chen Feng, Duolikun Danier, Fan Zhang, David Bull

Abstract: Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and/or delivery. Since these can degrade the quality of the user's experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/o… ▽ More Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and/or delivery. Since these can degrade the quality of the user's experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/or determine the presence of an artifact through thresholding objective quality indices. Such approaches have been reported to offer inconsistent prediction performance and are also impractical for real-world applications where multiple artifacts co-exist and interact. In this paper, we propose a Multiple Visual Artifact Detector, MVAD, for video streaming which, for the first time, is able to detect multiple artifacts using a single framework that is not reliant on video quality assessment models. Our approach employs a new Artifact-aware Dynamic Feature Extractor (ADFE) to obtain artifact-relevant spatial features within each frame for multiple artifact types. The extracted features are further processed by a Recurrent Memory Vision Transformer (RMViT) module, which captures both short-term and long-term temporal information within the input video. The proposed network architecture is optimized in an end-to-end manner based on a new, large and diverse training database that is generated by simulating the video streaming pipeline and based on Adversarial Data Augmentation. This model has been evaluated on two video artifact databases, Maxwell and BVI-Artifact, and achieves consistent and improved prediction results for ten target visual artifacts when compared to seven existing single and multiple artifact detectors. The source code and training database will be available at https://chenfeng-bristol.github.io/MVAD/. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2405.17385 [pdf, other]

Thermalization and Criticality on an Analog-Digital Quantum Simulator

Authors: Trond I. Andersen, Nikita Astrakhantsev, Amir H. Karamlou, Julia Berndtsson, Johannes Motruk, Aaron Szasz, Jonathan A. Gross, Alexander Schuckert, Tom Westerhout, Yaxing Zhang, Ebrahim Forati, Dario Rossi, Bryce Kobrin, Agustin Di Paolo, Andrey R. Klots, Ilya Drozdov, Vladislav D. Kurilovich, Andre Petukhov, Lev B. Ioffe, Andreas Elben, Aniket Rath, Vittorio Vitale, Benoit Vermersch, Rajeev Acharya, Laleh Aghababaie Beni , et al. (202 additional authors not shown)

Abstract: Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal qua… ▽ More Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal quantum gates and high-fidelity analog evolution, with performance beyond the reach of classical simulation in cross-entropy benchmarking experiments. Emulating a two-dimensional (2D) XY quantum magnet, we leverage a wide range of measurement techniques to study quantum states after ramps from an antiferromagnetic initial state. We observe signatures of the classical Kosterlitz-Thouless phase transition, as well as strong deviations from Kibble-Zurek scaling predictions attributed to the interplay between quantum and classical coarsening of the correlated domains. This interpretation is corroborated by injecting variable energy density into the initial state, which enables studying the effects of the eigenstate thermalization hypothesis (ETH) in targeted parts of the eigenspectrum. Finally, we digitally prepare the system in pairwise-entangled dimer states and image the transport of energy and vorticity during thermalization. These results establish the efficacy of superconducting analog-digital quantum processors for preparing states across many-body spectra and unveiling their thermalization dynamics. △ Less

Submitted 8 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.08621 [pdf, other]

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

Authors: Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, Benoit Vallade, Alex Mackin, David Bull

Abstract: With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artifacts, and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose… ▽ More With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artifacts, and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimized through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics. △ Less

Submitted 10 October, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by the ECCV 2024 AIM Advances in Image Manipulation workshop

arXiv:2405.05039 [pdf, other]

Reviewing Intelligent Cinematography: AI research for camera-based video production

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: This paper offers a comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Considering the breadth of computer vision research and the lack of review papers tied to intelligent cinematography (IC), this review introduces a holistic view of the IC landscape whil… ▽ More This paper offers a comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Considering the breadth of computer vision research and the lack of review papers tied to intelligent cinematography (IC), this review introduces a holistic view of the IC landscape while providing the technical insight for experts across across disciplines. We preface the main discussion with technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, and link explanatory articles to assist non-technical readers. The main discussion categorizes work by four production types: General Production, Virtual Production, Live Production and Aerial Production. Note that for Virtual Production we do not discuss research relating to virtual content acquisition, including work on automated video generation, like Stable Diffusion. Within each section, we (1) sub-classify work by the technical field of research - reflected by the subsections, and (2) evaluate the trends and challenge w.r.t to each type of production. In the final chapter, we present our concluding remarks on the greater scope of IC research and outline work that we believe has significant potential to influence the whole industry. We find that work relating to virtual production has the greatest potential to impact other mediums of production, driven by the growing interest in LED volumes/stages for in-camera virtual effects (ICVFX) and automated 3-D capture for a virtual modelling of real world scenes and actors. This is the first piece of literature to offer a structured and comprehensive examination of IC research. Consequently, we address ethical and legal concerns regarding the use of creative AI involving artists, actors and the general public, in the... △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: For researchers and cinematographers. 43 pages including Table of Contents, List of Figures and Tables. We obtained permission to use Figures 5 and 11. All other Figures have been drawn by us

arXiv:2404.09571 [pdf, other]

MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

Authors: Yuxuan Jiang, Chen Feng, Fan Zhang, David Bull

Abstract: Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from their high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training stra… ▽ More Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from their high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training strategies with a single teacher and simple loss functions. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) framework specifically for image super-resolution. It exploits the advantages of multiple teachers by combining and enhancing the outputs of these teacher models, which then guides the learning process of the compact student network. To achieve more effective learning performance, we have also developed a new wavelet-based loss function for MTKD, which can better optimize the training process by observing differences in both the spatial and frequency domains. We fully evaluate the effectiveness of the proposed method by comparing it to five commonly used KD methods for image super-resolution based on three popular network architectures. The results show that the proposed MTKD method achieves evident improvements in super-resolution performance, up to 0.46dB (based on PSNR), over state-of-the-art KD approaches across different network structures. The source code of MTKD will be made available here for public evaluation. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.02408 [pdf, other]

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement

Authors: Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull

Abstract: Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. The restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using… ▽ More Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. The restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using a Swin Transformer as a backbone to capture low light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions. It is further analysed comparatively against various other models over three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results. △ Less

Submitted 12 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19041 [pdf, other]

Atmospheric Turbulence Removal with Video Sequence Deep Visual Priors

Authors: P. Hill, N. Anantrasirichai, A. Achim, D. R. Bull

Abstract: Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific c… ▽ More Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific content. In this paper, we address these problems with a self-supervised learning method that does not require ground truth. The proposed method is not dependent on any dataset outside of the single data sequence being processed but is also able to improve the quality of any input raw sequences or pre-processed sequences. Specifically, our method is based on an accelerated Deep Image Prior (DIP), but integrates temporal information using pixel shuffling and a temporal sliding window. This efficiently learns spatio-temporal priors leading to a system that effectively mitigates atmospheric turbulence distortions. The experiments show that our method improves visual quality results qualitatively and quantitatively. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18307 [pdf, other]

Multi-Scale Denoising in the Feature Space for Low-Light Instance Segmentation

Authors: Joanne Lin, Nantheera Anantrasirichai, David Bull

Abstract: Instance segmentation for low-light imagery remains largely unexplored due to the challenges imposed by such conditions, for example shot noise due to low photon count, color distortions and reduced contrast. In this paper, we propose an end-to-end solution to address this challenging task. Our proposed method implements weighted non-local blocks (wNLB) in the feature extractor. This integration e… ▽ More Instance segmentation for low-light imagery remains largely unexplored due to the challenges imposed by such conditions, for example shot noise due to low photon count, color distortions and reduced contrast. In this paper, we propose an end-to-end solution to address this challenging task. Our proposed method implements weighted non-local blocks (wNLB) in the feature extractor. This integration enables an inherent denoising process at the feature level. As a result, our method eliminates the need for aligned ground truth images during training, thus supporting training on real-world low-light datasets. We introduce additional learnable weights at each layer in order to enhance the network's adaptability to real-world noise characteristics, which affect different feature scales in different ways. Experimental results on several object detectors show that the proposed method outperforms the pretrained networks with an Average Precision (AP) improvement of at least +7.6, with the introduction of wNLB further enhancing AP by upto +1.3. △ Less

Submitted 10 September, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.07057 [pdf, other]

Rate-Quality or Energy-Quality Pareto Fronts for Adaptive Video Streaming?

Authors: Angeliki Katsenou, Xinyi Wang, Daniel Schien, David Bull

Abstract: Adaptive video streaming is a key enabler for optimising the delivery of offline encoded video content. The research focus to date has been on optimisation, based solely on rate-quality curves. This paper adds an additional dimension, the energy expenditure, and explores construction of bitrate ladders based on decoding energy-quality curves rather than the conventional rate-quality curves. Pareto… ▽ More Adaptive video streaming is a key enabler for optimising the delivery of offline encoded video content. The research focus to date has been on optimisation, based solely on rate-quality curves. This paper adds an additional dimension, the energy expenditure, and explores construction of bitrate ladders based on decoding energy-quality curves rather than the conventional rate-quality curves. Pareto fronts are extracted from the rate-quality and energy-quality spaces to select optimal points. Bitrate ladders are constructed from these points using conventional rate-based rules together with a novel quality-based approach. Evaluation on a subset of YouTube-UGC videos encoded with x.265 shows that the energy-quality ladders reduce energy requirements by 28-31% on average at the cost of slightly higher bitrates. The results indicate that optimising based on energy-quality curves rather than rate-quality curves and using quality levels to create the rungs could potentially improve energy efficiency for a comparable quality of experience. △ Less

Submitted 10 February, 2024; originally announced February 2024.

Comments: 6, submitted to a conference

arXiv:2402.01970 [pdf, other]

BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement

Authors: Nantheera Anantrasirichai, Ruirui Lin, Alexandra Malyugina, David Bull

Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, leading to poor visibility and compromised performance across various computer vision applications. One significant challenge in enhancing such content using modern technologies is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes captured in various motion scenarios under tw… ▽ More Low-light videos often exhibit spatiotemporal incoherent noise, leading to poor visibility and compromised performance across various computer vision applications. One significant challenge in enhancing such content using modern technologies is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes captured in various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly, and subsequently, refine them via image-based post-processing to ensure the pixel-wise alignment of frames in different light levels. This paper also presents an exhaustive analysis of the low-light dataset, and demonstrates the extensive and representative nature of our dataset in the context of supervised learning. Our experimental results demonstrate the significance of fully registered video pairs in the development of low-light video enhancement methods and the need for comprehensive evaluation. Our dataset is available at DOI:10.21227/mzny-8c77. △ Less

Submitted 25 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01596 [pdf, other]

Immersive Video Compression using Implicit Neural Representations

Authors: Ho Man Kwan, Fan Zhang, Andrew Gower, David Bull

Abstract: Recent work on implicit neural representations (INRs) has evidenced their potential for efficiently representing and encoding conventional video content. In this paper we, for the first time, extend their application to immersive (multi-view) videos, by proposing MV-HiNeRV, a new INR-based immersive video codec. MV-HiNeRV is an enhanced version of a state-of-the-art INR-based video codec, HiNeRV,… ▽ More Recent work on implicit neural representations (INRs) has evidenced their potential for efficiently representing and encoding conventional video content. In this paper we, for the first time, extend their application to immersive (multi-view) videos, by proposing MV-HiNeRV, a new INR-based immersive video codec. MV-HiNeRV is an enhanced version of a state-of-the-art INR-based video codec, HiNeRV, which was developed for single-view video compression. We have modified the model to learn a different group of feature grids for each view, and share the learnt network parameters among all views. This enables the model to effectively exploit the spatio-temporal and the inter-view redundancy that exists within multi-view videos. The proposed codec was used to compress multi-view texture and depth video sequences in the MPEG Immersive Video (MIV) Common Test Conditions, and tested against the MIV Test model (TMIV) that uses the VVenC video codec. The results demonstrate the superior performance of MV-HiNeRV, with significant coding gains (up to 72.33\%) over TMIV. The implementation of MV-HiNeRV is published for further development and evaluation. △ Less

Submitted 23 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.00523 [pdf, other]

Compressing Deep Image Super-resolution Models

Authors: Yuxuan Jiang, Jakub Nawala, Fan Zhang, David Bull

Abstract: Deep learning techniques have been applied in the context of image super-resolution (SR), achieving remarkable advances in terms of reconstruction performance. Existing techniques typically employ highly complex model structures which result in large model sizes and slow inference speeds. This often leads to high energy consumption and restricts their adoption for practical applications. To addres… ▽ More Deep learning techniques have been applied in the context of image super-resolution (SR), achieving remarkable advances in terms of reconstruction performance. Existing techniques typically employ highly complex model structures which result in large model sizes and slow inference speeds. This often leads to high energy consumption and restricts their adoption for practical applications. To address this issue, this work employs a three-stage workflow for compressing deep SR models which significantly reduces their memory requirement. Restoration performance has been maintained through teacher-student knowledge distillation using a newly designed distillation loss. We have applied this approach to two popular image super-resolution networks, SwinIR and EDSR, to demonstrate its effectiveness. The resulting compact models, SwinIRmini and EDSRmini, attain an 89% and 96% reduction in both model size and floating-point operations (FLOPs) respectively, compared to their original versions. They also retain competitive super-resolution performance compared to their original models and other commonly used SR approaches. The source code and pre-trained models for these two lightweight SR approaches are released at https://pikapi22.github.io/CDISM/. △ Less

Submitted 21 February, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.12317 [pdf, other]

doi 10.1109/PCS60826.2024.10566416

Full-reference Video Quality Assessment for User Generated Content Transcoding

Authors: Zihao Qi, Chen Feng, Duolikun Danier, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

Abstract: Unlike video coding for professional content, the delivery pipeline of User Generated Content (UGC) involves transcoding where unpristine reference content needs to be compressed repeatedly. In this work, we observe that existing full-/no-reference quality metrics fail to accurately predict the perceptual quality difference between transcoded UGC content and the corresponding unpristine references… ▽ More Unlike video coding for professional content, the delivery pipeline of User Generated Content (UGC) involves transcoding where unpristine reference content needs to be compressed repeatedly. In this work, we observe that existing full-/no-reference quality metrics fail to accurately predict the perceptual quality difference between transcoded UGC content and the corresponding unpristine references. Therefore, they are unsuited for guiding the rate-distortion optimisation process in the transcoding process. In this context, we propose a bespoke full-reference deep video quality metric for UGC transcoding. The proposed method features a transcoding-specific weakly supervised training strategy employing a quality ranking-based Siamese structure. The proposed method is evaluated on the YouTube-UGC VP9 subset and the LIVE-Wild database, demonstrating state-of-the-art performance compared to existing VQA methods. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures

arXiv:2312.12150 [pdf, ps, other]

Comparative Study of Hardware and Software Power Measurements in Video Compression

Authors: Angeliki Katsenou, Xinyi Wang, Daniel Schien, David Bull

Abstract: The environmental impact of video streaming services has been discussed as part of the strategies towards sustainable information and communication technologies. A first step towards that is the energy profiling and assessment of energy consumption of existing video technologies. This paper presents a comprehensive study of power measurement techniques in video compression, comparing the use of ha… ▽ More The environmental impact of video streaming services has been discussed as part of the strategies towards sustainable information and communication technologies. A first step towards that is the energy profiling and assessment of energy consumption of existing video technologies. This paper presents a comprehensive study of power measurement techniques in video compression, comparing the use of hardware and software power meters. An experimental methodology to ensure reliability of measurements is introduced. Key findings demonstrate the high correlation of hardware and software based energy measurements for two video codecs across different spatial and temporal resolutions at a lower computational overhead. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 5 pages

arXiv:2312.08864 [pdf, other]

RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment

Authors: Chen Feng, Duolikun Danier, Haoran Wang, Fan Zhang, Benoit Vallade, Alex Mackin, David Bull

Abstract: Deep learning-based video quality assessment (deep VQA) has demonstrated significant potential in surpassing conventional metrics, with promising improvements in terms of correlation with human perception. However, the practical deployment of such deep VQA models is often limited due to their high computational complexity and large memory requirements. To address this issue, we aim to significantl… ▽ More Deep learning-based video quality assessment (deep VQA) has demonstrated significant potential in surpassing conventional metrics, with promising improvements in terms of correlation with human perception. However, the practical deployment of such deep VQA models is often limited due to their high computational complexity and large memory requirements. To address this issue, we aim to significantly reduce the model size and runtime of one of the state-of-the-art deep VQA methods, RankDVQA, by employing a two-phase workflow that integrates pruning-driven model compression with multi-level knowledge distillation. The resulting lightweight full reference quality metric, RankDVQA-mini, requires less than 10% of the model parameters compared to its full version (14% in terms of FLOPs), while still retaining a quality prediction performance that is superior to most existing deep VQA methods. The source code of the RankDVQA-mini has been released at https://chenfeng-bristol.github.io/RankDVQA-mini/ for public evaluation. △ Less

Submitted 7 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: The paper has been accepted by Picture Coding Symposium (PCS) 2024

arXiv:2312.08859 [pdf, other]

BVI-Artefact: An Artefact Detection Benchmark Dataset for Streamed Videos

Authors: Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andy Collins, David Bull

Abstract: Professionally generated content (PGC) streamed online can contain visual artefacts that degrade the quality of user experience. These artefacts arise from different stages of the streaming pipeline, including acquisition, post-production, compression, and transmission. To better guide streaming experience enhancement, it is important to detect specific artefacts at the user end in the absence of… ▽ More Professionally generated content (PGC) streamed online can contain visual artefacts that degrade the quality of user experience. These artefacts arise from different stages of the streaming pipeline, including acquisition, post-production, compression, and transmission. To better guide streaming experience enhancement, it is important to detect specific artefacts at the user end in the absence of a pristine reference. In this work, we address the lack of a comprehensive benchmark for artefact detection within streamed PGC, via the creation and validation of a large database, BVI-Artefact. Considering the ten most relevant artefact types encountered in video streaming, we collected and generated 480 video sequences, each containing various artefacts with associated binary artefact labels. Based on this new database, existing artefact detection methods are benchmarked, with results showing the challenging nature of this tasks and indicating the requirement of more reliable artefact detection methods. To facilitate further research in this area, we have made BVI-Artifact publicly available at https://chenfeng-bristol.github.io/BVI-Artefact/ △ Less

Submitted 7 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: The paper has been accepted by Picture Coding Symposium (PCS) 2024

arXiv:2312.02605 [pdf, other]

doi 10.1109/PCS60826.2024.10566283

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

Authors: Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a… ▽ More In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity and is superior to the standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs, FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to 65% reduction in MACs and 2x speed-up with less than 0.3dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/ △ Less

Submitted 5 December, 2023; originally announced December 2023.

Report number: 2312.02605

arXiv:2312.02218 [pdf, other]

WavePlanes: A compact Wavelet representation for Dynamic Neural Radiance Fields

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Dynamic Neural Radiance Fields (Dynamic NeRF) enhance NeRF technology to model moving scenes. However, they are resource intensive and challenging to compress. To address these issues, this paper presents WavePlanes, a fast and more compact explicit model. We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients. The inverse discrete wavelet… ▽ More Dynamic Neural Radiance Fields (Dynamic NeRF) enhance NeRF technology to model moving scenes. However, they are resource intensive and challenging to compress. To address these issues, this paper presents WavePlanes, a fast and more compact explicit model. We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients. The inverse discrete wavelet transform reconstructs feature signals at varying detail, which are linearly decoded to approximate the color and density of volumes in a 4-D grid. Exploiting the sparsity of wavelet coefficients, we compress the model using a Hash Map containing only non-zero coefficients and their locations on each plane. Compared to the state-of-the-art (SotA) plane-based models, WavePlanes is up to 15x smaller while being less resource demanding and competitive in performance and training time. Compared to other small SotA models WavePlanes preserves details better without requiring custom CUDA code or high performance computing resources. Our code is available at: https://github.com/azzarelli/waveplanes/ △ Less

Submitted 8 May, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

arXiv:2309.08975 [pdf, other]

Wavelet-based Topological Loss for Low-Light Image Denoising

Authors: Alexandra Malyugina, Nantheera Anantrasirichai, David Bull

Abstract: Despite extensive research conducted in the field of image denoising, many algorithms still heavily depend on supervised learning and their effectiveness primarily relies on the quality and diversity of training data. It is widely assumed that digital image distortions are caused by spatially invariant Additive White Gaussian Noise (AWGN). However, the analysis of real-world data suggests that thi… ▽ More Despite extensive research conducted in the field of image denoising, many algorithms still heavily depend on supervised learning and their effectiveness primarily relies on the quality and diversity of training data. It is widely assumed that digital image distortions are caused by spatially invariant Additive White Gaussian Noise (AWGN). However, the analysis of real-world data suggests that this assumption is invalid. Therefore, this paper tackles image corruption by real noise, providing a framework to capture and utilise the underlying structural information of an image along with the spatial information conventionally used for deep learning tasks. We propose a novel denoising loss function that incorporates topological invariants and is informed by textural information extracted from the image wavelet domain. The effectiveness of this proposed method was evaluated by training state-of-the-art denoising models on the BVI-Lowlight dataset, which features a wide range of real noise distortions. Adding a topological term to common loss functions leads to a significant increase in the LPIPS (Learned Perceptual Image Patch Similarity) metric, with the improvement reaching up to 25\%. The results indicate that the proposed loss function enables neural networks to learn noise characteristics better. We demonstrate that they can consequently extract the topological features of noise-free images, resulting in enhanced contrast and preserved textural information. △ Less

Submitted 20 September, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

arXiv:2308.06853 [pdf, other]

UGC Quality Assessment: Exploring the Impact of Saliency in Deep Feature-Based Quality Assessment

Authors: Xinyi Wang, Angeliki Katsenou, David Bull

Abstract: The volume of User Generated Content (UGC) has increased in recent years. The challenge with this type of content is assessing its quality. So far, the state-of-the-art metrics are not exhibiting a very high correlation with perceptual quality. In this paper, we explore state-of-the-art metrics that extract/combine natural scene statistics and deep neural network features. We experiment with these… ▽ More The volume of User Generated Content (UGC) has increased in recent years. The challenge with this type of content is assessing its quality. So far, the state-of-the-art metrics are not exhibiting a very high correlation with perceptual quality. In this paper, we explore state-of-the-art metrics that extract/combine natural scene statistics and deep neural network features. We experiment with these by introducing saliency maps to improve perceptibility. We train and test our models using public datasets, namely, YouTube-UGC and KoNViD-1k. Preliminary results indicate that high correlations are achieved by using only deep features while adding saliency is not always boosting the performance. Our results and code will be made publicly available to serve as a benchmark for the research community and can be found on our project page: https://github.com/xinyiW915/SPIE-2023-Supplementary. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2306.09818 [pdf, other]

doi 10.5555/3666122.3669299

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation

Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

Abstract: Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed… ▽ More Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limit their representation capability. In this paper, we propose HiNeRV, an INR that combines light weight layers with novel hierarchical positional encodings. We employs depth-wise convolutional, MLP and interpolation layers to build the deep and wide network architecture with high capacity. HiNeRV is also a unified representation encoding videos in both frames and patches at the same time, which offers higher performance and flexibility than existing methods. We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INRs baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR). △ Less

Submitted 26 January, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.09333 [pdf, other]

doi 10.1126/science.adi7877

Dynamics of magnetization at infinite temperature in a Heisenberg spin chain

Authors: Eliott Rosenberg, Trond Andersen, Rhine Samajdar, Andre Petukhov, Jesse Hoke, Dmitry Abanin, Andreas Bengtsson, Ilya Drozdov, Catherine Erickson, Paul Klimov, Xiao Mi, Alexis Morvan, Matthew Neeley, Charles Neill, Rajeev Acharya, Richard Allen, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph Bardin, A. Bilmes, Gina Bortoli , et al. (156 additional authors not shown)

Abstract: Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the 1D Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we study the probability distributio… ▽ More Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the 1D Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we study the probability distribution, $P(\mathcal{M})$, of the magnetization transferred across the chain's center. The first two moments of $P(\mathcal{M})$ show superdiffusive behavior, a hallmark of KPZ universality. However, the third and fourth moments rule out the KPZ conjecture and allow for evaluating other theories. Our results highlight the importance of studying higher moments in determining dynamic universality classes and provide key insights into universal behavior in quantum systems. △ Less

Submitted 4 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Journal ref: Science 384, 48-53 (2024)

arXiv:2305.18079 [pdf, other]

Towards a Robust Framework for NeRF Evaluation

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR,… ▽ More Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR, SSIM, LPIPS etc.) only provide approximate indicators of performance since they generalise the ability of the entire NeRF pipeline. Hence, in this paper, we propose a new test framework which isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation. We also introduce a configurable approach for generating representations specifically for evaluation purposes. This employs ray-casting to transform mesh models into explicit NeRF samples, as well as to "shade" these representations. Combining these two approaches, we demonstrate how different "tasks" (scenes with different visual effects or learning strategies) and types of networks (NeRFs and depth-wise implicit neural representations (INRs)) can be evaluated within this framework. Additionally, we propose a novel metric to measure task complexity of the framework which accounts for the visual parameters and the distribution of the spatial data. Our approach offers the potential to create a comparative objective evaluation framework for NeRF methods. △ Less

Submitted 31 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 main experiments, 2 additional experiments

arXiv:2304.13878 [pdf, other]

doi 10.1126/science.adh9932

Stable Quantum-Correlated Many Body States through Engineered Dissipation

Authors: X. Mi, A. A. Michailidis, S. Shabani, K. C. Miao, P. V. Klimov, J. Lloyd, E. Rosenberg, R. Acharya, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, J. Atalaya, J. C. Bardin, A. Bengtsson, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, B. B. Buckley, D. A. Buell, T. Burger , et al. (142 additional authors not shown)

Abstract: Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-… ▽ More Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors. △ Less

Submitted 5 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Journal ref: Science 383, 1332-1337 (2024)

arXiv:2304.11119 [pdf, other]

Phase transition in Random Circuit Sampling

Authors: A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, J. Chau, G. Laun, R. Movassagh, A. Asfaw, L. T. A. N. Brandão, R. Peralta, D. Abanin, R. Acharya, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, J. Atalaya , et al. (160 additional authors not shown)

Abstract: Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benc… ▽ More Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benchmarking (XEB) can provide a reliable estimate of the effective size of the Hilbert space coherently available. The extent to which the presence of noise can trivialize the outputs of a given quantum algorithm, i.e. making it spoofable by a classical computation, is an unanswered question. Here, by implementing an RCS algorithm we demonstrate experimentally that there are two phase transitions observable with XEB, which we explain theoretically with a statistical model. The first is a dynamical transition as a function of the number of cycles and is the continuation of the anti-concentration point in the noiseless case. The second is a quantum phase transition controlled by the error per cycle; to identify it analytically and experimentally, we create a weak link model which allows varying the strength of noise versus coherent evolution. Furthermore, by presenting an RCS experiment with 67 qubits at 32 cycles, we demonstrate that the computational cost of our experiment is beyond the capabilities of existing classical supercomputers, even when accounting for the inevitable presence of noise. Our experimental and theoretical work establishes the existence of transitions to a stable computationally complex phase that is reachable with current quantum processors. △ Less

Submitted 21 December, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2303.09508 [pdf, other]

doi 10.1609/aaai.v38i2.27912

LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Authors: Duolikun Danier, Fan Zhang, David Bull

Abstract: Existing works on video frame interpolation (VFI) mostly employ deep neural networks that are trained by minimizing the L1, L2, or deep feature space distance (e.g. VGG loss) between their outputs and ground-truth frames. However, recent works have shown that these metrics are poor indicators of perceptual VFI quality. Towards developing perceptually-oriented VFI methods, in this work we propose l… ▽ More Existing works on video frame interpolation (VFI) mostly employ deep neural networks that are trained by minimizing the L1, L2, or deep feature space distance (e.g. VGG loss) between their outputs and ground-truth frames. However, recent works have shown that these metrics are poor indicators of perceptual VFI quality. Towards developing perceptually-oriented VFI methods, in this work we propose latent diffusion model-based VFI, LDMVFI. This approaches the VFI problem from a generative perspective by formulating it as a conditional generation problem. As the first effort to address VFI using latent diffusion models, we rigorously benchmark our method on common test sets used in the existing VFI literature. Our quantitative experiments and user study indicate that LDMVFI is able to interpolate video content with favorable perceptual quality compared to the state of the art, even in the high-resolution regime. Our code is available at https://github.com/danier97/LDMVFI. △ Less

Submitted 11 December, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI 2024

arXiv:2303.04792 [pdf, other]

doi 10.1038/s41586-023-06505-7

Measurement-induced entanglement and teleportation on a noisy quantum processor

Authors: Jesse C. Hoke, Matteo Ippoliti, Eliott Rosenberg, Dmitry Abanin, Rajeev Acharya, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph C. Bardin, Andreas Bengtsson, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, Zijun Chen, Ben Chiaro , et al. (138 additional authors not shown)

Abstract: Measurement has a special role in quantum theory: by collapsing the wavefunction it can enable phenomena such as teleportation and thereby alter the "arrow of time" that constrains unitary evolution. When integrated in many-body dynamics, measurements can lead to emergent patterns of quantum information in space-time that go beyond established paradigms for characterizing phases, either in or out… ▽ More Measurement has a special role in quantum theory: by collapsing the wavefunction it can enable phenomena such as teleportation and thereby alter the "arrow of time" that constrains unitary evolution. When integrated in many-body dynamics, measurements can lead to emergent patterns of quantum information in space-time that go beyond established paradigms for characterizing phases, either in or out of equilibrium. On present-day NISQ processors, the experimental realization of this physics is challenging due to noise, hardware limitations, and the stochastic nature of quantum measurement. Here we address each of these experimental challenges and investigate measurement-induced quantum information phases on up to 70 superconducting qubits. By leveraging the interchangeability of space and time, we use a duality mapping, to avoid mid-circuit measurement and access different manifestations of the underlying phases -- from entanglement scaling to measurement-induced teleportation -- in a unified way. We obtain finite-size signatures of a phase transition with a decoding protocol that correlates the experimental measurement record with classical simulation data. The phases display sharply different sensitivity to noise, which we exploit to turn an inherent hardware limitation into a useful diagnostic. Our work demonstrates an approach to realize measurement-induced physics at scales that are at the limits of current NISQ processors. △ Less

Submitted 17 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Journal ref: Nature 622, 481-486 (2023)

arXiv:2302.08455 [pdf, other]

ST-MFNet Mini: Knowledge Distillation-Driven Frame Interpolation

Authors: Crispian Morris, Duolikun Danier, Fan Zhang, Nantheera Anantrasirichai, David R. Bull

Abstract: Currently, one of the major challenges in deep learning-based video frame interpolation (VFI) is the large model sizes and high computational complexity associated with many high performance VFI approaches. In this paper, we present a distillation-based two-stage workflow for obtaining compressed VFI models which perform competitively to the state of the arts, at a greatly reduced model size and c… ▽ More Currently, one of the major challenges in deep learning-based video frame interpolation (VFI) is the large model sizes and high computational complexity associated with many high performance VFI approaches. In this paper, we present a distillation-based two-stage workflow for obtaining compressed VFI models which perform competitively to the state of the arts, at a greatly reduced model size and complexity. Specifically, an optimisation-based network pruning method is first applied to a recently proposed frame interpolation model, ST-MFNet, which outperforms many other VFI methods but suffers from large model size. The resulting new network architecture achieves a 91% reduction in parameters and 35% increase in speed. Secondly, the performance of the new network is further enhanced through a teacher-student knowledge distillation training process using a Laplacian distillation loss. The final low complexity model, ST-MFNet Mini, achieves a comparable performance to most existing high-complex VFI methods, only outperformed by the original ST-MFNet. Our source code is available at https://github.com/crispianm/ST-MFNet-Mini △ Less

Submitted 23 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2211.04728 [pdf, other]

doi 10.1038/s41567-023-02226-w

Overcoming leakage in scalable quantum error correction

Authors: Kevin C. Miao, Matt McEwen, Juan Atalaya, Dvir Kafri, Leonid P. Pryadko, Andreas Bengtsson, Alex Opremcak, Kevin J. Satzinger, Zijun Chen, Paul V. Klimov, Chris Quintana, Rajeev Acharya, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Joseph C. Bardin, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett , et al. (92 additional authors not shown)

Abstract: Leakage of quantum information out of computational states into higher energy states represents a major challenge in the pursuit of quantum error correction (QEC). In a QEC circuit, leakage builds over time and spreads through multi-qubit interactions. This leads to correlated errors that degrade the exponential suppression of logical error with scale, challenging the feasibility of QEC as a path… ▽ More Leakage of quantum information out of computational states into higher energy states represents a major challenge in the pursuit of quantum error correction (QEC). In a QEC circuit, leakage builds over time and spreads through multi-qubit interactions. This leads to correlated errors that degrade the exponential suppression of logical error with scale, challenging the feasibility of QEC as a path towards fault-tolerant quantum computation. Here, we demonstrate the execution of a distance-3 surface code and distance-21 bit-flip code on a Sycamore quantum processor where leakage is removed from all qubits in each cycle. This shortens the lifetime of leakage and curtails its ability to spread and induce correlated errors. We report a ten-fold reduction in steady-state leakage population on the data qubits encoding the logical state and an average leakage population of less than $1 \times 10^{-3}$ throughout the entire device. The leakage removal process itself efficiently returns leakage population back to the computational basis, and adding it to a code circuit prevents leakage from inducing correlated error across cycles, restoring a fundamental assumption of QEC. With this demonstration that leakage can be contained, we resolve a key challenge for practical QEC at scale. △ Less

Submitted 9 November, 2022; originally announced November 2022.

Comments: Main text: 7 pages, 5 figures

arXiv:2210.10799 [pdf, other]

doi 10.1038/s41567-023-02240-y

Purification-based quantum error mitigation of pair-correlated electron simulations

Authors: T. E. O'Brien, G. Anselmetti, F. Gkritsis, V. E. Elfving, S. Polla, W. J. Huggins, O. Oumarou, K. Kechedzhi, D. Abanin, R. Acharya, I. Aleiner, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, A. Asfaw, J. Atalaya, D. Bacon, J. C. Bardin, A. Bengtsson, S. Boixo, G. Bortoli, A. Bourassa , et al. (151 additional authors not shown)

Abstract: An important measure of the development of quantum computing platforms has been the simulation of increasingly complex physical systems. Prior to fault-tolerant quantum computing, robust error mitigation strategies are necessary to continue this growth. Here, we study physical simulation within the seniority-zero electron pairing subspace, which affords both a computational stepping stone to a ful… ▽ More An important measure of the development of quantum computing platforms has been the simulation of increasingly complex physical systems. Prior to fault-tolerant quantum computing, robust error mitigation strategies are necessary to continue this growth. Here, we study physical simulation within the seniority-zero electron pairing subspace, which affords both a computational stepping stone to a fully correlated model, and an opportunity to validate recently introduced ``purification-based'' error-mitigation strategies. We compare the performance of error mitigation based on doubling quantum resources in time (echo verification) or in space (virtual distillation), on up to $20$ qubits of a superconducting qubit quantum processor. We observe a reduction of error by one to two orders of magnitude below less sophisticated techniques (e.g. post-selection); the gain from error mitigation is seen to increase with the system size. Employing these error mitigation strategies enables the implementation of the largest variational algorithm for a correlated chemistry system to-date. Extrapolating performance from these results allows us to estimate minimum requirements for a beyond-classical simulation of electronic structure. We find that, despite the impressive gains from purification-based error mitigation, significant hardware improvements will be required for classically intractable variational chemistry simulations. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: 10 pages, 13 page supplementary material, 12 figures. Experimental data available at https://doi.org/10.5281/zenodo.7225821

Journal ref: Nat. Phys. (2023)

arXiv:2210.10255 [pdf, other]

Non-Abelian braiding of graph vertices in a superconducting processor

Authors: Trond I. Andersen, Yuri D. Lensky, Kostyantyn Kechedzhi, Ilya Drozdov, Andreas Bengtsson, Sabrina Hong, Alexis Morvan, Xiao Mi, Alex Opremcak, Rajeev Acharya, Richard Allen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley , et al. (144 additional authors not shown)

Abstract: Indistinguishability of particles is a fundamental principle of quantum mechanics. For all elementary and quasiparticles observed to date - including fermions, bosons, and Abelian anyons - this principle guarantees that the braiding of identical particles leaves the system unchanged. However, in two spatial dimensions, an intriguing possibility exists: braiding of non-Abelian anyons causes rotatio… ▽ More Indistinguishability of particles is a fundamental principle of quantum mechanics. For all elementary and quasiparticles observed to date - including fermions, bosons, and Abelian anyons - this principle guarantees that the braiding of identical particles leaves the system unchanged. However, in two spatial dimensions, an intriguing possibility exists: braiding of non-Abelian anyons causes rotations in a space of topologically degenerate wavefunctions. Hence, it can change the observables of the system without violating the principle of indistinguishability. Despite the well developed mathematical description of non-Abelian anyons and numerous theoretical proposals, the experimental observation of their exchange statistics has remained elusive for decades. Controllable many-body quantum states generated on quantum processors offer another path for exploring these fundamental phenomena. While efforts on conventional solid-state platforms typically involve Hamiltonian dynamics of quasi-particles, superconducting quantum processors allow for directly manipulating the many-body wavefunction via unitary gates. Building on predictions that stabilizer codes can host projective non-Abelian Ising anyons, we implement a generalized stabilizer code and unitary protocol to create and braid them. This allows us to experimentally verify the fusion rules of the anyons and braid them to realize their statistics. We then study the prospect of employing the anyons for quantum computation and utilize braiding to create an entangled state of anyons encoding three logical qubits. Our work provides new insights about non-Abelian braiding and - through the future inclusion of error correction to achieve topological protection - could open a path toward fault-tolerant quantum computing. △ Less

Submitted 31 May, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.00823 [pdf, other]

doi 10.1109/TIP.2023.3327912

BVI-VFI: A Video Quality Database for Video Frame Interpolation

Authors: Duolikun Danier, Fan Zhang, David Bull

Abstract: Video frame interpolation (VFI) is a fundamental research topic in video processing, which is currently attracting increased attention across the research community. While the development of more advanced VFI algorithms has been extensively researched, there remains little understanding of how humans perceive the quality of interpolated content and how well existing objective quality assessment me… ▽ More Video frame interpolation (VFI) is a fundamental research topic in video processing, which is currently attracting increased attention across the research community. While the development of more advanced VFI algorithms has been extensively researched, there remains little understanding of how humans perceive the quality of interpolated content and how well existing objective quality assessment methods perform when measuring the perceived quality. In order to narrow this research gap, we have developed a new video quality database named BVI-VFI, which contains 540 distorted sequences generated by applying five commonly used VFI algorithms to 36 diverse source videos with various spatial resolutions and frame rates. We collected more than 10,800 quality ratings for these videos through a large scale subjective study involving 189 human subjects. Based on the collected subjective scores, we further analysed the influence of VFI algorithms and frame rates on the perceptual quality of interpolated videos. Moreover, we benchmarked the performance of 33 classic and state-of-the-art objective image/video quality metrics on the new database, and demonstrated the urgent requirement for more accurate bespoke quality assessment methods for VFI. To facilitate further research in this area, we have made BVI-VFI publicly available at https://github.com/danier97/BVI-VFI-database. △ Less

Submitted 21 October, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Showing 1–50 of 121 results for author: Buell, D