-
Achieving Multi-UAV Best Viewpoint Coordination in Obstructed Environments
Authors:
Mirko Baglioni,
Apurva Patil,
Luis Sentis,
Anahita Jamshidnejad
Abstract:
Wildfire suppression is a complex task that poses high risks to humans. Using robotic teams for wildfire suppression enhances the safety and efficiency of detecting, monitoring, and extinguishing fires. We propose a control architecture based on task hierarchical control for the autonomous steering of a system of flying robots in wildfire suppression. We incorporate a novel line-of-sight obstacle avoidance method that calculates the best viewpoints and ensures an occlusion-free view for the suppression robot during the mission. Path integral control generates optimal trajectories towards the goals. We validate the approach in simulations using MATLAB and Unity, and conduct an ablation study that assesses its effectiveness by comparing it to scenarios where these key components are excluded. The results demonstrate significant performance improvements, with a 44.0% increase in effectiveness with the new line-of-sight obstacle avoidance task and up to a 39.6% improvement when using path integral control.
Submitted 11 October, 2024;
originally announced October 2024.
-
Semantic Segmentation Based Quality Control of Histopathology Whole Slide Images
Authors:
Abhijeet Patil,
Garima Jain,
Harsh Diwakar,
Jay Sawant,
Tripti Bameta,
Swapnil Rane,
Amit Sethi
Abstract:
We developed a software pipeline for quality control (QC) of histopathology whole slide images (WSIs) that segments various regions, such as blurs of different levels, tissue regions, tissue folds, and pen marks. Given the necessity and increasing availability of GPUs for processing WSIs, the proposed pipeline comprises multiple lightweight deep learning models to strike a balance between accuracy and speed. The pipeline was evaluated on all of TCGA, the largest publicly available WSI dataset, which contains more than 11,000 histopathological images from 28 organs. It was compared to a previous work, which was not based on deep learning, and it showed consistent improvement in segmentation results across organs. To minimize annotation effort for tissue and blur segmentation, annotated images were automatically prepared by mosaicking patches (sub-images) from various WSIs whose labels were identified using the patch classification tool HistoROI. Due to the generality of our trained QC pipeline and its extensive testing, the potential impact of this work is broad. It can be used for automated pre-processing of any WSI cohort to enhance the accuracy and reliability of large-scale histopathology image analysis for both research and clinical use. We have made the trained models, training scripts, training data, and inference results publicly available at https://github.com/abhijeetptl5/wsisegqc, which should enable the research community to use the pipeline right out of the box or further customize it to new datasets and applications in the future.
Submitted 4 October, 2024;
originally announced October 2024.
-
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
Authors:
Niki Nezakati,
Md Kaykobad Reza,
Ameya Patil,
Mashhour Solh,
M. Salman Asif
Abstract:
Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing methods that can handle missing modalities involve custom training or adaptation steps for each input modality combination. These approaches are either tied to specific modalities or become computationally expensive as the number of input modalities increases. In this paper, we propose Masked Modality Projection (MMP), a method designed to train a single model that is robust to any missing modality scenario. We achieve this by randomly masking a subset of modalities during training and learning to project available input modalities to estimate the tokens for the masked modalities. This approach enables the model to effectively learn to leverage the information from the available modalities to compensate for the missing ones, enhancing missing modality robustness. We conduct a series of experiments with various baseline models and datasets to assess the effectiveness of this strategy. Experiments demonstrate that our approach improves robustness to different missing modality scenarios, outperforming existing methods designed for missing modalities or specific modality combinations.
Submitted 7 October, 2024; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning
Authors:
Zixuan Wu,
Zulfiqar Zaidi,
Adithya Patil,
Qingyu Xiao,
Matthew Gombolay
Abstract:
In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transferring the constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework to the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieves a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
Submitted 29 September, 2024;
originally announced September 2024.
-
Efficient Quality Control of Whole Slide Pathology Images with Human-in-the-loop Training
Authors:
Abhijeet Patil,
Harsh Diwakar,
Jay Sawant,
Nikhil Cherian Kurian,
Subhash Yadav,
Swapnil Rane,
Tripti Bameta,
Amit Sethi
Abstract:
Histopathology whole slide images (WSIs) are being widely used to develop deep learning-based diagnostic solutions, especially for precision oncology. Most of these diagnostic software tools are vulnerable to biases and impurities in the training and test data, which can lead to inaccurate diagnoses. For instance, WSIs contain multiple types of tissue regions, at least some of which might not be relevant to the diagnosis. We introduce HistoROI, a robust yet lightweight deep learning-based classifier to segregate a WSI into six broad tissue regions: epithelium, stroma, lymphocytes, adipose, artifacts, and miscellaneous. HistoROI is trained using a novel human-in-the-loop and active learning paradigm that ensures variations in training data for labeling-efficient generalization. HistoROI consistently performs well across multiple organs, despite being trained on only a single dataset, demonstrating strong generalization. Further, we have examined the utility of HistoROI in improving the performance of downstream deep learning-based tasks using the CAMELYON breast cancer lymph node and TCGA lung cancer datasets. For the former dataset, the area under the receiver operating characteristic curve (AUC) for metastasis versus normal tissue of a neural network trained using weakly supervised learning increased from 0.88 to 0.92 by filtering the data using HistoROI. Similarly, the AUC increased from 0.88 to 0.93 for the classification between adenocarcinoma and squamous cell carcinoma on the lung cancer dataset. We also found that HistoROI improves upon HistoQC for artifact detection on a test dataset of 93 annotated WSIs. The limitations of the proposed model are analyzed, and potential extensions are also discussed.
Submitted 29 September, 2024;
originally announced September 2024.
-
Rapid Assessment of Stable Crystal Structures in Single Phase High Entropy Alloys Via Graph Neural Network Based Surrogate Modelling
Authors:
Nicholas Beaver,
Aniruddha Dive,
Marina Wong,
Keita Shimanuki,
Ananya Patil,
Anthony Ferrell,
Mohsen B. Kivy
Abstract:
In an effort to develop a rapid, reliable, and cost-effective method for predicting the structure of single-phase high entropy alloys, a Graph Neural Network (ALIGNN-FF) based approach was introduced. This method was successfully tested on 132 different high entropy alloys, and the results were analyzed and compared with density functional theory and valence electron concentration calculations. Additionally, the effects of various factors, including lattice parameters and the number of supercells with unique atomic configurations, on the prediction accuracy were investigated. The ALIGNN-FF based approach was subsequently used to predict the structure of a novel cobalt-free 3d high entropy alloy, and the result was experimentally verified.
Submitted 11 September, 2024;
originally announced September 2024.
-
Highly complex novel critical behavior from the intrinsic randomness of quantum mechanical measurements on critical ground states -- a controlled renormalization group analysis
Authors:
Rushikesh A. Patil,
Andreas W. W. Ludwig
Abstract:
We consider the effects of weak measurements on the quantum critical ground state of the one-dimensional (a) tricritical and (b) critical quantum Ising model, by measuring in (a) the local energy and in (b) the local spin operator in a lattice formulation. By employing a controlled renormalization group (RG) analysis we find that each problem exhibits highly complex novel scaling behavior, arising from the intrinsically indeterministic ('random') nature of quantum mechanical measurements, which is governed by a measurement-dominated RG fixed point that we study within an $ε$ expansion. In the tricritical Ising case (a) we find (i): multifractal scaling behavior of energy and spin correlations in the measured ground state, corresponding to an infinite hierarchy of independent critical exponents and, equivalently, to a continuum of universal scaling exponents for each of these correlations; (ii): the presence of logarithmic factors multiplying power laws in correlation functions, a hallmark of 'logarithmic conformal field theories' (CFT); (iii): universal 'effective central charges' $c^{({\rm eff})}_n$ for the prefactors of the logarithm of subsystem size of the $n$th Rényi entropies, which are independent of each other for different $n$, in contrast to the unmeasured critical ground state, and (iv): a universal ("Affleck-Ludwig") 'effective boundary entropy' $S_{\rm{eff}}$ which we show, quite generally, to be related to the system-size independent part of the Shannon entropy of the measurement record, computed explicitly here to 1-loop order. A subset of these results has so far also been obtained within the $ε$ expansion for the measurement-dominated critical point in the critical Ising case (b).
Submitted 3 September, 2024;
originally announced September 2024.
-
HER2 and FISH Status Prediction in Breast Biopsy H&E-Stained Images Using Deep Learning
Authors:
Ardhendu Sekhar,
Vrinda Goel,
Garima Jain,
Abhijeet Patil,
Ravi Kant Gupta,
Tripti Bameta,
Swapnil Rane,
Amit Sethi
Abstract:
The current standard for detecting human epidermal growth factor receptor 2 (HER2) status in breast cancer patients relies on HER2 amplification, identified through fluorescence in situ hybridization (FISH) or immunohistochemistry (IHC). However, hematoxylin and eosin (H&E) tumor stains are more widely available, and accurately predicting HER2 status using H&E could reduce costs and expedite treatment selection. Deep learning algorithms for H&E have shown effectiveness in predicting various cancer features and clinical outcomes, including moderate success in HER2 status prediction. In this work, we employed a customized weak supervision classification technique combined with MoCo-v2 contrastive learning to predict HER2 status. We trained our pipeline on 182 publicly available H&E Whole Slide Images (WSIs) from The Cancer Genome Atlas (TCGA), for which annotations by the pathology team at Yale School of Medicine are publicly available. Our pipeline achieved an Area Under the Curve (AUC) of 0.85 across four different test folds. Additionally, we tested our model on 44 H&E slides from the TCGA-BRCA dataset, which had an HER2 score of 2+ and included corresponding HER2 status and FISH test results. These cases are considered equivocal for IHC, requiring an expensive FISH test on their IHC slides for disambiguation. Our pipeline demonstrated an AUC of 0.81 on these challenging H&E slides. Reducing the need for FISH tests can have significant implications in cancer treatment equity for underserved populations.
Submitted 26 September, 2024; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Stabilizer Entanglement Distillation and Efficient Fault-Tolerant Encoder
Authors:
Yu Shi,
Ashlesha Patil,
Saikat Guha
Abstract:
Entanglement is essential for quantum information processing but is limited by noise. We address this by developing high-yield entanglement distillation protocols with several advancements. (1) We extend the 2-to-1 recurrence entanglement distillation protocol to higher-rate n-to-(n-1) protocols that can correct any single-qubit errors. These protocols are evaluated through numerical simulations focusing on fidelity and yield. We also outline a method to adapt any classical error-correcting code for entanglement distillation, where the code can correct both bit-flip and phase-flip errors by incorporating Hadamard gates. (2) We propose a constant-depth decoder for stabilizer codes that transforms logical states into physical ones using single-qubit measurements. This decoder is applied to entanglement distillation protocols, reducing circuit depth and enabling protocols derived from advanced quantum error-correcting codes. We demonstrate this by evaluating the circuit complexity for entanglement distillation protocols based on surface codes and quantum convolutional codes. (3) Our stabilizer entanglement distillation techniques advance quantum computing. We propose a fault-tolerant protocol for constant-depth encoding and decoding of arbitrary quantum states, applicable to quantum low-density parity-check (qLDPC) codes and surface codes. This protocol is feasible with state-of-the-art reconfigurable atom arrays and surpasses the limits of conventional logarithmic depth encoders. Overall, our study integrates stabilizer formalism, measurement-based quantum computing, and entanglement distillation, advancing both quantum communication and computing.
Submitted 12 August, 2024;
originally announced August 2024.
-
A Teacher Is Worth A Million Instructions
Authors:
Nikhil Kothari,
Ravindra Nayak,
Shreyas Shetty,
Amey Patil,
Nikesh Garera
Abstract:
Large Language Models (LLMs) have shown exceptional abilities, yet training these models can be quite challenging. There is a strong dependence on the quality of data and on finding the best instruction tuning set. Further, the inherent limitations in training methods create substantial difficulties in training relatively smaller models with 7B and 13B parameters. In our research, we suggest an improved training method for these models by utilising knowledge from larger models, such as a mixture of experts (8x7B) architectures. The scale of these larger models allows them to capture a wide range of variations from data alone, making them effective teachers for smaller models. Moreover, we implement a novel post-training domain alignment phase that employs domain-specific expert models to boost domain-specific knowledge during training while preserving the model's ability to generalise. Fine-tuning Mistral 7B and 2x7B with our method surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters, achieving up to $7.9$ in MT-Bench and $93.04\%$ on AlpacaEval.
Submitted 27 June, 2024;
originally announced June 2024.
-
Advancements in Orthopaedic Arm Segmentation: A Comprehensive Review
Authors:
Abhishek Swami,
Snehal Farande,
Atharv Patil,
Atharva Parle,
Vivekanand Mane,
Prathamesh Thorat
Abstract:
Recent advances in medical imaging have transformed diagnosis in the healthcare sector, especially the interpretation of X-ray images. The advent of digital image processing technology and the implementation of deep learning models such as Convolutional Neural Networks (CNNs) have made the analysis of X-rays much more accurate and efficient. In this article, essential techniques such as edge detection, region growing, and thresholding, and deep learning models such as variants of YOLOv8 (which is the best object detection and segmentation framework), are reviewed. We further find that traditional image processing techniques for segmentation are simple and provide an alternative to the advanced methods. Our review gives useful knowledge on the practical usage of the innovative and traditional approaches to manual X-ray interpretation. These findings will help professionals and researchers gain deeper knowledge of digital interpretation techniques in medical imaging.
Submitted 19 June, 2024;
originally announced June 2024.
-
Distilling Opinions at Scale: Incremental Opinion Summarization using XL-OPSUMM
Authors:
Sri Raghava Muddu,
Rupasai Rangaraju,
Tejpalsingh Siledar,
Swaroop Nath,
Pushpak Bhattacharyya,
Swaprava Nath,
Suman Banerjee,
Amey Patil,
Muthusamy Chelliah,
Sudhanshu Shekhar Singh,
Nikesh Garera
Abstract:
Opinion summarization in e-commerce encapsulates the collective views of numerous users about a product based on their reviews. Typically, a product on an e-commerce platform has thousands of reviews, each review comprising around 10-15 words. While Large Language Models (LLMs) have shown proficiency in summarization tasks, they struggle to handle such a large volume of reviews due to context limitations. To mitigate this, we propose a scalable framework called Xl-OpSumm that generates summaries incrementally. However, the existing test set, AMASUM, has only 560 reviews per product on average. Due to the lack of a test set with thousands of reviews, we created a new test set called Xl-Flipkart by gathering data from the Flipkart website and generating summaries using GPT-4. Through various automatic evaluations and extensive analysis, we evaluated the framework's efficiency on two datasets, AMASUM and Xl-Flipkart. Experimental results show that our framework, Xl-OpSumm powered by Llama-3-8B-8k, achieves an average ROUGE-1 F1 gain of 4.38% and a ROUGE-L F1 gain of 3.70% over the next best-performing model.
Submitted 16 June, 2024;
originally announced June 2024.
-
Improving Harmonic Analysis using Multitapering: Precise frequency estimation of stellar oscillations using the harmonic F-test
Authors:
Aarya A. Patil,
Gwendolyn M. Eadie,
Joshua S. Speagle,
David J. Thomson
Abstract:
In Patil et al. 2024a, we developed a multitaper power spectrum estimation method, mtNUFFT, for analyzing time-series with quasi-regular spacing, and showed that it not only improves upon the statistical issues of the Lomb-Scargle periodogram, but also provides a factor-of-three speed-up in some applications. In this paper, we combine mtNUFFT with the harmonic F-test to test the hypothesis that a strictly periodic signal or its harmonic (as opposed to, e.g., a quasi-periodic signal) is present at a given frequency. This mtNUFFT/F-test combination shows that multitapering allows detection of periodic signals and precise estimation of their frequencies, thereby improving both power spectrum estimation and harmonic analysis. Using asteroseismic time-series data for the Kepler-91 red giant, we show that the F-test automatically picks up the harmonics of its transiting exoplanet as well as certain dipole ($l=1$) mixed modes. We use this example to highlight that we can distinguish between different types of stellar oscillations, e.g., transient (damped, stochastically-excited) and strictly periodic (undamped, heat-driven). We also illustrate the technique of dividing a time-series into chunks to further examine the transient versus periodic nature of stellar oscillations. The harmonic F-test combined with mtNUFFT is implemented in the public Python package tapify (https://github.com/aaryapatil/tapify), which opens opportunities to perform detailed investigations of periodic signals in time-domain astronomy.
Submitted 28 May, 2024;
originally announced May 2024.
-
Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence
Authors:
Abhinav Patil,
Jaap Jumelet,
Yu Ying Chiu,
Andy Lapastora,
Peter Shen,
Lexie Wang,
Clevis Willrich,
Shane Steinert-Threlkeld
Abstract:
This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpora that target a wide range of linguistic phenomena. Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures, suggesting that they are capable of generalizing from indirect evidence.
Submitted 6 August, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
An Improved Design for All-Photonic Quantum Repeaters
Authors:
Ashlesha Patil,
Saikat Guha
Abstract:
All-photonic quantum repeaters use multi-qubit photonic graph states, called repeater graph states (RGS), instead of matter-based quantum memories, for protection against predominantly loss errors. The RGS comprises tree-graph-encoded logical qubits for error correction at the repeaters and physical {\em link} qubits to create entanglement between neighboring repeaters. The two methods to generate the RGS are probabilistic stitching -- using linear optical Bell state measurements (fusion) -- of small entangled states prepared via multiplexed-probabilistic linear optical circuits fed with single photons, and a direct deterministic preparation using a small number of quantum-logic-capable solid-state emitters. The resource overhead due to fusions and the circuit depth of the quantum emitter system both increase with the size of the RGS. Therefore engineering a resource-efficient RGS is crucial. We propose a new RGS design, which achieves a higher entanglement rate for all-photonic quantum repeaters using fewer qubits than the previously known RGS would. We accomplish this by boosting the probability of entangling neighboring repeaters with tree-encoded link qubits. We also propose a new adaptive scheme to perform logical BSM on the link qubits for loss-only errors. The adaptive BSM outperforms the previous schemes for logical BSM on tree codes when the qubit loss probability is uniform. It reduces the number of optical modes required to perform logical BSM on link qubits to improve the entanglement rate further.
Submitted 19 May, 2024;
originally announced May 2024.
-
Entanglement Routing using Quantum Error Correction for Distillation
Authors:
Ashlesha Patil,
Michele Pacenti,
Bane Vasić,
Saikat Guha,
Narayanan Rengaswamy
Abstract:
Bell-state measurement (BSM) on entangled states shared between quantum repeaters is the fundamental operation used to route entanglement in quantum networks. Performing BSMs on Werner states shared between repeaters leads to exponential decay in the fidelity of the end-to-end Werner state with the number of repeaters, necessitating entanglement distillation. Generally, entanglement routing protocols use \emph{probabilistic} distillation techniques based on local operations and classical communication. In this work, we use quantum error correcting codes (QECCs) for \emph{deterministic} entanglement distillation to route Werner states on a chain of repeaters. To maximize the end-to-end distillable entanglement, which depends on the number and fidelity of end-to-end Bell pairs, we utilize global link-state knowledge to determine the optimal policy for scheduling distillation and BSMs at the repeaters. We analyze the effect of the QECC's properties on the entanglement rate and the number of quantum memories. We observe that low-rate codes produce high-fidelity end-to-end states owing to their excellent error-correcting capability, whereas high-rate codes yield a larger number of end-to-end states but of lower fidelity. The number of quantum memories used at repeaters increases with the code rate as well as the classical computation time of the QECC's decoder.
Submitted 1 May, 2024;
originally announced May 2024.
-
Product Description and QA Assisted Self-Supervised Opinion Summarization
Authors:
Tejpalsingh Siledar,
Rupasai Rangaraju,
Sankara Sri Raghava Ravindra Muddu,
Suman Banerjee,
Amey Patil,
Sudhanshu Shekhar Singh,
Muthusamy Chelliah,
Nikesh Garera,
Swaprava Nath,
Pushpak Bhattacharyya
Abstract:
In e-commerce, opinion summarization is the process of summarizing the consensus opinions found in product reviews. However, the potential of additional sources such as product description and question-answers (QA) has been considered less often. Moreover, the absence of any supervised training data makes this task challenging. To address this, we propose a novel synthetic dataset creation (SDC) strategy that leverages information from reviews as well as additional sources for selecting one of the reviews as a pseudo-summary to enable supervised training. Our Multi-Encoder Decoder framework for Opinion Summarization (MEDOS) employs a separate encoder for each source, enabling effective selection of information while generating the summary. For evaluation, due to the unavailability of test sets with additional sources, we extend the Amazon, Oposum+, and Flipkart test sets and leverage ChatGPT to annotate summaries. Experiments across nine test sets demonstrate that the combination of our SDC approach and MEDOS model achieves on average a 14.5% improvement in ROUGE-1 F1 over the SOTA. Moreover, comparative analysis underlines the significance of incorporating additional sources for generating more informative summaries. Human evaluations further indicate that MEDOS scores relatively higher in coherence and fluency with 0.41 and 0.5 (-1 to 1) respectively, compared to existing models. To the best of our knowledge, we are the first to generate opinion summaries leveraging additional sources in a self-supervised setting.
Submitted 8 April, 2024;
originally announced April 2024.
-
UDON: A case for offloading to general purpose compute on CXL memory
Authors:
Jon Hermes,
Josh Minor,
Minjun Wu,
Adarsh Patil,
Eric Van Hensbergen
Abstract:
Upcoming CXL-based disaggregated memory devices feature special purpose units to offload compute to near-memory. In this paper, we explore opportunities for offloading compute to general purpose cores on CXL memory devices, thereby enabling a greater utility and diversity of offload.
We study two classes of popular memory intensive applications: ML inference and vector database as candidates for computational offload. The study uses Arm AArch64-based dual-socket NUMA systems to emulate CXL type-2 devices.
Our study shows promising results. With our ML inference model partitioning strategy for compute offload, we can place up to 90% of the data in remote memory with just a 20% performance trade-off. Offloading Hierarchical Navigable Small World (HNSW) kernels in vector databases can provide up to 6.87$\times$ performance improvement with under 10% offload overhead.
Submitted 3 April, 2024;
originally announced April 2024.
-
AI coach for badminton
Authors:
Dhruv Toshniwal,
Arpit Patil,
Nancy Vachhani
Abstract:
In the competitive realm of sports, optimal performance necessitates rigorous management of nutrition and physical conditioning. Specifically, in badminton, the agility and precision required make it an ideal candidate for motion analysis through video analytics. This study leverages advanced neural network methodologies to dissect video footage of badminton matches, aiming to extract detailed insights into player kinetics and biomechanics. Through the analysis of stroke mechanics, including hand-hip coordination, leg positioning, and the execution angles of strokes, the research aims to derive predictive models that can suggest improvements in stance, technique, and muscle orientation. These recommendations are designed to mitigate erroneous techniques, reduce the risk of joint fatigue, and enhance overall performance. Utilizing a vast array of data available online, this research correlates players' physical attributes with their in-game movements to identify muscle activation patterns during play. The goal is to offer personalized training and nutrition strategies that align with the specific biomechanical demands of badminton, thereby facilitating targeted performance enhancements.
Submitted 13 March, 2024;
originally announced March 2024.
-
Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization
Authors:
Swaroop Nath,
Tejpalsingh Siledar,
Sankara Sri Raghava Ravindra Muddu,
Rupasai Rangaraju,
Harshad Khadilkar,
Pushpak Bhattacharyya,
Suman Banerjee,
Amey Patil,
Sudhanshu Shekhar Singh,
Muthusamy Chelliah,
Nikesh Garera
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has become a dominating strategy in aligning Language Models (LMs) with human values/goals. The key to the strategy is learning a reward model ($\varphi$), which can reflect the latent reward model of humans. While this strategy has proven effective, the training methodology requires a lot of human preference annotation (usually in the order of tens of thousands) to train $\varphi$. Such a large-scale annotation is justifiable when it's a one-time effort, and the reward model is universally applicable. However, human goals are subjective and depend on the task, requiring task-specific preference annotations, which can be impractical to fulfill. To address this challenge, we propose a novel approach to infuse domain knowledge into $\varphi$, which reduces the amount of preference annotation required ($21\times$), omits Alignment Tax, and provides some interpretability. We validate our approach in E-Commerce Opinion Summarization, with a significant reduction in dataset size (to just $940$ samples) while advancing the SOTA ($\sim4$ point ROUGE-L improvement, $68\%$ of times preferred by humans over SOTA). Our contributions include a novel Reward Modeling technique and two new datasets: PromptOpinSumm (supervised data for Opinion Summarization) and OpinPref (a gold-standard human preference dataset). The proposed methodology opens up avenues for efficient RLHF, making it more adaptable to applications with varying human values. We release the artifacts (Code: github.com/efficient-rlhf. PromptOpinSumm: hf.co/prompt-opin-summ. OpinPref: hf.co/opin-pref) for usage under MIT License.
Submitted 18 April, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
How to Sustain a Scientific Open-Source Software Ecosystem: Learning from the Astropy Project
Authors:
Jiayi Sun,
Aarya Patil,
Youhai Li,
Jin L. C. Guo,
Shurui Zhou
Abstract:
Scientific open-source software (OSS) has greatly benefited research communities through its transparent and collaborative nature. Given its critical role in scientific research, ensuring the sustainability of such software has become vital. Earlier studies have proposed sustainability strategies for conventional scientific software and open-source communities. However, it remains unclear whether these solutions can be easily adapted to the integrated framework of scientific OSS and its larger ecosystem. This study examines the challenges and opportunities to enhance the sustainability of scientific OSS in the context of interdisciplinary collaboration, open-source community, and multi-project ecosystem. We conducted a case study on a widely-used software ecosystem in the astrophysics domain, the Astropy Project, using a mixed-methods design approach. This approach includes an interview with core contributors regarding their participation in an interdisciplinary team, a survey of disengaged contributors about their motivations for contribution, reasons for disengagement, and suggestions for sustaining the communities, and finally, an analysis of cross-referenced issues and pull requests to understand best practices for collaboration on the ecosystem level. Our study reveals the implications of major challenges for sustaining scientific OSS and proposes concrete suggestions for tackling these challenges.
Submitted 22 February, 2024;
originally announced February 2024.
-
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation
Authors:
Tejpalsingh Siledar,
Swaroop Nath,
Sankara Sri Raghava Ravindra Muddu,
Rupasai Rangaraju,
Swaprava Nath,
Pushpak Bhattacharyya,
Suman Banerjee,
Amey Patil,
Sudhanshu Shekhar Singh,
Muthusamy Chelliah,
Nikesh Garera
Abstract:
Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation; however, they remain unexplored for opinion summary evaluation. Moreover, limited opinion summary evaluation datasets inhibit progress. To address this, we release the SUMMEVAL-OP dataset covering 7 dimensions related to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We investigate Op-I-Prompt, a dimension-independent prompt, and Op-Prompts, a dimension-dependent set of prompts for opinion summary evaluation. Experiments indicate that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with humans and outperforming all previous approaches. To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.
Submitted 9 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
On the Idempotent Graph of Matrix Ring
Authors:
Avinash Patil,
P. S. Momale,
C. M. Jadhav
Abstract:
Let F be a finite field and R = M2(F) be the 2x2 matrix ring over F. In this paper, we explicitly determine all the idempotents in R. Using these idempotents, we study the idempotent graph of R, whose vertex set is the set of non-trivial idempotents in R and in which two idempotents e, f are adjacent if ef = 0 or fe = 0. It is proved that the idempotent graph of R is a connected regular graph with diameter 2. Its girth is also characterized. Further, we determine the Wiener and Harary indices of the idempotent graph of R.
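The graph described in this abstract can be built by brute force for small cases. The sketch below is only an illustration and makes one loud assumption: F is restricted to a prime field F_p with arithmetic mod p, whereas the paper treats any finite field. Matrices are stored as 4-tuples (a11, a12, a21, a22), and all function names are hypothetical.

```python
from collections import deque
from itertools import product


def matmul_mod(a, b, p):
    """Multiply two 2x2 matrices, given as 4-tuples, modulo the prime p."""
    (a11, a12, a21, a22), (b11, b12, b21, b22) = a, b
    return (
        (a11 * b11 + a12 * b21) % p,
        (a11 * b12 + a12 * b22) % p,
        (a21 * b11 + a22 * b21) % p,
        (a21 * b12 + a22 * b22) % p,
    )


def idempotent_graph(p):
    """Adjacency sets of the idempotent graph of M2(F_p), p prime (assumption)."""
    zero, one = (0, 0, 0, 0), (1, 0, 0, 1)
    # Vertices: non-trivial idempotents, i.e. e*e == e with e not 0 or I.
    vertices = [
        m for m in product(range(p), repeat=4)
        if matmul_mod(m, m, p) == m and m not in (zero, one)
    ]
    adjacency = {v: set() for v in vertices}
    for e in vertices:
        for f in vertices:
            # Edge rule from the abstract: ef = 0 or fe = 0.
            if e != f and (matmul_mod(e, f, p) == zero
                           or matmul_mod(f, e, p) == zero):
                adjacency[e].add(f)
    return adjacency


def diameter(adjacency):
    """Graph diameter via BFS from every vertex; None if disconnected."""
    worst = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adjacency):
            return None
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    for p in (2, 3, 5):
        g = idempotent_graph(p)
        degrees = {len(neighbours) for neighbours in g.values()}
        print(p, len(g), degrees, diameter(g))
```

For these small primes the script reports a single vertex degree and diameter 2, consistent with the regularity and diameter stated in the abstract.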
Submitted 17 February, 2024;
originally announced February 2024.
-
Resource-efficient and loss-aware photonic graph state preparation using an array of quantum emitters, and application to all-photonic quantum repeaters
Authors:
Eneet Kaur,
Ashlesha Patil,
Saikat Guha
Abstract:
Multi-qubit photonic graph states are necessary for quantum communication and computation. Preparing photonic graph states by probabilistic stitching of single photons with linear optics results in a formidable resource requirement due to the need for multiplexing. Quantum emitters present a viable solution to prepare photonic graph states, as they enable controlled production of photons entangled with the emitter qubit, and deterministic two-qubit interactions among emitters. A handful of emitters often suffice to generate useful photonic graph states that would otherwise require millions of single photon sources using the linear-optics method. However, photon loss poses an impediment to this method due to the large depth, i.e., age of the oldest photon, of the graph state, given the typically large number of slow and noisy two-qubit CNOT gates required on emitters. We propose an algorithm that can trade the number of emitters against the graph-state depth, while minimizing the number of emitter CNOTs. We apply our algorithm to generating a repeater graph state (RGS) for all-photonic repeaters. We find that our scheme achieves a far superior rate-vs.-distance performance than using the least number of emitters needed to generate the RGS. Moreover, our scheme achieves the same performance as the linear-optics method of generating the RGS, where each emitter is used as a single-photon source, but with orders of magnitude fewer emitters.
Submitted 1 February, 2024;
originally announced February 2024.
-
Multi-Agent Phase-Balancing around Polar Curves with Bounded Trajectories: An Experimental Study using Crazyflies and MoCap System
Authors:
Gaurav Singh Bhati,
KKN Shyam Sathvik,
Anuj Patil,
Anoop Jain
Abstract:
In this experimental work, we implement the control design from our earlier work on a swarm of Crazyflie 2.1 quad-copters by deriving the original control in terms of variables that are available to the user in this practical system. A suitable model is developed using the Crazyswarm2 package within ROS2 to facilitate the execution of the control law. We also discuss various components that are part of this experiment and the challenges we encountered during the experimentation. Extensive experimental results, along with the links to the YouTube videos for actual Crazyflie quad-copters, are provided.
Submitted 30 January, 2024;
originally announced January 2024.
-
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
Authors:
Aditya Patil,
Vikas Joshi,
Purvi Agrawal,
Rupesh Mehta
Abstract:
Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model without knowing the input language, and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, it produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline with 13.3%, 8.23% and 1.3% relative word error rate reduction on Hindi, English and code-mixed test sets, respectively.
Submitted 21 January, 2024;
originally announced January 2024.
-
L3Cube-MahaSocialNER: A Social Media based Marathi NER Dataset and BERT models
Authors:
Harsh Chaudhari,
Anuja Patil,
Dhanashree Lavekar,
Pranav Khairnar,
Raviraj Joshi
Abstract:
This work introduces the L3Cube-MahaSocialNER dataset, the first and largest social media dataset specifically designed for Named Entity Recognition (NER) in the Marathi language. The dataset comprises 18,000 manually labeled sentences covering eight entity classes, addressing challenges posed by social media data, including non-standard language and informal idioms. Deep learning models, including CNN, LSTM, BiLSTM, and Transformer models, are evaluated on the individual dataset with IOB and non-IOB notations. The results demonstrate the effectiveness of these models in accurately recognizing named entities in Marathi informal text. The L3Cube-MahaSocialNER dataset offers user-centric information extraction and supports real-time applications, providing a valuable resource for public opinion analysis, news, and marketing on social media platforms. We also show that the zero-shot results of the regular NER model are poor on the social NER test set thus highlighting the need for more social NER datasets. The datasets and models are publicly available at https://github.com/l3cube-pune/MarathiNLP
Submitted 30 December, 2023;
originally announced January 2024.
-
A Comprehensive Survey of Evaluation Techniques for Recommendation Systems
Authors:
Aryan Jadon,
Avinash Patil
Abstract:
The effectiveness of recommendation systems is pivotal to user engagement and satisfaction in online platforms. As these recommendation systems increasingly influence user choices, their evaluation transcends mere technical performance and becomes central to business success. This paper addresses the multifaceted nature of recommendation system evaluation by introducing a comprehensive suite of metrics, each tailored to capture a distinct aspect of system performance. We discuss:
* Similarity Metrics: to quantify the precision of content-based filtering mechanisms and assess the accuracy of collaborative filtering techniques.
* Candidate Generation Metrics: to evaluate how effectively the system identifies a broad yet relevant range of items.
* Predictive Metrics: to assess the accuracy of forecasted user preferences.
* Ranking Metrics: to evaluate the effectiveness of the order in which recommendations are presented.
* Business Metrics: to align the performance of the recommendation system with economic objectives.
Our approach emphasizes the contextual application of these metrics and their interdependencies. In this paper, we identify the strengths and limitations of current evaluation practices and highlight the nuanced trade-offs that emerge when optimizing recommendation systems across different metrics. The paper concludes by proposing a framework for selecting and interpreting these metrics, not only to improve system performance but also to advance business goals. This work aims to aid researchers and practitioners in critically assessing recommendation systems and to foster the development of more nuanced, effective, and economically viable personalization strategies. Our code is available on GitHub: https://github.com/aryan-jadon/Evaluation-Metrics-for-Recommendation-Systems.
Submitted 12 January, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Clifford Manipulations of Stabilizer States: A graphical rule book for Clifford unitaries and measurements on cluster states, and application to photonic quantum computing
Authors:
Ashlesha Patil,
Saikat Guha
Abstract:
Stabilizer states along with Clifford manipulations (unitary transformations and measurements) thereof -- despite being efficiently simulable on a classical computer -- are an important tool in quantum information processing, with applications to quantum computing, error correction and networking. Cluster states, defined on a graph, are a special class of stabilizer states that are central to measurement based quantum computing, all-photonic quantum repeaters, distributed quantum computing, and entanglement distribution in a network. All cluster states are local-Clifford equivalent to a stabilizer state. In this paper, we review the stabilizer framework, and extend it, by: incorporating general stabilizer measurements such as multi-qubit fusions, and providing an explicit procedure -- using Karnaugh maps from Boolean algebra -- for converting arbitrary stabilizer gates into tableau operations of the CHP formalism for efficient stabilizer manipulations. Using these tools, we develop a graphical rule-book and a MATLAB simulator with a graphical user interface for arbitrary stabilizer manipulations of cluster states, a user of which, e.g., for research in quantum networks, will not require any background in quantum information or the stabilizer framework. We extend our graphical rule-book to include dual-rail photonic-qubit cluster state manipulations with probabilistically-heralded linear-optical circuits for various rotated Bell measurements, i.e., fusions (including new `Type-I' fusions we propose, where only one of the two fused qubits is destructively measured), by incorporating graphical rules for their success and failure modes. Finally, we show how stabilizer descriptions of multi-qubit fusions can be mapped to linear optical circuits.
Submitted 4 December, 2023;
originally announced December 2023.
-
On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi
Authors:
Harsh Chaudhari,
Anuja Patil,
Dhanashree Lavekar,
Pranav Khairnar,
Raviraj Joshi,
Sachin Pande
Abstract:
Named Entity Recognition (NER) systems play a vital role in NLP applications such as machine translation, summarization, and question-answering. These systems identify named entities, which encompass real-world concepts like locations, persons, and organizations. Despite extensive research on NER systems for the English language, they have not received adequate attention in the context of low-resource languages. In this work, we focus on NER for low-resource languages and present our case study in the context of the Indian language Marathi. The advancement of NLP research revolves around the utilization of pre-trained transformer models such as BERT for the development of NER models. However, we focus on improving the performance of shallow models based on CNN and LSTM by combining the best of both worlds. In the era of transformers, these traditional deep learning models are still relevant because of their high computational efficiency. We propose a hybrid approach for efficient NER by integrating a BERT-based subword tokenizer into vanilla CNN/LSTM models. We show that this simple approach of replacing a traditional word-based tokenizer with a BERT-tokenizer brings the accuracy of vanilla single-layer models closer to that of deep pre-trained models like BERT. We show the importance of using sub-word tokenization for NER and present our study toward building efficient NLP systems. The evaluation is performed on the L3Cube-MahaNER dataset using tokenizers from MahaBERT, MahaGPT, IndicBERT, and mBERT.
Submitted 3 December, 2023;
originally announced December 2023.
-
Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data
Authors:
Mubashara Akhtar,
Abhilash Shankarampeta,
Vivek Gupta,
Arpit Patil,
Oana Cocarascu,
Elena Simperl
Abstract:
Numbers are crucial for various real-world domains such as finance, economics, and science. Thus, understanding and reasoning with numbers are essential skills for language models to solve different tasks. While different numerical benchmarks have been introduced in recent years, they are limited to specific numerical aspects mostly. In this paper, we propose a hierarchical taxonomy for numerical reasoning skills with more than ten reasoning types across four levels: representation, number sense, manipulation, and complex reasoning. We conduct a comprehensive evaluation of state-of-the-art models to identify reasoning challenges specific to them. Henceforth, we develop a diverse set of numerical probes employing a semi-automated approach. We focus on the tabular Natural Language Inference (TNLI) task as a case study and measure models' performance shifts. Our results show that no model consistently excels across all numerical reasoning types. Among the probed models, FlanT5 (few-/zero-shot) and GPT-3.5 (few-shot) demonstrate strong overall numerical reasoning skills compared to other models. Label-flipping probes indicate that models often exploit dataset artifacts to predict the correct labels.
Submitted 3 November, 2023;
originally announced November 2023.
-
Explosive Chrysopoeia
Authors:
Jan Maurycy Uszko,
Stephen J. Eichhorn,
Avinash J. Patil,
Simon R. Hall
Abstract:
Fulminating gold, the first high-explosive compound to be discovered, disintegrates in a mysterious cloud of purple smoke, the nature of which has been speculated upon since its discovery in 1585. In this work, we show that the colour of the smoke is due to the presence of gold nanoparticles.
Submitted 19 October, 2023;
originally announced October 2023.
-
A Hybrid Approach for Depression Classification: Random Forest-ANN Ensemble on Motor Activity Signals
Authors:
Anket Patil,
Dhairya Shah,
Abhishek Shah,
Mokshit Gala
Abstract:
Regarding the rising number of people suffering from mental health illnesses in today's society, the importance of mental health cannot be overstated. Wearable sensors, which are increasingly widely available, provide a potential way to track and comprehend mental health issues. These gadgets not only monitor everyday activities but also continuously record vital signs like heart rate, perhaps providing information on a person's mental state. Recent research has used these sensors in conjunction with machine learning methods to identify patterns relating to different mental health conditions, highlighting the immense potential of this data beyond simple activity monitoring. In this research, we present a novel algorithm called the Hybrid Random forest - Neural network that has been tailored to evaluate sensor data from depressed patients. Our method has a noteworthy accuracy of 80\% when evaluated on a special dataset that included both unipolar and bipolar depressive patients as well as healthy controls. The findings highlight the algorithm's potential for reliably determining a person's depression condition using sensor data, making a substantial contribution to the area of mental health diagnostics.
Submitted 13 October, 2023;
originally announced October 2023.
-
A comparative data study on dinosaur, bird and human bone attributes -- A supporting study for convergent evolution
Authors:
Akshita Patil,
Nishchal Dwivedi
Abstract:
For over 165 million years, dinosaurs reigned on this planet. Throughout their existence, their body size and mass varied. Understanding the relationship between attributes such as femur length and breadth, humerus length and breadth, tibia length and breadth, and body mass of dinosaurs contributes to our understanding of the Jurassic era and further provides reasoning for the bone and body size evolution of modern-day descendants of the Dinosauria clade. This work presents statistical evidence derived from an encyclopedic data set consisting of a wide variety of measurements of discovered fossils of a particular dinosaur taxon. Our study establishes linear regression relationships between femur and humerus lengths and radii. Furthermore, we compare against terrestrial bird bone lengths to test the claim that birds are the closest living species to dinosaurs. An analysis of the bone ratios of early humans shows that terrestrial birds are closer to humans than to dinosaurs. On one hand, this challenges the closeness of birds to dinosaurs; on the other hand, it makes a case for convergent evolution between birds and humans, due to the closeness of their regression fits.
A correlation between the bone ratios of dinosaurs and early humans also advances understanding of the structural and physical distinctions between the two species. Overall, the work evaluates dinosaur skeletons and promotes further exploration and research in the paleontological field to strengthen the conclusions drawn thus far.
Submitted 21 September, 2023;
originally announced September 2023.
-
Entanglement Routing over Networks with Time Multiplexed Repeaters
Authors:
Emily A Van Milligen,
Eliana Jacobson,
Ashlesha Patil,
Gayane Vardoyan,
Don Towsley,
Saikat Guha
Abstract:
Quantum networks will be able to service consumers with long-distance entanglement by use of quantum repeaters that generate Bell pairs (or links) with their neighbors, iid with probability $p$, and perform Bell State Measurements (BSMs) on the links, which succeed iid with probability $q$. While global link state knowledge is required to maximize the rate of entanglement generation between any two consumers, it increases the protocol latency due to the classical communication requirements and requires long quantum memory coherence times. We propose two entanglement routing protocols that require only local link state knowledge to relax the quantum memory coherence time requirements and reduce the protocol latency. These protocols utilize multi-path routing and time multiplexed repeaters. The time multiplexed repeaters first generate links for $k$ time steps before performing BSMs on any pairs of links. Our two protocols differ in the decision rule used for performing BSMs at the repeater: the first is a static path based routing protocol and the second is a dynamic distance based routing protocol. The performance of these protocols depends on the quantum network topology and the consumers' location. We observe that the average entanglement rate and the latency increase with the time multiplexing block length, $k$, irrespective of the protocol. When a step function memory decoherence model is introduced such that qubits are held in the quantum memory for an exponentially distributed time with mean $μ$, an optimal value of $k$ ($k_{\rm opt}$) appears, such that increasing $k$ beyond $k_{\rm opt}$ hurts the entanglement rate. $k_{\rm opt}$ decreases with $p$ and increases with $μ$. $k_{\rm opt}$ appears due to the tradeoff between the benefits of time multiplexing and the increased likelihood of previously established Bell pairs decohering due to finite memory coherence times.
Submitted 28 March, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Simulator-Driven Deceptive Control via Path Integral Approach
Authors:
Apurva Patil,
Mustafa O. Karabag,
Takashi Tanaka,
Ufuk Topcu
Abstract:
We consider a setting where a supervisor delegates an agent to perform a certain control task, while the agent is incentivized to deviate from the given policy to achieve its own goal. In this work, we synthesize the optimal deceptive policies for an agent who attempts to hide its deviations from the supervisor's policy. We study the deception problem in the continuous-state discrete-time stochastic dynamics setting and, using motivations from hypothesis testing theory, formulate a Kullback-Leibler control problem for the synthesis of deceptive policies. This problem can be solved using backward dynamic programming in principle, which suffers from the curse of dimensionality. However, under the assumption of deterministic state dynamics, we show that the optimal deceptive actions can be generated using path integral control. This allows the agent to numerically compute the deceptive actions via Monte Carlo simulations. Since Monte Carlo simulations can be efficiently parallelized, our approach allows the agent to generate deceptive control actions online. We show that the proposed simulation-driven control approach asymptotically converges to the optimal control distribution.
Submitted 27 August, 2023;
originally announced August 2023.
-
Risk-Minimizing Two-Player Zero-Sum Stochastic Differential Game via Path Integral Control
Authors:
Apurva Patil,
Yujing Zhou,
David Fridovich-Keil,
Takashi Tanaka
Abstract:
This paper addresses a continuous-time risk-minimizing two-player zero-sum stochastic differential game (SDG), in which each player aims to minimize its probability of failure. Failure occurs in the event when the state of the game enters into predefined undesirable domains, and one player's failure is the other's success. We derive a sufficient condition for this game to have a saddle-point equilibrium and show that it can be solved via a Hamilton-Jacobi-Isaacs (HJI) partial differential equation (PDE) with Dirichlet boundary condition. Under certain assumptions on the system dynamics and cost function, we establish the existence and uniqueness of the saddle-point of the game. We provide explicit expressions for the saddle-point policies which can be numerically evaluated using path integral control. This allows us to solve the game online via Monte Carlo sampling of system trajectories. We implement our control synthesis framework on two classes of risk-minimizing zero-sum SDGs: a disturbance attenuation problem and a pursuit-evasion game. Simulation studies are presented to validate the proposed control synthesis framework.
Submitted 22 August, 2023;
originally announced August 2023.
-
A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports
Authors:
Avinash Patil,
Kihwan Han,
Aryan Jadon
Abstract:
Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data containing bug reports for various software projects to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models regarding recall, followed by ADA, Gensim, FastText, and TF-IDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task. Our code is available on GitHub.
Submitted 30 November, 2023; v1 submitted 17 August, 2023;
originally announced August 2023.
-
WhaleVis: Visualizing the History of Commercial Whaling
Authors:
Ameya Patil,
Zoe Rand,
Trevor Branch,
Leilani Battle
Abstract:
Whales are an important part of the oceanic ecosystem. Although historic commercial whale hunting, a.k.a. whaling, has severely threatened whale populations, whale researchers are looking at historical whaling data to inform current whale status and future conservation efforts. To facilitate this, we worked with experts in aquatic and fishery sciences to create WhaleVis -- an interactive dashboard for the commercial whaling dataset maintained by the International Whaling Commission (IWC). We characterize key analysis tasks among whale researchers for this database, the most important of which is inferring the spatial distribution of whale populations over time. In addition to facilitating analysis of whale catches based on their spatio-temporal attributes, we use whaling expedition details to plot the search routes of expeditions. We propose a model of the catch data as a graph, where nodes represent catch locations and edges represent whaling expedition routes. This model facilitates visual estimation of whale search effort and, in turn, of the spatial distribution of whale populations normalized by the search effort -- a well-known problem in fisheries research. It further opens up new avenues for graph analysis on the data, including more rigorous computation of the spatial distribution of whales normalized by the search effort, and enables new insight generation. We demonstrate the use of our dashboard through a real-life use case.
Submitted 8 August, 2023;
originally announced August 2023.
-
Decoding the age-chemical structure of the Milky Way disk: An application of Copulas and Elicitable Maps
Authors:
Aarya A. Patil,
Jo Bovy,
Sebastian Jaimungal,
Neige Frankel,
Henry W. Leung
Abstract:
In the Milky Way, the distribution of stars in the $[α/\mathrm{Fe}]$ vs. $[\mathrm{Fe/H}]$ and $[\mathrm{Fe/H}]$ vs. age planes holds essential information about the history of star formation, accretion, and dynamical evolution of the Galactic disk. We investigate these planes by applying novel statistical methods called copulas and elicitable maps to the ages and abundances of red giants in the APOGEE survey. We find that the low- and high-$α$ disk stars have a clean separation in copula space and use this to provide an automated separation of the $α$ sequences using a purely statistical approach. This separation reveals that the high-$α$ disk ends at the same $[α/\mathrm{Fe}]$ and age at high $[\mathrm{Fe/H}]$ as the low-$[\mathrm{Fe/H}]$ start of the low-$α$ disk, thus supporting a sequential formation scenario for the high- and low-$α$ disks. We then combine copulas with elicitable maps to precisely obtain the correlation between stellar age $τ$ and metallicity $[\mathrm{Fe/H}]$ conditional on Galactocentric radius $R$ and height $z$ in the range $0 < R < 20$ kpc and $|z| < 2$ kpc. The resulting trends in the age-metallicity correlation with radius, height, and $[α/\mathrm{Fe}]$ demonstrate a $\approx 0$ correlation wherever kinematically-cold orbits dominate, while the naively-expected negative correlation is present where kinematically-hot orbits dominate. This is consistent with the effects of spiral-driven radial migration, which must be strong enough to completely flatten the age-metallicity structure of the low-$α$ disk.
Submitted 15 June, 2023;
originally announced June 2023.
-
Leveraging Language Identification to Enhance Code-Mixed Text Classification
Authors:
Gauri Takawane,
Abhishek Phaltankar,
Varad Patwardhan,
Aryan Patil,
Raviraj Joshi,
Mukta S. Takalikar
Abstract:
The usage of more than one language in the same text is referred to as code-mixing. Code-mixed data, especially English mixed with a regional language, is increasingly adopted on social media platforms. Existing deep-learning models do not take advantage of the implicit language information in code-mixed text. Our study aims to improve the performance of BERT-based models on low-resource code-mixed Hindi-English datasets by experimenting with language augmentation approaches. We propose a pipeline to improve code-mixed systems that comprises data preprocessing, word-level language identification, language augmentation, and model training on downstream tasks like sentiment analysis. For language augmentation in BERT models, we explore word-level interleaving and post-sentence placement of language information. We have examined the performance of vanilla BERT-based models and their code-mixed HingBERT counterparts on respective benchmark datasets, comparing their results with and without using word-level language information. The models were evaluated using metrics such as accuracy, precision, recall, and F1 score. Our findings show that the proposed language augmentation approaches work well across different BERT models. We demonstrate the importance of augmenting code-mixed text with language information on five different code-mixed Hindi-English downstream datasets based on sentiment analysis, hate speech detection, and emotion detection.
Submitted 8 June, 2023;
originally announced June 2023.
-
DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization
Authors:
Aditya Vora,
Akshay Gadi Patil,
Hao Zhang
Abstract:
We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaves significant gaps between the sparse views, by learning a set of neural templates to act as surface priors. Our method, coined DiViNet, operates in two stages. It first learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help "stitch" the surfaces over sparse regions. We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from a few disparate input views. On the DTU and BlendedMVS datasets, our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views and performs on par with, if not better than, competing methods when dense views are employed as inputs.
Submitted 1 November, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
The ACROBAT 2022 Challenge: Automatic Registration Of Breast Cancer Tissue
Authors:
Philippe Weitz,
Masi Valkonen,
Leslie Solorzano,
Circe Carr,
Kimmo Kartasalo,
Constance Boissin,
Sonja Koivukoski,
Aino Kuusela,
Dusan Rasic,
Yanbo Feng,
Sandra Sinius Pouplier,
Abhinav Sharma,
Kajsa Ledesma Eriksson,
Stephanie Robertson,
Christian Marzahl,
Chandler D. Gatenbee,
Alexander R. A. Anderson,
Marek Wodzinski,
Artur Jurgas,
Niccolò Marini,
Manfredo Atzori,
Henning Müller,
Daniel Budelmann,
Nick Weiss,
Stefan Heldmann,
et al. (16 additional authors not shown)
Abstract:
The alignment of tissue between histopathological whole-slide-images (WSI) is crucial for research and clinical applications. Advances in computing, deep learning, and availability of large WSI datasets have revolutionised WSI analysis. Therefore, the current state-of-the-art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue that was stained with routine diagnostic immunohistochemistry to its H&E-stained counterpart. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can lead to highly accurate registration performances and identify covariates that impact performances across methods. These results establish the current state-of-the-art in WSI registration and guide researchers in selecting and developing methods.
Submitted 29 May, 2023;
originally announced May 2023.
-
Comparative Study of Pre-Trained BERT Models for Code-Mixed Hindi-English Data
Authors:
Aryan Patil,
Varad Patwardhan,
Abhishek Phaltankar,
Gauri Takawane,
Raviraj Joshi
Abstract:
The term "Code Mixed" refers to the use of more than one language in the same text. This phenomenon is predominantly observed on social media platforms, with increasing adoption over time. It is critical to detect foreign elements in a language and process them correctly, as a considerable number of individuals use code-mixed languages that cannot be comprehended by understanding only one of those languages. In this work, we focus on the low-resource Hindi-English code-mixed language and on enhancing the performance of different code-mixed natural language processing tasks such as sentiment analysis, emotion recognition, and hate speech identification. We perform a comparative analysis of different Transformer-based language models pre-trained using unsupervised approaches. We include code-mixed models such as HingBERT, HingRoBERTa, HingRoBERTa-Mixed, and mBERT, and non-code-mixed models such as ALBERT, BERT, and RoBERTa, for comparative analysis on code-mixed Hindi-English downstream tasks. We report state-of-the-art results on the respective datasets using HingBERT-based models, which are specifically pre-trained on real code-mixed text. Our HingBERT-based models provide significant improvements, thus highlighting the poor performance of vanilla BERT models on code-mixed text.
Submitted 26 May, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
RoSI: Recovering 3D Shape Interiors from Few Articulation Images
Authors:
Akshay Gadi Patil,
Yiming Qian,
Shan Yang,
Brian Jackson,
Eric Bennett,
Hao Zhang
Abstract:
The dominant majority of 3D models that appear in gaming and VR/AR, and those we use to train geometric deep learning algorithms, are incomplete, since they are modeled as surface meshes and are missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only a few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and consists of a motion-aware multi-view analysis phase including pose, depth, and motion estimations, followed by interior plane detection in images and 3D space, and finally multi-view plane fusion. In addition, our method predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object. We evaluate our method by quantitative and qualitative comparisons to baselines and alternative solutions, as well as by testing on untrained object categories and real image inputs to assess its generalization capabilities.
Submitted 13 April, 2023;
originally announced April 2023.
-
Fast Marching based Tissue Adaptive Delay Estimation for Aberration Corrected Delay and Sum Beamforming in Ultrasound Imaging
Authors:
M. S. Asif,
Gayathri Malamal,
A. N. Madhavanunni,
Vikram Melapudi,
V Rahul,
Abhijit Patil,
Rajesh Langoju,
Mahesh Raveendranatha Panicker
Abstract:
Conventional ultrasound (US) imaging employs delay and sum (DAS) receive beamforming with dynamic receive focus for image reconstruction due to its simplicity and robustness. However, DAS beamforming follows a geometrical method of delay estimation with a spatially constant speed-of-sound (SoS) of 1540 m/s throughout the medium, irrespective of the tissue inhomogeneity. This approximation leads to errors in delay estimation that accumulate with depth and degrade the resolution, contrast and overall accuracy of the US image. In this work, we propose a fast marching based DAS for focused transmissions which leverages the approximate SoS map to estimate the refraction corrected propagation delays for each pixel in the medium. The proposed approach is validated qualitatively and quantitatively for imaging depths of up to ~11 cm through simulations, where fat layer induced aberration is employed to alter the SoS in the medium. To the best of the authors' knowledge, this is the first work considering the effect of SoS on image quality for deeper imaging.
Submitted 19 April, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
Authors:
Akshay Gadi Patil,
Supriya Gadi Patil,
Manyi Li,
Matthew Fisher,
Manolis Savva,
Hao Zhang
Abstract:
This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene modeling tasks based on these representations. Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With respect to analysis, we focus on four basic scene understanding tasks -- 3D object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene similarity. And for synthesis, we mainly discuss neural scene synthesis works, though also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that needs to be developed to adapt to the data representation, and the task setting in general. For each of these tasks, we provide a comprehensive summary of the state-of-the-art works across different axes such as the choice of data representation, backbone, evaluation metric, input, output, etc., providing an organized review of the literature. Towards the end, we discuss some interesting research directions that have the potential to make a direct impact on the way users interact and engage with these virtual scene models, making them an integral part of the metaverse.
Submitted 21 August, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages
Authors:
Dhaval Taunk,
Shivprasad Sagare,
Anupam Patil,
Shivansh Subramanian,
Manish Gupta,
Vasudeva Varma
Abstract:
Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem. Hence, in this work, we propose XWikiGen, which is the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text. Accordingly, we contribute a benchmark dataset, XWikiRef, spanning ~69K Wikipedia articles covering five domains and eight languages. We harness this dataset to train a two-stage system where the input is a set of citations and a section title and the output is a section-specific LR summary. The proposed system is based on a novel idea of neural unsupervised extractive summarization to coarsely identify salient information followed by a neural abstractive model to generate the section-specific text. Extensive experiments show that multi-domain training is better than the multi-lingual setup on average.
Submitted 18 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
Authors:
Ruiqi Wang,
Akshay Gadi Patil,
Fenggen Yu,
Hao Zhang
Abstract:
We introduce the first active learning (AL) model for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. Specifically, our goal is to obtain segmentation results that are fully validated by humans while minimizing manual effort. To this end, we employ a transformer that utilizes a masked-attention mechanism to supervise the active segmentation. To tailor the network to moveable parts, we introduce a coarse-to-fine AL approach which first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and a correlation between moveable parts and object poses and interaction directions. When applying our AL model to 2,000 real images, we obtain fully validated moveable part segmentations with semantic labels while needing to manually annotate only 11.45% of the images. This translates to a significant (60%) time saving over the manual effort required by the best non-AL model to attain the same segmentation accuracy. Finally, we contribute a dataset of 2,550 real images with annotated moveable parts, demonstrating its superior quality and diversity over the best alternatives.
Submitted 7 July, 2024; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Robust Semi-Supervised Learning for Histopathology Images through Self-Supervision Guided Out-of-Distribution Scoring
Authors:
Nikhil Cherian Kurian,
Varsha S,
Abhijit Patil,
Shashikant Khade,
Amit Sethi
Abstract:
Semi-supervised learning (semi-SL) is a promising alternative to supervised learning for medical image analysis when obtaining good quality supervision for medical imaging is difficult. However, semi-SL assumes that the underlying distribution of unaudited data matches that of the few labeled samples, which is often violated in practical settings, particularly in medical images. The presence of out-of-distribution (OOD) samples in the unlabeled training pool of semi-SL is inevitable and can reduce the efficiency of the algorithm. Common preprocessing methods to filter out outlier samples may not be suitable for medical images that involve a wide range of anatomical structures and rare morphologies. In this paper, we propose a novel pipeline for addressing open-set supervised learning challenges in digital histology images. Our pipeline efficiently estimates an OOD score for each unlabeled data point based on self-supervised learning to calibrate the knowledge needed for a subsequent semi-SL framework. The outlier score derived from the OOD detector is used to modulate sample selection for the subsequent semi-SL stage, ensuring that samples conforming to the distribution of the few labeled samples are more frequently exposed to the subsequent semi-SL framework. Our framework is compatible with any semi-SL framework, and we base our experiments on the popular MixMatch semi-SL framework. We conduct extensive studies on two digital pathology datasets, the Kather colorectal histology dataset and a dataset derived from TCGA-BRCA whole slide images, and establish the effectiveness of our method by comparing it with popular methods and frameworks in semi-SL through various experiments.
Submitted 17 March, 2023;
originally announced March 2023.