KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Authors: Benson Chen, Tomasz Danel, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Gabriel H. S. Dreiman, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts

Abstract: DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at: https://github.com/insitro/kindel.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2409.09595 [pdf, other]

Neutral pion to two-photons transition form factor revisited

Authors: M. Atif Sultan, Jiayin Kang, Adnan Bashir, Lei Chang

Abstract: Based upon a combined formalism of Schwinger-Dyson and Bethe-Salpeter equations in quantum chromodynamics (QCD), we propose a QCD kindred algebraic model for the dressed quark propagator, for the Bethe-Salpeter amplitude of the pion and the electromagnetic quark-photon interaction vertex. We then compute the $γ^{*}π^0γ$ transition form factor $G^{γ^{*}π^0γ}(Q^2)$ for a wide range of photon momentum transfer squared $Q^2$. The quark propagator is expanded out in its perturbative functional form but with dynamically generated dressed quark mass. It has complex conjugate pole singularities in the complex-momentum plane, which is motivated by the solution of the quark gap equation with rainbow-ladder truncation of the infinite set of Schwinger-Dyson equations. This complex pole singularity structure of the quark propagator can be associated with a signal of confinement which prevents quarks from becoming stable asymptotic states. The Bethe-Salpeter amplitude is expressed without a spectral density function, which encapsulates its low and large momentum behaviour. The QCD evolution of the distribution amplitude is also incorporated into our model through the direct implementation of Efremov-Radyushkin-Brodsky-Lepage evolution equations. We include the effects of the quark anomalous magnetic moment in the description of the quark-photon vertex whose infrared enhancement is known to dictate hadronic properties.
Once the QCD kindred model is constructed, we calculate the form factor $G^{γ^{*}π^0γ}(Q^2)$ and find it consistent with direct QCD-based studies as well as most available experimental data. It slightly exceeds the conformal limit for large $Q^2$ which might be attributed to the scaling violations in QCD. The associated interaction radius and neutral pion decay width turn out to be compatible with experimental data.

Submitted 14 September, 2024; originally announced September 2024.

Comments: 11 pages, 4 figures

arXiv:2409.04996 [pdf, ps, other]

Contact interaction treatment of $\mathcal{V}\to\mathcal{P}γ$ for light-quark mesons

Authors: Yehan Xu, M. Atif Sultan, Khépani Raya, Lei Chang

Abstract: The $\mathcal{V}\to\mathcal{P}γ$ and $η(η^\prime) \to γγ$ decays are evaluated within a Dyson-Schwinger and Bethe-Salpeter equations framework (here $\mathcal{V}=\{ρ^{\pm},K^{\star\pm},φ\}$ and $\mathcal{P}=\{π^{\pm},K^{\pm},η,η^{\prime}\}$). The so-called impulse approximation (IA) is employed in the computation of the decay constants involved and decay widths, and so in the estimation of the associated charge and interaction radii. For their part, the required propagators and vertices stem from a contact interaction model, embedded within a beyond rainbow-ladder (RL) truncation that accounts for the typical ladder exchanges, quark anomalous magnetic moment, as well as the non-Abelian anomaly. While the examined transitions produce decay widths plainly compatible with the available experimental data, those processes involving the $η-η'$ mesons highlight the incompleteness of the IA when considering beyond RL effects in the interaction kernels.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 9 pages

arXiv:2408.11879 [pdf, other]

Beyond Labels: Aligning Large Language Models with Human-like Reasoning

Authors: Muhammad Rafsan Kabir, Rafeed Mohammad Sultan, Ihsanul Haque Asif, Jawad Ibn Ahad, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman

Abstract: Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns are raised because current models are prone to generating false positives and providing malicious responses. To contribute to this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a unique and novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an ethical-unethical classification task and a reason-generation task. Our proposed fine-tuning strategy notably outperforms the others in both tasks, achieving significantly higher accuracy scores in the classification task and lower misalignment rates in the reason-generation task. The increase in classification accuracies and decrease in misalignment rates indicate that the L+R fine-tuned models align more with human ethics. Hence, this study illustrates that injecting reasons has substantially improved the alignment of LLMs, resulting in more human-like responses.
We have made the DFAR dataset and corresponding codes publicly available at https://github.com/apurba-nsu-rnd-lab/DFAR.

Submitted 20 August, 2024; originally announced August 2024.

Comments: Accepted in ICPR 2024

arXiv:2408.08388 [pdf, other]

Classification of High-dimensional Time Series in Spectral Domain using Explainable Features

Authors: Sarbojit Roy, Malik Shahid Sultan, Hernando Ombao

Abstract: Interpretable classification of time series presents significant challenges in high dimensions. Traditional feature selection methods in the frequency domain often assume sparsity in spectral density matrices (SDMs) or their inverses, which can be restrictive for real-world applications. In this article, we propose a model-based approach for classifying high-dimensional stationary time series by assuming sparsity in the difference between inverse SDMs. Our approach emphasizes the interpretability of model parameters, making it especially suitable for fields like neuroscience, where understanding differences in brain network connectivity across various states is crucial. The estimators for model parameters demonstrate consistency under appropriate conditions. We further propose using standard deep learning optimizers for parameter estimation, employing techniques such as mini-batching and learning rate scheduling. Additionally, we introduce a method to screen the most discriminatory frequencies for classification, which exhibits the sure screening property under general conditions. The flexibility of the proposed model allows the significance of covariates to vary across frequencies, enabling nuanced inferences and deeper insights into the underlying problem. The novelty of our method lies in the interpretability of the model parameters, addressing critical needs in neuroscience. The proposed approaches have been evaluated on simulated examples and the `Alert-vs-Drowsy' EEG dataset.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2407.10437 [pdf, other]

Gravitational form factors of pseudoscalar mesons in a contact interaction

Authors: M. Atif Sultan, Zanbin Xing, Khépani Raya, Adnan Bashir, Lei Chang

Abstract: Given the unique role played by the gravitational form factors (GFFs) in unraveling the internal mechanics of hadrons, we examine the GFFs of ground state pseudoscalar mesons $π$, $η_c$, $η_b$ and the hypothetical {\em strangeonium} $η_s(s\bar{s})$. We adopt the coupled framework of Dyson-Schwinger and Bethe-Salpeter equations within a contact interaction, and employ a novel approach to the dressed amputated meson-meson scattering amplitude which makes connection with the energy-momentum tensor and with the GFFs. The resulting GFFs fulfill the anticipated symmetry constraints. The corresponding charge and mass radii and the $D-$term are also computed. We show that the $D-$term for the pseudoscalar mesons is bounded within the $(-1, -1/3)$ range; these bounds correspond to the massless (chiral limit) and infinitely massive cases, respectively. Considering the current interest in the GFFs, understanding the \textit{D}-term of pseudoscalar mesons and their GFFs can provide an important first step for future endeavors in the field.

Submitted 29 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures

arXiv:2407.07254 [pdf, other]

HAMIL-QA: Hierarchical Approach to Multiple Instance Learning for Atrial LGE MRI Quality Assessment

Authors: K M Arefeen Sultan, Md Hasibul Husain Hisham, Benjamin Orkild, Alan Morris, Eugene Kholmovski, Erik Bieging, Eugene Kwan, Ravi Ranjan, Ed DiBella, Shireen Elhabian

Abstract: The accurate evaluation of left atrial fibrosis via high-quality 3D Late Gadolinium Enhancement (LGE) MRI is crucial for atrial fibrillation management but is hindered by factors like patient movement and imaging variability. The pursuit of automated LGE MRI quality assessment is critical for enhancing diagnostic accuracy, standardizing evaluations, and improving patient outcomes. The deep learning models aimed at automating this process face significant challenges due to the scarcity of expert annotations, high computational costs, and the need to capture subtle diagnostic details in highly variable images. This study introduces HAMIL-QA, a multiple instance learning (MIL) framework, designed to overcome these obstacles. HAMIL-QA employs a hierarchical bag and sub-bag structure that allows for targeted analysis within sub-bags and aggregates insights at the volume level. This hierarchical MIL approach reduces reliance on extensive annotations, lessens computational load, and ensures clinically relevant quality predictions by focusing on diagnostically critical image features. Our experiments show that HAMIL-QA surpasses existing MIL methods and traditional supervised approaches in accuracy, AUROC, and F1-Score on an LGE MRI scan dataset, demonstrating its potential as a scalable solution for LGE MRI quality assessment automation. The code is available at: https://github.com/arf111/HAMIL-QA

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted to MICCAI2024, 10 pages, 2 figures

arXiv:2407.06062 [pdf, ps, other]

Stellar structure in $f(R,T)$ gravity: some exact solutions

Authors: Aliya Batool, Abdul Malik Sultan, Gonzalo J. Olmo, Diego Rubiera-Garcia

Abstract: We find some exact solutions for constant-density and quark matter equations of state in stellar structure models framed within the $f(R,T)=R+λκ^2 T$ theory of gravity, where $R$ is the curvature scalar, $T$ the trace of the stress-energy tensor, and $λ$ some constant. These solutions correspond to specific values of the constant $λ$, and represent different compactness states of the corresponding stars, though only those made of quark matter can be regarded as physical. The latter modify the compactness (Buchdahl) limit of neutron stars upwards for $λ>0$ until saturating the one of black holes. Our results show that it is possible to find useful insights on stellar structure in this class of theories, a fact that could be used for obtaining constraints on limiting masses such as the minimum hydrogen burning mass.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 6 pages, 1 figure, revtex style

arXiv:2406.15657 [pdf, other]

FIRST: Faster Improved Listwise Reranking with Single Token Decoding

Authors: Revanth Gangi Reddy, JaeHyeok Doo, Yifei Xu, Md Arafat Sultan, Deevya Swain, Avirup Sil, Heng Ji

Abstract: Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods lack efficiency as they provide ranking output in the form of a generated ordered sequence of candidate passage identifiers. Further, they are trained with the typical language modeling objective, which treats all ranking errors uniformly--potentially at the cost of misranking highly relevant passages. Addressing these limitations, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Further, we incorporate a learning-to-rank loss during training, prioritizing ranking accuracy for the more relevant passages. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Finally, to illustrate the practical effectiveness of listwise LLM rerankers, we investigate their application in providing relevance feedback for retrievers during inference. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
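The single-token idea in the FIRST abstract can be illustrated with a minimal sketch. It assumes the model exposes one logit per candidate identifier at the first decoding step; the identifiers, logit values, and function name below are hypothetical and are not taken from the paper's code.

```python
from typing import Dict, List


def rank_from_first_token_logits(logits: Dict[str, float]) -> List[str]:
    """Order candidate identifiers by their logit at the first decoding step.

    Instead of generating a full ordered sequence of identifiers, the
    ranking is read off a single set of logits: higher logit, higher rank.
    """
    return sorted(logits, key=lambda ident: logits[ident], reverse=True)


# Hypothetical first-step logits for the identifier tokens of four passages.
first_step_logits = {"A": 1.2, "B": 3.4, "C": -0.5, "D": 2.1}
ranking = rank_from_first_token_logits(first_step_logits)
print(ranking)
```

Only one forward step is needed to obtain the ordering in this sketch, which is the source of the efficiency gain the abstract describes; the learning-to-rank training loss is not shown.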

Submitted 21 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.11706 [pdf, other]

Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels

Authors: Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab

Abstract: We develop a method for training small-scale (under 100M parameter) neural information retrieval models with as few as 10 gold relevance labels. The method depends on generating synthetic queries for documents using a language model (LM), and the key step is that we automatically optimize the LM prompt that is used to generate these queries based on training quality. In experiments with the BIRCO benchmark, we find that models trained with our method outperform RankZephyr and are competitive with RankLLama, both of which are 7B parameter models trained on over 100K labels. These findings point to the power of automatic prompt optimization for synthetic dataset generation.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06779 [pdf, other]

Synergistic Sensing: Application of SiNWs-PANI:MO$_x$ Heterostructures for Human Respiratory Monitoring

Authors: M. T. Sultan, A. Dumitru, E. A. Fakhri, R. E. Brophy, S. T. Ingvarsson, A. Manolescu, H. G. Svavarsson

Abstract: In this study we investigate a novel hybrid structure of silicon nanowires (SiNWs) coated with PANI:metal oxide (MO$_x$) nanoparticles, i.e., WO$_3$ and TiO$_2$. The SiNWs were fabricated using MACE, whereas PANI:MO$_x$ were deposited on the SiNWs using a chemical oxidative polymerization method. To date, few attempts have been made to utilize such hybrid structures for respiratory sensing. The structures were characterized using Raman spectroscopy, X-ray diffraction, electron dispersive spectroscopy, and scanning electron microscopy. The electrical characterization for respiratory sensing reveals an excellent response compared to those obtained for SiNWs:MO$_x$ and SiNWs:PANI. Such enhancement in sensitivity is attributed to the formation of a p-n heterojunction, along with the wider conduction channel provided by PANI and the increased porosity in SiNWs/PANI:WO$_3$ hybrid structures, providing active sites, increased oxygen vacancies and a large surface area compared to that of pure MO$_x$ nanoparticles. Further, an improved baseline drift and sensor stability were established for the structure with PANI:WO$_3$ as compared to PANI:TiO$_2$.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 12 figures, 16 pages, 44 references

arXiv:2406.04346 [pdf, other]

doi 10.1145/3644815.3644981

Automating Patch Set Generation from Code Review Comments Using Large Language Models

Authors: Tajmilur Rahman, Rahul Singh, Mir Yousuf Sultan

Abstract: The advent of Large Language Models (LLMs) has revolutionized various domains of artificial intelligence, including the realm of software engineering. In this research, we evaluate the efficacy of pre-trained LLMs in replicating the tasks traditionally performed by developers in response to code review comments. We provide code contexts to five popular LLMs and obtain the suggested code-changes (patch sets) derived from real-world code-review comments. The performance of each model is meticulously assessed by comparing their generated patch sets against the historical data of human-generated patch-sets from the same repositories. This comparative analysis aims to determine the accuracy, relevance, and depth of the LLMs' feedback, thereby evaluating their readiness to support developers in responding to code-review comments. Novelty: This particular research area is still immature requiring a substantial amount of studies yet to be done. No prior research has compared the performance of existing Large Language Models (LLMs) in code-review comments. This in-progress study assesses current LLMs in code review and paves the way for future advancements in automated code quality assurance, reducing context-switching overhead due to interruptions from code change requests.

Submitted 9 April, 2024; originally announced June 2024.

Comments: 2 pages

arXiv:2403.00827 [pdf, other]

Self-Refinement of Language Models from External Proxy Metrics Feedback

Authors: Keshav Ramji, Young-Suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos

Abstract: It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response. In document-grounded response generation, for example, agent responses are expected to be relevant to a user's query while also being grounded in a given document. In this paper, we introduce Proxy Metric-based Self-Refinement (ProMiSe), which enables an LLM to refine its own initial response along key dimensions of quality guided by external metrics feedback, yielding an overall better final response. ProMiSe leverages feedback on response quality through principle-specific proxy metrics, and iteratively refines its response one principle at a time. We apply ProMiSe to open source language models Flan-T5-XXL and Llama-2-13B-Chat, to evaluate its performance on document-grounded question answering datasets, MultiDoc2Dial and QuAC, demonstrating that self-refinement improves response quality. We further show that fine-tuning Llama-2-13B-Chat on the synthetic dialogue data generated by ProMiSe yields significant performance improvements over the zero-shot baseline as well as a supervised fine-tuned model on human annotated data.

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2402.11770 [pdf, other]

Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations

Authors: Md Arafat Sultan, Jatin Ganhotra, Ramón Fernandez Astudillo

Abstract: We introduce a structured chain-of-thought (SCoT) prompting approach to generating content-grounded multi-turn question-answer conversations using a pre-trained large language model (LLM). At the core of our proposal is a structured breakdown of the complex task into a number of states in a state machine, so that actions corresponding to various subtasks, e.g., content reading and utterance generation, can be executed in their own dedicated states. Each state leverages a unique set of resources including prompts and (optionally) additional tools to augment the generation process. Our experimental results show that SCoT prompting with designated states for hallucination mitigation increases agent faithfulness to grounding documents by up to 16.8%. When used as training data, our open-domain conversations synthesized from only 6 Wikipedia-based seed demonstrations train strong conversational QA agents; in out-of-domain evaluation, for example, we observe improvements of up to 13.9% over target domain gold data when the latter is augmented with our generated examples.

Submitted 19 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2401.06356 [pdf, other]

An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation

Authors: Md Arafat Sultan, Aashka Trivedi, Parul Awasthy, Avirup Sil

Abstract: We present a large-scale empirical study of how choices of configuration parameters affect performance in knowledge distillation (KD). An example of such a KD parameter is the measure of distance between the predictions of the teacher and the student, common choices for which include the mean squared error (MSE) and the KL-divergence. Although scattered efforts have been made to understand the differences between such options, the KD literature still lacks a systematic study on their general effect on student performance. We take an empirical approach to this question in this paper, seeking to find out the extent to which such choices influence student performance across 13 datasets from 4 NLP tasks and 3 student sizes. We quantify the cost of making sub-optimal choices and identify a single configuration that performs well across the board.
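As a rough illustration of the two distance measures named in the abstract above, the sketch below computes MSE and KL-divergence between a teacher and a student distribution. It is a minimal example under stated assumptions: both measures are applied to softmax probabilities (MSE is often applied to raw logits in practice), and the logit values are made up for illustration rather than taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(teacher: List[float], student: List[float]) -> float:
    """Mean squared error between two prediction vectors."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def kl_divergence(teacher: List[float], student: List[float]) -> float:
    """KL(teacher || student) for two probability vectors."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Hypothetical logits over three classes.
teacher_probs = softmax([2.0, 1.0, 0.1])
student_probs = softmax([1.5, 1.2, 0.3])

mse_loss = mse(teacher_probs, student_probs)
kl_loss = kl_divergence(teacher_probs, student_probs)
print(f"MSE: {mse_loss:.6f}  KL: {kl_loss:.6f}")
```

Both losses are zero when the student matches the teacher exactly, but they penalize mismatches differently, which is one reason the choice can matter for student performance.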

Submitted 18 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.03169 [pdf, other]

QCD anomalies in electromagnetic processes: A solution to the $γ\to3π$ puzzle

Authors: Zanbin Xing, Hao Dang, M. Atif Sultan, Khépani Raya, Lei Chang

Abstract: In this work, the $γ\to3π$ form factor is calculated within the Dyson-Schwinger equations framework using a contact interaction model within the so-called modified rainbow ladder truncation. The present calculation takes into account the pseudovector component in the pion Bethe-Salpeter amplitude (BSA) and $π-π$ scattering effects, producing a $γ\to3π$ anomaly which is $1+6\mathcal{R}_π^2$ larger than the low energy prediction. Here $\mathcal{R}_π$ is the relative ratio of the pseudovector and pseudoscalar components in the pion BSA; with our parameters input, this correction raises the $γ\to3π$ anomaly by around $10\%$. The main outcome of this work is the unveiling of the origin of such correction, which could be a possible explanation of the discrepancy between the existing experimental data and the low energy prediction. Moreover, it is highlighted how the magnitude of the anomaly is affected in effective theories that require an irremovable ultraviolet cutoff. We find that for both the anomalous processes $π\to2γ$ and $γ\to 3π$, the missing contribution to the anomaly can be compensated by the additional structures related with the quark anomalous magnetic moment.
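The abstract above does not quote a value for the ratio of pseudovector to pseudoscalar components. As a back-of-the-envelope reading, assuming the stated factor $1+6\mathcal{R}_π^2$ is what produces the roughly $10\%$ enhancement, the implied magnitude of the ratio would be:

```latex
1 + 6\,\mathcal{R}_\pi^{2} \approx 1.10
\quad\Longrightarrow\quad
\mathcal{R}_\pi^{2} \approx \frac{0.10}{6} \approx 0.017
\quad\Longrightarrow\quad
|\mathcal{R}_\pi| \approx 0.13
```

This is only an inversion of the relation quoted in the abstract, not a number reported by the authors.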

Submitted 11 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: 10 pages, 3 figures, references added

arXiv:2312.00953 [pdf, other]

Deep Image prior with StruCtUred Sparsity (DISCUS) for dynamic MRI reconstruction

Authors: Muhammad A. Sultan, Chong Chen, Yingmin Liu, Xuan Lei, Rizwan Ahmad

Abstract: High-quality training data are not always available in dynamic MRI. To address this, we propose a self-supervised deep learning method called deep image prior with structured sparsity (DISCUS) for reconstructing dynamic images. DISCUS is inspired by deep image prior (DIP) and recovers a series of images through joint optimization of network parameters and input code vectors. However, DISCUS additionally encourages group sparsity on frame-specific code vectors to discover the low-dimensional manifold that describes temporal variations across frames. Compared to prior work on manifold learning, DISCUS does not require specifying the manifold dimensionality. We validate DISCUS using three numerical studies. In the first study, we simulate a dynamic Shepp-Logan phantom with frames undergoing random rotations, translations, or both, and demonstrate that DISCUS can discover the dimensionality of the underlying manifold. In the second study, we use data from a realistic late gadolinium enhancement (LGE) phantom to compare DISCUS with compressed sensing (CS) and DIP, and to demonstrate the positive impact of group sparsity. In the third study, we use retrospectively undersampled single-shot LGE data from five patients to compare DISCUS with CS reconstructions. The results from these studies demonstrate that DISCUS outperforms CS and DIP, and that enforcing group sparsity on the code vectors helps discover true manifold dimensionality and provides additional performance gain.

Submitted 24 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: To appear in 2024 ISBI

arXiv:2312.00936 [pdf, other]

Surface Coil Intensity Correction for MRI

Authors: Xuan Lei, Philip Schniter, Chong Chen, Muhammad A. Sultan, Rizwan Ahmad

Abstract: Modern MRI scanners utilize one or more arrays of small receive-only coils to collect k-space data. The sensitivity maps of the coils, when estimated using traditional methods, differ from the true sensitivity maps, which are generally unknown. Consequently, the reconstructed MR images exhibit undesired spatial variation in intensity. These intensity variations can be at least partially corrected using pre-scan data. In this work, we propose an intensity correction method that utilizes pre-scan data. For demonstration, we apply our method to a digital phantom, as well as to cardiac MRI data collected on a commercial scanner by Siemens Healthineers. The code is available at https://github.com/OSU-MR/SCC.

Submitted 24 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.08640 [pdf, other]

Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation

Authors: Jiachen Zhao, Wenlong Zhao, Andrew Drozdov, Benjamin Rozonoyer, Md Arafat Sultan, Jay-Yoon Lee, Mohit Iyyer, Andrew McCallum

Abstract: We study semi-supervised sequence generation tasks, where the few labeled examples are too scarce to finetune a model, and meanwhile, few-shot prompted large language models (LLMs) exhibit room for improvement. In this paper, we present the discovery that a student model distilled from a few-shot prompted LLM can commonly generalize better than its teacher to unseen examples on such tasks. We find that the student is able to learn a general pattern from the high-quality pseudolabels produced by the teacher during knowledge distillation (KD), and favorably not a general pattern from the low-quality pseudolabels. Leveraging this discovery, we propose a new method, Multistage Collaborative Knowledge Distillation from an LLM (MCKD), for these tasks. MCKD first few-shot prompts an LLM to produce pseudolabels for unlabeled data. Then at each stage of an iterative KD process, a new pair of students is trained on disjoint partitions of the pseudolabeled data, and produces new and improved pseudolabels for their unseen partitions. We conduct extensive experiments on four syntactic and semantic parsing datasets and show the effectiveness of MCKD for low-resource semi-supervised sequence generation. On CRAFT biomedical parsing, for example, 3-stage MCKD with 50 labeled examples outperforms an LLM teacher and vanilla KD by 7.5% and 3.7% parsing F1, respectively, and matches the performance of supervised finetuning with 500 labeled examples.

Submitted 3 August, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: ACL 2024

arXiv:2310.13961 [pdf, other]

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Authors: Young-Suk Lee, Md Arafat Sultan, Yousef El-Kurdi, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos, Ramón Fernandez Astudillo

Abstract: Using in-context learning (ICL) for data generation, techniques such as Self-Instruct (Wang et al., 2023) or the follow-up Alpaca (Taori et al., 2023) can train strong conversational agents with only a small amount of human supervision. One limitation of these approaches is that they resort to very large language models (around 175B parameters) that are also proprietary and non-public. Here we explore the application of such techniques to language models that are much smaller (around 10B--40B parameters) and have permissive licenses. We find the Self-Instruct approach to be less effective at these sizes and propose new ICL methods that draw on two main ideas: (a) Categorization and simplification of the ICL templates to make prompt learning easier for the LM, and (b) Ensembling over multiple LM outputs to help select high-quality synthetic examples. Our algorithm leverages the 175 Self-Instruct seed tasks and employs separate pipelines for instructions that require an input and instructions that do not. Empirical investigations with different LMs show that: (1) Our proposed method yields higher-quality instruction tuning data than Self-Instruct, (2) It improves performances of both vanilla and instruction-tuned LMs by significant margins, and (3) Smaller instruction-tuned LMs generate more useful outputs than their larger un-tuned counterparts. Our codebase is available at https://github.com/IBM/ensemble-instruct.

Submitted 21 October, 2023; originally announced October 2023.

Journal ref: EMNLP 2023

arXiv:2310.13769 [pdf, other]

Compositional Deep Probabilistic Models of DNA Encoded Libraries

Authors: Benson Chen, Mohammad M. Sultan, Theofanis Karaletsos

Abstract: DNA-Encoded Library (DEL) has proven to be a powerful tool that utilizes combinatorially constructed small molecules to facilitate highly-efficient screening assays. These selection experiments, involving multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can potentially mask the underlying signals, necessitating the application of computational tools such as machine learning to uncover valuable insights. We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks and capitalizes on the inherent hierarchical structure of these molecules by modeling latent reactions between embedded synthons. Additionally, we investigate methods to improve the observation models for DEL count data such as integrating covariate factors to more effectively account for data noise. Across two popular public benchmark datasets (CA-IX and HRP), our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure, thereby providing a robust tool for the analysis of DEL data.

Submitted 13 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.08805 [pdf, other]

doi 10.1007/978-3-031-52448-6_22

Two-Stage Deep Learning Framework for Quality Assessment of Left Atrial Late Gadolinium Enhanced MRI Images

Authors: K M Arefeen Sultan, Benjamin Orkild, Alan Morris, Eugene Kholmovski, Erik Bieging, Eugene Kwan, Ravi Ranjan, Ed DiBella, Shireen Elhabian

Abstract: Accurate assessment of left atrial fibrosis in patients with atrial fibrillation relies on high-quality 3D late gadolinium enhancement (LGE) MRI images. However, obtaining such images is challenging due to patient motion, changing breathing patterns, or sub-optimal choice of pulse sequence parameters. Automated assessment of LGE-MRI image diagnostic quality is clinically significant as it would enhance diagnostic accuracy, improve efficiency, ensure standardization, and contribute to better patient outcomes by providing reliable and high-quality LGE-MRI scans for fibrosis quantification and treatment planning. To address this, we propose a two-stage deep-learning approach for automated LGE-MRI image diagnostic quality assessment. The method includes a left atrium detector to focus on relevant regions and a deep network to evaluate diagnostic quality. We explore two training strategies, multi-task learning and pretraining using contrastive learning, to overcome limited annotated data in medical imaging. Contrastive learning shows about $4\%$ and $9\%$ improvements in F1-score and specificity, respectively, compared to multi-task learning when data are limited.

Submitted 12 October, 2023; originally announced October 2023.

Comments: Accepted to STACOM 2023. 11 pages, 3 figures

arXiv:2307.16275 [pdf, other]

Stylized Projected GAN: A Novel Architecture for Fast and Realistic Image Generation

Authors: Md Nurul Muttakin, Malik Shahid Sultan, Robert Hoehndorf, Hernando Ombao

Abstract: Generative Adversarial Networks (GANs) generate data using a generator and a discriminator. GANs usually produce high-quality images, but training them in an adversarial setting is difficult. GANs require high computation power and hyper-parameter regularization to converge. Projected GANs tackle the training difficulty of GANs by using transfer learning to project the generated and real samples into a pre-trained feature space. Projected GANs improve the training time and convergence but produce artifacts in the generated images, which reduce the quality of the generated samples. We propose an optimized architecture called Stylized Projected GANs, which integrates the mapping network of the Style GANs with Skip Layer Excitation of Fast GAN. The integrated modules are incorporated within the generator architecture of the Fast GAN to mitigate the problem of artifacts in the generated images.

Submitted 30 July, 2023; originally announced July 2023.

Comments: We present a new architecture for generating realistic images by combining mapping network of Style GANs and Projected GANs

arXiv:2306.04307 [pdf, other]

The chiral anomaly and the pion transition form factor: beyond the cutoff

Authors: Hao Dang, Zanbin Xing, M. Atif Sultan, Khépani Raya, Lei Chang

Abstract: In the presence of a momentum cutoff, effective theories seem unable to faithfully reproduce the so-called chiral anomaly in the Standard Model. A novel prospect to overcome this related issue is discussed herein via the calculation of the $γ^{*}π^0γ$ transition form factor, $G^{γ^* π^0 γ}(Q^2)$, whose normalization is intimately connected with the chiral anomaly and dynamical chiral symmetry breaking (DCSB). To compute this transition, we employ a contact interaction model of Quantum Chromodynamics (QCD) under a modified rainbow ladder truncation, which automatically generates a quark anomalous magnetic moment term, weighted by a strength parameter $ξ$. This term, whose origin is also connected with DCSB, is interpreted as an additional interaction that mimics the complex dynamics beyond the cutoff. By fixing $ξ$ to produce the value of $G^{γ^* π^0 γ}(0)$ dictated by the chiral anomaly, the computed transition form factor, as well as the interaction radius and neutral pion decay width, turn out to be comparable with QCD-based studies and experimental data.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 9 pages, 2 figures

arXiv:2305.11744 [pdf, other]

ReFIT: Relevance Feedback from a Reranker during Inference

Authors: Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi

Abstract: Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.

Submitted 28 May, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Preprint

arXiv:2303.08248 [pdf]

doi 10.5121/ijcnc.2023.15101

An Intrusion Detection Mechanism for MANETs Based on Deep Learning Artificial Neural Networks (ANNs)

Authors: Mohamad T Sultan, Hesham El Sayed, Manzoor Ahmed Khan

Abstract: A Mobile Ad-hoc Network (MANET) is a distributed, decentralized network of wireless portable nodes connecting directly without any fixed communication base station or centralized administration. Nodes in a MANET move continuously in random directions and in an arbitrary manner, which presents numerous challenges to these networks and makes them more susceptible to different security threats. Due to this decentralized nature of their overall architecture, combined with the limitation of hardware resources, those infrastructure-less networks are more susceptible to different security attacks such as black hole attack, network partition, node selfishness, and Denial of Service (DoS) attacks. This work aims to present, investigate, and design an intrusion detection predictive technique for mobile ad hoc networks using deep learning artificial neural networks (ANNs). A simulation-based evaluation and deep ANN modelling for detecting and isolating a Denial of Service (DoS) attack are presented to improve the overall security level of mobile ad hoc networks.

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.00807 [pdf, other]

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Authors: Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts

Abstract: Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.

Submitted 13 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Long Paper at Empirical Methods in Natural Language Processing (EMNLP) 2023

arXiv:2302.08970 [pdf, other]

doi 10.1109/CAS56377.2022.9934678

Ge coated silicon nanowires as human respiratory sensing device

Authors: E. Fakhri, M. T. Sultan, A. Manolescu, S. Ingvarsson, H. G. Svavarsson

Abstract: We report on Ge coated silicon nanowire (SiNW) sensors synthesized with metal assisted chemical etching and qualify their functionality as human respiratory sensors. The sensors were made from p-type single-crystalline (100) silicon wafers using silver catalysed top-down etching and were afterwards coated with a 50 nm thin Ge layer using magnetron sputtering. The Ge post-treatment was performed by rapid thermal annealing (RTA) at 450 and 700 °C. The sensors were characterized by X-ray diffraction and scanning electron microscopy. It is demonstrated that the sensors are highly sensitive as human breath detectors, with rapid response and frequency detectability. They are also shown to be good candidates for diagnosing human respiratory diseases.

Submitted 17 February, 2023; originally announced February 2023.

Comments: International Semiconductor Conference - CAS 2022, Romania

arXiv:2301.12609 [pdf, other]

Knowledge Distillation $\approx$ Label Smoothing: Fact or Fallacy?

Authors: Md Arafat Sultan

Abstract: Knowledge distillation (KD) was originally proposed as a method for knowledge transfer from one model to another, but some recent studies have suggested that it is in fact a form of regularization. Perhaps the strongest argument of all for this new perspective comes from its apparent similarities with label smoothing (LS). Here we re-examine this stated equivalence between the two methods by comparing the predictive confidences of the models they train. Experiments on four text classification tasks involving models of different sizes show that: (a) In most settings, KD and LS drive model confidence in completely opposite directions, and (b) In KD, the student inherits not only its knowledge but also its confidence from the teacher, reinforcing the classical knowledge transfer view.

Submitted 24 October, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: EMNLP 2023

arXiv:2301.09715 [pdf, other]

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

Authors: Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis, Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar, Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos

Abstract: The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop and open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PRIMEQA is available at: https://github.com/primeqa.

Submitted 25 January, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

arXiv:2212.01340 [pdf, other]

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Authors: Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

Abstract: Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2212.00136 [pdf, other]

DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries

Authors: Kirill Shmilovich, Benson Chen, Theofanis Karaletsos, Mohammad M. Sultan

Abstract: DNA-Encoded Library (DEL) technology has enabled significant advances in hit identification by enabling efficient testing of combinatorially-generated molecular libraries. DEL screens measure protein binding affinity through sequencing reads of molecules tagged with unique DNA-barcodes that survive a series of selection experiments. Computational models have been deployed to learn the latent binding affinities that are correlated to the sequenced count data; however, this correlation is often obfuscated by various sources of noise introduced in its complicated data-generation process. In order to denoise DEL count data and screen for molecules with good binding affinity, computational models require the correct assumptions in their modeling structure to capture the correct signals underlying the data. Recent advances in DEL models have focused on probabilistic formulations of count data, but existing approaches have thus far been limited to only utilizing 2-D molecule-level representations. We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes. 3-D spatial information allows our model to learn over the actual binding modality rather than using only structure-based information of the ligand. We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores that are better correlated with experimental binding affinity measurements compared to prior works. Moreover, by learning over a collection of docked poses we demonstrate that our model, trained only on DEL data, implicitly learns to perform good docking pose selection without requiring external supervision from expensive-to-source protein crystal structures.

Submitted 14 December, 2022; v1 submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.16634 [pdf, other]

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

Authors: Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil

Abstract: Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2210.01792 [pdf, other]

Sampling Streaming Data with Parallel Vector Quantization -- PVQ

Authors: Mujahid Sultan

Abstract: Accumulation of corporate data in the cloud has attracted more enterprise applications to the cloud, creating data gravity. As a consequence, network traffic has become more cloud centric. This increase in cloud centric traffic poses new challenges in designing learning systems for streaming data due to class imbalance. The number of classes plays a vital role in the accuracy of the classifiers built from the data streams. In this paper, we present a vector quantization-based sampling method, which substantially reduces the class imbalance in data streams. We demonstrate its effectiveness by conducting experiments on network traffic and anomaly dataset with commonly used ML model building methods: Multilayered Perceptron on TensorFlow backend, Support Vector Machines, K-Nearest Neighbour, and Random Forests. We built models using parallel processing, batch processing, and randomly selecting samples. We show that the accuracy of classification models improves when the data streams are pre-processed with our method. We used out of the box hyper-parameters of these classifiers and auto sklearn for hyperparameter optimization.

Submitted 4 October, 2022; originally announced October 2022.

Comments: 9 pages

MSC Class: F.2.2; I.2.7

arXiv:2209.12991 [pdf]

Quality Control (QC) of FBK Preproduction 3D Si Sensors for ATLAS HL-LHC Upgrades

Authors: D M S Sultan, Md Arif Abdulla Samy, J. X. Ye, M. Boscardin, F. Ficorella, S. Ronchin, G. -F. Dalla Betta

Abstract: The challenging demands of the ATLAS High Luminosity (HL-LHC) Upgrade aim for a complete swap of new generation sensors that should cope with the ultimate radiation hardness. FBK has been one of the prime foundries to develop and fabricate such radiation-hard 3D silicon (Si) sensors. These sensors are chosen to be deployed into the innermost layer of the ATLAS Inner Tracker (ITk). Recently, a pre-production batch of 3D Si sensors of 50x50 um2 pixel geometry, compatible with the full-size ITKPix (RD53B) readout chip, was fabricated. Two wafers holding temporary metal were diced at IZM, Germany, and a systematic QC test campaign was carried out at the University of Trento electronics laboratory. The paper briefly describes the 3D Si sensor design for ATLAS ITk and the required QC characterization setups. It comprises electrical tests (i.e., I-V, C-V, and I-T) of non-irradiated RD53B sensors. In addition, the study of several parametric analyses, i.e., oxide charge density, oxide thickness, inter-pixel resistance, inter-pixel capacitance, etc., are reported with the aid of Process Control Monitor (PCM) structures.

Submitted 28 September, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 8 pages, prepared for iWoRiD 2022 Proceeding

arXiv:2209.03607 [pdf, ps, other]

Solid State Detectors and Tracking for Snowmass

Authors: A. Affolder, A. Apresyan, S. Worm, M. Albrow, D. Ally, D. Ambrose, E. Anderssen, N. Apadula, P. Asenov, W. Armstrong, M. Artuso, A. Barbier, P. Barletta, L. Bauerdick, D. Berry, M. Bomben, M. Boscardin, J. Brau, W. Brooks, M. Breidenbach, J. Buckley, V. Cairo, R. Caputo, L. Carpenter, M. Centis-Vignali , et al. (110 additional authors not shown)

Abstract: Tracking detectors are of vital importance for collider-based high energy physics (HEP) experiments. The primary purpose of tracking detectors is the precise reconstruction of charged particle trajectories and the reconstruction of secondary vertices. The performance requirements from the community posed by the future collider experiments require an evolution of tracking systems, necessitating the development of new techniques, materials and technologies in order to fully exploit their physics potential. In this article we summarize the discussions and conclusions of the 2022 Snowmass Instrumentation Frontier subgroup on Solid State and Tracking Detectors (Snowmass IF03).

Submitted 19 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

Comments: for the Snowmass Instrumentation Frontier Solid State Detector and Tracking community

arXiv:2208.03703 [pdf, other]

Granger Causality using Neural Networks

Authors: Malik Shahid Sultan, Samuel Horvath, Hernando Ombao

Abstract: Dependence between nodes in a network is an important concept that pervades many areas including finance, politics, sociology, genomics and the brain sciences. One way to characterize dependence between components of a multivariate time series data is via Granger Causality (GC). Standard traditional approaches to GC estimation / inference commonly assume linear dynamics, however such simplification does not hold in many real-world applications where signals are inherently non-linear. In such cases, imposing linear models such as vector autoregressive (VAR) models can lead to mis-characterization of true Granger Causal interactions. To overcome this limitation, Tank et al (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022) proposed a solution that uses neural networks with sparse regularization penalties. The regularization encourages learnable weights to be sparse, which enables inference on GC. This paper overcomes the limitations of current methods by leveraging advances in machine learning and deep learning which have been demonstrated to learn hidden patterns in the data. We propose novel classes of models that can handle underlying non-linearity in a computationally efficient manner, simultaneously providing GC and lag order selection. Firstly, we present the Learned Kernel VAR (LeKVAR) model that learns kernel parameterized by a shared neural net followed by penalization on learnable weights to discover GC structure. Secondly, we show one can directly decouple lags and individual time series importance via decoupled penalties. This is important as we want to select the lag order during the process of GC estimation. This decoupling acts as a filtering and can be extended to any DL model including Multi-Layer Perceptrons (MLP), Recurrent Neural Networks (RNN), Long Short Term Memory Networks (LSTM), Transformers etc, for simultaneous GC estimation and lag selection.

Submitted 7 August, 2024; v1 submitted 7 August, 2022; originally announced August 2022.

Comments: To be Submitted to a Journal work Presented at JSM. arXiv admin note: text overlap with arXiv:1802.05842 by other authors

arXiv:2206.12615 [pdf]

Effects of MAC Parameters on the Performance of IEEE 802.11 DCF in NS-3

Authors: Md. Abubakar Siddik, Jakia Akter Nitu, Natasha Islam, Most. Anju Ara Hasi, Jannatun Ferdous, Md. Mizanur Rahman, Md. Nahid Sultan

Abstract: This paper presents the design procedure of the NS-3 script for WLAN that is organized according to the hierarchical manner of TCP/IP model. We configure all layers by using NS-3 model objects and set and modify the values used by objects to investigate the effects of MAC parameters (access mechanism, CWmin, CWmax and retry limit) on the performance metrics viz. packet delivery ratio, packet lost ratio, aggregated throughput, and average delay. The simulation results show that RTS/CTS access mechanism outperforms basic access mechanism in saturated state, whereas the MAC parameters have no significant impact on network performance in non-saturated state. A higher value of CWmin improves the aggregated throughput in expense of average delay. The tradeoff relationships among the performance metrics are also observed in results for the optimal values of MAC parameters. Our design procedure represents a good guideline for new NS-3 users to design and modify script and results greatly benefit the network design and management.

Submitted 25 June, 2022; originally announced June 2022.

Comments: 20 pages

arXiv:2206.08441 [pdf, other]

GAAMA 2.0: An Integrated System that Answers Boolean and Extractive Questions

Authors: Scott McCarley, Mihaela Bornea, Sara Rosenthal, Anthony Ferritto, Md Arafat Sultan, Avirup Sil, Radu Florian

Abstract: Recent machine reading comprehension datasets include extractive and boolean questions but current approaches do not offer integrated support for answering both question types. We present a multilingual machine reading comprehension system and front-end demo that handles boolean questions by providing both a YES/NO answer and highlighting supporting evidence, and handles extractive questions by highlighting the answer in the passage. Our system, GAAMA 2.0, is ranked first on the Tydi QA leaderboard at the time of this writing. We contrast two different implementations of our approach. The first includes several independent stacks of transformers allowing easy deployment of each component. The second is a single stack of transformers utilizing adapters to reduce GPU memory footprint in a resource-constrained environment.

Submitted 21 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.05006 [pdf, other]

Structure and electrical behavior of silicon nanowires prepared by MACE process

Authors: R. Plugaru, E. Fakhri, C. Romanitan, I. Mihalache, G. Craciun, N. Plugaru, H. Ö. Árnason, M. T. Sultan, G. A. Nemnes, S. Ingvarsson, H. G. Svavarsson, A. Manolescu

Abstract: We report on the structure and electrical characteristics of silicon nanowire arrays prepared by metal assisted chemical etching (MACE) method, investigated by cross-sectional scanning electron microscopy (SEM) and high resolution X-ray diffraction (HR-XRD) methods. SEM micrographs show arrays of merged parallel nanowires, with lengths of 700 nm and 1000 nm, resulted after 1.5 min and 5 min etching time, respectively. X-ray reciprocal space maps (RSMs) around Si (004) reciprocal lattice point indicate the presence of 0D structural defects rather than of extended defects. The photoluminescence spectra exhibit emission bands at 1.70 eV and 1.61 eV, with intensity significantly higher in the case of longer wires and associated with the more defected surface. The transient photoluminescence spectroscopy reveals average lifetime of 60 μs and 111 μs for the two SiNW arrays, which correlate with a larger density of defects states in the latest case. The I-V characteristics of the nanowires, show a memristive behavior with the applied voltage sweep rate in the range 5V/s - 0.32V/s. We attribute this behavior to trap states which control the carrier concentration, and model this effect using an equivalent circuit. Photogeneration processes under excitation wavelengths in visible domain, 405 nm - 650 nm, and under light intensity in the range 20 - 100 mW/cm² provided a further insight into the trap states.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 23 pages 18 figures

Journal ref: Surfaces and Interfaces Volume 33, October 2022, 102167

arXiv:2206.04991 [pdf, other]

Piezoresistance characterization of silicon nanowires in uniaxial and isostatic pressure variation

Authors: Elham Fakhri, Rodica Plugaru, Muhammad Taha Sultan, Thorsteinn Hanning Kristinsson, Hákon Örn Árnason, Neculai Plugaru, Andrei Manolescu, Snorri Ingvarsson, Halldor Gudfinnur Svavarsson

Abstract: Silicon nanowires (SiNWs) are known to exhibit large piezoresistance (PZR) effect, making it suitable for various sensing applications. Here, we report the results of a PZR investigation on randomly distributed and interconnected vertical silicon nanowire arrays as a pressure sensor. The samples were produced from p-type (100) Si wafers using a silver catalysed top-down etching process. The piezoresistance response of these SiNW arrays was analysed by measuring their I-V characteristics under applied uniaxial as well as isostatic pressure. The interconnected SiNWs exhibit increased mechanical stability in comparison with separated or periodic nanowires. The repeatability of the fabrication process and statistical distribution of measurements were also tested on several samples from different batches. A sensing resolution down to roughly 1 mbar pressure was observed with uniaxial force application, and more than two orders of magnitude resistance variation was determined for isostatic pressure below atmospheric pressure.

Submitted 23 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: 7 pages 10 figures

Journal ref: Sensors 2022, 22(17), 6340

arXiv:2205.07257 [pdf, other]

Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering

Authors: Md Arafat Sultan, Avirup Sil, Radu Florian

Abstract: Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is first and foremost a problem of mitigating source domain underfitting: models not adequately learning the signal already present in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model learns its source domains better -- using familiar methods such as knowledge distillation (KD) from a bigger model -- its zero-shot out-of-domain utility improves at an even faster pace. Improved source domain learning also demonstrates superior out-of-domain generalization over three popular existing DG approaches that aim to limit overfitting. Our implementation of KD-based domain generalization is available via PrimeQA at: https://ibm.biz/domain-generalization-with-kd.

Submitted 24 October, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: Accepted at EMNLP 2022

arXiv:2204.11373 [pdf, other]

doi 10.1145/3477495.3531878

Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval

Authors: Revanth Gangi Reddy, Md Arafat Sultan, Martin Franz, Avirup Sil, Heng Ji

Abstract: We show that supervised neural information retrieval (IR) models are prone to learning sparse attention patterns over passage tokens, which can result in key phrases including named entities receiving low attention weights, eventually leading to model under-performance. Using a novel targeted synthetic data generation method that identifies poorly attended entities and conditions the generation episodes on those, we teach neural IR to attend more uniformly and robustly to all entities in a given passage. On two public IR benchmarks, we empirically show that the proposed method helps improve both the model's attention patterns and retrieval performance, including in zero-shot settings.

Submitted 24 April, 2022; originally announced April 2022.

Comments: Published at SIGIR 2022

arXiv:2204.09248 [pdf, ps, other]

Synthetic Target Domain Supervision for Open Retrieval QA

Authors: Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avirup Sil, Vittorio Castelli, Radu Florian, Salim Roukos

Abstract: Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR) -- a state-of-the-art (SOTA) open domain neural retrieval model -- on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore its fine-tuning with synthetic training examples, which we generate from unlabeled target domain text using a text-to-text generator. In our experiments, this noisy but fully automated target domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.

Submitted 20 April, 2022; originally announced April 2022.

Comments: Published at SIGIR 2021

arXiv:2202.11828 [pdf, other]

Novel Sensors for Particle Tracking: a Contribution to the Snowmass Community Planning Exercise of 2021

Authors: M. R. Hoeferkamp, S. Seidel, S. Kim, J. Metcalfe, A. Sumant, H. Kagan, W. Trischuk, M. Boscardin, G. -F. Dalla Betta, D. M. S. Sultan, N. T. Fourches, C. Renard, A. Barbier, T. Mahajan, A. Minns, V. Tokranov, M. Yakimov, S. Oktyabrsky, C. Gingu, P. Murat, M. T. Hedges

Abstract: Five contemporary technologies are discussed in the context of their potential roles in particle tracking for future high energy physics applications. These include sensors of the 3D configuration, in both diamond and silicon, submicron-dimension pixels, thin film detectors, and scintillating quantum dots in gallium arsenide. Drivers of the technologies include radiation hardness, excellent position, vertex, and timing resolution, simplified integration, and optimized power, cost, and material.

Submitted 23 February, 2022; originally announced February 2022.

Comments: 15 pages, 6 figures

arXiv:2112.08185 [pdf, other]

Learning Cross-Lingual IR from an English Retriever

Authors: Yulong Li, Martin Franz, Md Arafat Sultan, Bhavani Iyer, Young-Suk Lee, Avirup Sil

Abstract: We present DR.DECR (Dense Retrieval with Distillation-Enhanced Cross-Lingual Representation), a new cross-lingual information retrieval (CLIR) system trained using multi-stage knowledge distillation (KD). The teacher of DR.DECR relies on a highly effective but computationally expensive two-stage inference process consisting of query translation and monolingual IR, while the student, DR.DECR, executes a single CLIR step. We teach DR.DECR powerful multilingual representations as well as CLIR by optimizing two corresponding KD objectives. Learning useful representations of non-English text from an English-only retriever is accomplished through a cross-lingual token alignment algorithm that relies on the representation capabilities of the underlying multilingual encoders. In both in-domain and zero-shot out-of-domain evaluation, DR.DECR demonstrates far superior accuracy over direct fine-tuning with labeled CLIR data. It is also the best single-model retriever on the XOR-TyDi benchmark at the time of this writing.

Submitted 31 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: Presented at NAACL 2022 main conference Code can be found at: https://github.com/primeqa/primeqa

arXiv:2104.07800 [pdf, other]

Towards Robust Neural Retrieval Models with Synthetic Pre-Training

Authors: Revanth Gangi Reddy, Vikas Yadav, Md Arafat Sultan, Martin Franz, Vittorio Castelli, Heng Ji, Avirup Sil

Abstract: Recent work has shown that commonly available machine reading comprehension (MRC) datasets can be used to train high-performance neural information retrieval (IR) systems. However, the evaluation of neural IR has so far been limited to standard supervised learning settings, where they have outperformed traditional term matching baselines. We conduct in-domain and out-of-domain evaluations of neural IR, and seek to improve its robustness across different scenarios, including zero-shot settings. We show that synthetic training examples generated using a sequence-to-sequence generator can be effective towards this goal: in our experiments, pre-training with synthetic examples improves retrieval performance in both in-domain and out-of-domain evaluation on five different test sets.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2102.01143 [pdf, other]

doi 10.1109/ICTAI50040.2020.00178

toon2real: Translating Cartoon Images to Realistic Images

Authors: K. M. Arefeen Sultan, Mohammad Imrul Jubair, MD. Nahidul Islam, Sayed Hossain Khan

Abstract: In terms of Image-to-image translation, Generative Adversarial Networks (GANs) has achieved great success even when it is used in the unsupervised dataset. In this work, we aim to translate cartoon images to photo-realistic images using GAN. We apply several state-of-the-art models to perform this task; however, they fail to perform good quality translations. We observe that the shallow difference between these two domains causes this issue. Based on this idea, we propose a method based on CycleGAN model for image translation from cartoon domain to photo-realistic domain. To make our model efficient, we implemented Spectral Normalization which added stability in our model. We demonstrate our experimental results and show that our proposed model has achieved the lowest Frechet Inception Distance score and better results compared to another state-of-the-art technique, UNIT.

Submitted 1 February, 2021; originally announced February 2021.

Comments: Accepted as a short paper at ICTAI 2020

arXiv:2012.01414 [pdf, other]

End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training

Authors: Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avi Sil, Vittorio Castelli, Radu Florian, Salim Roukos

Abstract: End-to-end question answering (QA) requires both information retrieval (IR) over a large document collection and machine reading comprehension (MRC) on the retrieved passages. Recent work has successfully trained neural IR systems using only supervised question answering (QA) examples from open-domain datasets. However, despite impressive performance on Wikipedia, neural IR lags behind traditional term matching approaches such as BM25 in more specific and specialized target domains such as COVID-19. Furthermore, given little or no labeled data, effective adaptation of QA systems can also be challenging in such target domains. In this work, we explore the application of synthetically generated QA examples to improve performance on closed-domain retrieval and MRC. We combine our neural IR and MRC systems and show significant improvements in end-to-end QA on the CORD-19 collection over a state-of-the-art open-domain QA baseline.

Submitted 2 December, 2020; originally announced December 2020.

Comments: Preprint

arXiv:2011.03435 [pdf, other]

Answer Span Correction in Machine Reading Comprehension

Authors: Revanth Gangi Reddy, Md Arafat Sultan, Efsun Sarioglu Kayi, Rong Zhang, Vittorio Castelli, Avirup Sil

Abstract: Answer validation in machine reading comprehension (MRC) consists of verifying an extracted answer against an input context and question pair. Previous work has looked at re-assessing the "answerability" of the question given the extracted answer. Here we address a different problem: the tendency of existing MRC systems to produce partially correct answers when presented with answerable questions. We explore the nature of such errors and propose a post-processing correction method that yields statistically significant performance improvements over state-of-the-art MRC systems in both monolingual and multilingual evaluation.

Submitted 6 November, 2020; originally announced November 2020.

Comments: Accepted in Findings of EMNLP 2020

Showing 1–50 of 95 results for author: Sultan, M