Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

Authors: Yimin Tang, Yurong Xu, Ning Yan, Masood Mortazavi

Abstract: Transformers have a quadratic scaling of computational complexity with input size, which limits the input context window size of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG)-based models can better handle longer contexts by using a retrieval system to filter out unnecessary information. However, most RAG methods only perform retrieval based on the initial query, which may not work well with complex questions that require deeper reasoning. We introduce a novel approach, Inner Loop Memory Augmented Tree Retrieval (ILM-TR), involving inner-loop queries, based not only on the query question itself but also on intermediate findings. At inference time, our model retrieves information from the RAG system, integrating data from lengthy documents at various levels of abstraction. Based on the information retrieved, the LLM generates texts stored in an area named Short-Term Memory (STM) which is then used to formulate the next query. This retrieval process is repeated until the text in STM converges. Our experiments demonstrate that retrieval with STM offers improvements over traditional retrieval-augmented LLMs, particularly in long context tests such as Multi-Needle In A Haystack (M-NIAH) and BABILong.

Submitted 11 October, 2024; originally announced October 2024.
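
The inner loop described in the ILM-TR abstract above (retrieve, write to Short-Term Memory, form the next query, stop when STM stops changing) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: `inner_loop_retrieve`, `toy_retrieve`, `toy_summarize`, the keyword-overlap retriever, the string-equality convergence test, and the `max_rounds` cap are all assumptions made for the example, and the toy summarizer stands in for an LLM.

```python
from typing import Callable, List


def inner_loop_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[str, List[str], str], str],
    max_rounds: int = 8,
) -> str:
    """Repeat retrieve -> summarize until the short-term memory stops changing."""
    stm = ""  # Short-Term Memory text, empty before the first round
    for _ in range(max_rounds):
        # The next query combines the original question with intermediate findings.
        query = f"{question} {stm}".strip()
        passages = retrieve(query)
        new_stm = summarize(question, passages, stm)
        if new_stm == stm:  # assumed convergence test: STM text is unchanged
            break
        stm = new_stm
    return stm


# Hypothetical toy corpus and components, for illustration only.
CORPUS = ["alice lives in paris", "paris is in france", "bob likes tea"]


def toy_retrieve(query: str) -> List[str]:
    """Return every passage that shares at least one word with the query."""
    words = set(query.replace(";", " ").split())
    return [p for p in CORPUS if words & set(p.split())]


def toy_summarize(question: str, passages: List[str], stm: str) -> str:
    """Stand-in for the LLM: accumulate retrieved facts in sorted order."""
    facts = set(stm.split("; ")) if stm else set()
    facts.update(passages)
    return "; ".join(sorted(facts))


stm_text = inner_loop_retrieve("alice", toy_retrieve, toy_summarize)
print(stm_text)
```

In this toy run the first query only finds the fact about Alice; the second query, enriched by STM, also reaches the fact about Paris, which the initial query alone would have missed.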

arXiv:2409.19786 [pdf, other]

4D Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset

Authors: Jiuzhou Lei, Ankit Prabhu, Xu Liu, Fernando Cladera, Mehrad Mortazavi, Reza Ehsani, Pratik Chaudhari, Vijay Kumar

Abstract: Automated persistent and fine-grained monitoring of orchards at the individual tree or fruit level helps maximize crop yield and optimize resources such as water, fertilizers, and pesticides while preventing agricultural waste. Towards this goal, we present a 4D spatio-temporal metric-semantic mapping method that fuses data from multiple sensors, including LiDAR, RGB camera, and IMU, to monitor the fruits in an orchard across their growth season. A LiDAR-RGB fusion module is designed for 3D fruit tracking and localization, which first segments fruits using a deep neural network and then tracks them using the Hungarian Assignment algorithm. Additionally, the 4D data association module aligns data from different growth stages into a common reference frame and tracks fruits spatio-temporally, providing information such as fruit counts, sizes, and positions. We demonstrate our method's accuracy in 4D metric-semantic mapping using data collected from a real orchard under natural, uncontrolled conditions with seasonal variations. We achieve a 3.1 percent error in total fruit count estimation for over 1790 fruits across 60 apple trees, along with accurate size estimation results with a mean error of 1.1 cm. The datasets, consisting of LiDAR, RGB, and IMU data of five fruit species captured across their growth seasons, along with corresponding ground truth data, will be made publicly available at: https://4d-metric-semantic-mapping.org/

Submitted 29 September, 2024; originally announced September 2024.
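
The tracking step in the orchard-mapping abstract above assigns new fruit detections to existing tracks by solving a minimum-cost assignment problem. The sketch below illustrates that problem on 3D centroids. It deliberately swaps in a brute-force search over permutations, which returns the same optimal assignment as the Hungarian algorithm but is only practical for a handful of points. The function name `match_detections`, the Euclidean cost, and the sample coordinates are assumptions for the example, not details from the paper.

```python
from itertools import permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def match_detections(prev: List[Point], curr: List[Point]) -> List[Tuple[int, int]]:
    """Pair each previous track with a distinct current detection at minimum total distance.

    Brute-force stand-in for the Hungarian algorithm; assumes len(prev) <= len(curr).
    """
    best_cost = float("inf")
    best_perm: Tuple[int, ...] = ()
    # Try every way of picking one distinct current detection per previous track.
    for perm in permutations(range(len(curr)), len(prev)):
        cost = sum(dist(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


# Hypothetical fruit centroids (metres) in two consecutive frames.
previous_tracks = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
current_detections = [(2.1, 0.0, 1.0), (0.1, 0.0, 1.0), (5.0, 5.0, 1.0)]

pairs = match_detections(previous_tracks, current_detections)
print(pairs)
```

Detections left unmatched (the third point here) would typically start new tracks; a real pipeline would also reject pairs whose distance exceeds a gating threshold.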

arXiv:2408.02858 [pdf]

Medium-entropy Engineering of magnetism in layered antiferromagnet CuxNi2(1-x)CrxP2S6

Authors: Dinesh Upreti, Rabindra Basnet, M. M. Sharma, Santosh Karki Chhetri, Gokul Acharya, Md Rafique Un Nabi, Josh Sakon, Mansour Mortazavi, Jin Hu

Abstract: Engineering magnetism in layered magnets could result in novel phenomena related to two-dimensional (2D) magnetism, which can be useful for fundamental research and practical applications. Extensive doping efforts such as substitution and intercalation have been adopted to tune antiferromagnetic (AFM) properties in M2P2X6 compounds. The substitutional doping in this material family has mainly focused on bimetallic substitution. Recently, the metal substitution has also been extended to more than two metal elements, leading to medium and high-entropy alloys (MEAs and HEAs), which are fairly underexplored in layered magnetic systems including M2P2X6. In this work, we explored the magnetic properties of the previously unreported Cu- and Cr-substituted Ni2P2S6, i.e., CuxNi2(1-x)CrxP2S6. Our study reveals a more systematic evolution of AFM phases with substitution than that observed in traditional bimetallic substitution in M2P2X6. Furthermore, the Cu and Cr substitutions in Ni2P2S6 are found to enhance the ferromagnetic (FM) correlation, which is also accompanied by a possible weak FM phase at low temperatures for the intermediate compositions from 0.32 to 0.80. Our work provides a strategy to establish ferromagnetism in AFM M2P2X6 that can also be used for property tuning in other layered magnets.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2407.12662 [pdf]

Tuning Magnetism in Ising-type van der Waals Magnet FePS3 by Lithium Intercalation

Authors: Dinesh Upreti, Rabindra Basnet, M. M. Sharma, Santosh Karki Chhetri, Gokul Acharya, Md Rafique Un Nabi, Josh Sakon, Bo Da, Mansour Mortazavi, Jin Hu

Abstract: Recently, the layered transition metal thiophosphates MPX3 (M = transition metals, X = S or Se) have gained significant attention because of their rich magnetic, optical, and electronic properties. Specifically, the diverse magnetic structures and the robustness of magnetism in the two-dimensional limit have made them prominent candidates to study two-dimensional magnetism. Numerous efforts such as substitutions and interlayer intercalations have been made to tune the properties of these materials, which has greatly deepened the understanding of the underlying mechanisms that govern the properties. In this work, we focus on modifying the magnetism of Ising-type antiferromagnet FePS3 using electrochemical lithium intercalation. Our work unveils the effectiveness of electrochemical intercalation as a controllable tool for modulating magnetism, including tuning the magnetic ordering temperature and inducing a low-temperature spin-glass state, offering an approach for implementing this material into applications.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2406.08859 [pdf, other]

Fusion of regional and sparse attention in Vision Transformers

Authors: Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara

Abstract: Modern vision transformers leverage visually inspired local interaction between pixels through attention computed within window or grid regions, in contrast to the global attention employed in the original ViT. Regional attention restricts pixel interactions within specific regions, while sparse attention disperses them across sparse grids. These differing approaches pose a challenge between maintaining hierarchical relationships vs. capturing a global context. In this study, drawing inspiration from atrous convolution, we propose Atrous Attention, a blend of regional and sparse attention that dynamically integrates both local and global information while preserving hierarchical structures. Based on this, we introduce a versatile, hybrid vision transformer backbone called ACC-ViT, tailored for standard vision tasks. Our compact model achieves approximately 84% accuracy on ImageNet-1K with fewer than 28.5 million parameters, outperforming the state-of-the-art MaxViT by 0.42% while requiring 8.4% fewer parameters.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted as a Workshop Paper at T4V@CVPR2024. arXiv admin note: substantial text overlap with arXiv:2403.04200

arXiv:2405.19359 [pdf, other]

Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction

Authors: Nabil Ibtehaz, Masood Mortazavi

Abstract: Electrocardiogram (ECG) signals, profiling the electrical activities of the heart, are used for a plethora of diagnostic applications. However, ECG systems require multiple leads or channels of signals to capture the complete view of the cardiac system, which limits their application in smartwatches and wearables. In this work, we propose a modally reduced representation learning method for ECG signals that is capable of generating channel-agnostic, unified representations for ECG signals. Through joint optimization of reconstruction and alignment, we ensure that the embeddings of the different channels contain an amalgamation of the overall information across channels while also retaining their specific information. On an independent test dataset, we generated highly correlated channel embeddings from different ECG channels, leading to a moderate approximation of the 12-lead signals from a single-channel embedding. Our generated embeddings can work as competent features for ECG signals for downstream tasks.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted as a Workshop Paper at TS4H@ICLR2024

Journal ref: ICLR 2024 Workshop on Learning from Time Series For Health

arXiv:2404.15241 [pdf]

doi 10.1103/PhysRevB.109.184429

Evolution of Magnetism in Magnetic Topological Semimetal NdSb$_x$Te$_{2-x+δ}$

Authors: Santosh Karki Chhetri, Rabindra Basnet, Jian Wang, Krishna Pandey, Gokul Acharya, Md Rafique Un Nabi, Dinesh Upreti, Josh Sakon, Mansour Mortazavi, Jin Hu

Abstract: Magnetic topological semimetals LnSbTe (Ln = Lanthanide) have attracted intensive attention because of the interplay between magnetism, topology, and electron correlations, depending on the choice of magnetic Ln elements. Recently, varying Sb-Te composition has been found to effectively control the electronic and magnetic states in LnSb$_x$Te$_{2-x}$. With this motivation, we report the evolution of magnetic properties with Sb-Te substitution in NdSb$_x$Te$_{2-x+δ}$. Our work reveals an interesting non-monotonic change in magnetic ordering temperature with varying composition stoichiometry. In addition, reducing the Sb content $x$ drives the reorientation of moments from the in-plane (ab-plane) to the out-of-plane (c-axis) direction, which results in distinct magnetic structures for the two end compounds NdTe$_2$ ($x = 0$) and NdSbTe ($x = 1$). Furthermore, the moment orientation in NdSb$_x$Te$_{2-x+δ}$ is also found to be strongly tunable upon application of a weak magnetic field, leading to rich magnetic phases depending on the composition stoichiometry, temperature, and magnetic field. Such strong tuning of magnetism in this material establishes it as a promising platform for investigating tunable topological states and correlated topological physics.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 35 pages, 5 figures

Report number: BK14576

Journal ref: PRB 2024

arXiv:2404.02091 [pdf]

doi 10.1103/PhysRevB.109.184405

Field-induced spin polarization in lightly Cr-substituted layered antiferromagnet NiPS3

Authors: Rabindra Basnet, Dinesh Upreti, Taksh Patel, Santosh Karki Chhetri, Gokul Acharya, Md Rafique Un Nabi, Manish Mani Sharma, Josh Sakon, Mansour Mortazavi, Jin Hu

Abstract: Tuning magnetic properties in layered magnets is an important route to realize novel phenomena related to two-dimensional (2D) magnetism. Recently, tuning antiferromagnetic (AFM) properties through substitution and intercalation techniques has been widely studied in MPX3 compounds. Interesting phenomena, such as diverse AFM structures and even the signatures of ferrimagnetism, have been reported. However, long-range ferromagnetic (FM) ordering has remained elusive. In this work, we explored the magnetic properties of the previously unreported Cr-substituted NiPS3. We found that Cr substitution is extremely efficient in controlling spin orientation in NiPS3. Our study reveals a field-induced spin polarization in lightly (9%) Cr-substituted NiPS3, which is likely attributed to the attenuation of AFM interactions and magnetic anisotropy due to Cr doping. Our work provides a possible strategy to achieve an FM phase in AFM MPX3, which could be useful for investigating 2D magnetism as well as potential device applications.

Submitted 2 April, 2024; originally announced April 2024.

Journal ref: Physical Review B 109, 184405 (2024)

arXiv:2403.04200 [pdf, other]

ACC-ViT : Atrous Convolution's Comeback in Vision Transformers

Authors: Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara

Abstract: Transformers have risen to become the state-of-the-art vision architectures through innovations in attention mechanisms inspired by visual perception. At present, two classes of attention prevail in vision transformers: regional and sparse attention. The former bounds the pixel interactions within a region; the latter spreads them across sparse grids. Their opposing natures have resulted in a dilemma between either preserving hierarchical relations or attaining a global context. In this work, taking inspiration from atrous convolution, we introduce Atrous Attention, a fusion of regional and sparse attention, which can adaptively consolidate both local and global information, while maintaining hierarchical relations. As a further tribute to atrous convolution, we redesign the ubiquitous inverted residual convolution blocks with atrous convolution. Finally, we propose a generalized, hybrid vision transformer backbone, named ACC-ViT, following conventional practices for standard vision tasks. Our tiny version model achieves $\sim 84 \%$ accuracy on ImageNet-1K, with fewer than $28.5$ million parameters, which is a $0.42\%$ improvement over state-of-the-art MaxViT while having $8.4\%$ fewer parameters. In addition, we have investigated the efficacy of the ACC-ViT backbone under different evaluation settings, such as finetuning, linear probing, and zero-shot learning on tasks involving medical image analysis, object detection, and language-image contrastive learning. ACC-ViT is therefore a strong vision backbone, which is also competitive in mobile-scale versions, ideal for niche applications with small datasets.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2310.15318 [pdf, other]

HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks

Authors: Yihong Ma, Ning Yan, Jiayu Li, Masood Mortazavi, Nitesh V. Chawla

Abstract: Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing "pre-train, fine-tune" paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often exhibits a misalignment between the training objectives of pretext tasks and those of downstream tasks. This gap can result in the "negative transfer" problem, wherein the knowledge gained from pre-training adversely affects performance in the downstream tasks. The surge in prompt-based learning within Natural Language Processing (NLP) suggests the potential of adapting a "pre-train, prompt" paradigm to graphs as an alternative. However, existing graph prompting techniques are tailored to homogeneous graphs, neglecting the inherent heterogeneity of Web graphs. To bridge this gap, we propose HetGPT, a general post-training prompting framework to improve the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs). The key is the design of a novel prompting function that integrates a virtual class prompt and a heterogeneous feature prompt, with the aim to reformulate downstream tasks to mirror pretext tasks. Moreover, HetGPT introduces a multi-view neighborhood aggregation mechanism, capturing the complex neighborhood structure in heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT's capability to enhance the performance of state-of-the-art HGNNs on semi-supervised node classification.

Submitted 23 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted to WWW 2024 as research paper

arXiv:2306.16541 [pdf, other]

Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering

Authors: Chuanyue Shen, Letian Zhang, Zhangsihao Yang, Masood Mortazavi, Xiyun Song, Liang Peng, Heather Yu

Abstract: Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted to CVPR 2023 ECV Workshop

arXiv:2304.13818 [pdf]

Selecting Sustainable Optimal Stock by Using Multi-Criteria Fuzzy Decision-Making Approaches Based on the Development of the Gordon Model: A case study of the Toronto Stock Exchange

Authors: Mohsen Mortazavi

Abstract: Investors have always been concerned about the accuracy and legitimacy of choosing the right stock portfolio with high efficiency. Therefore, this paper aims to determine the criteria for selecting an optimal stock portfolio with a high-efficiency ratio in the Toronto Stock Exchange using the integrated evaluation and decision-making trial laboratory (DEMATEL) model and Multi-Criteria Fuzzy decision-making approaches regarding the development of the Gordon model. In the current study, the practical factors, the relative weight of dividends, discount rate, and dividend growth rate have been comprehensively illustrated using combined multi-criteria fuzzy decision-making approaches. A group of 10 experts with at least ten years of experience in the stock exchange field was formed to review the different and new aspects of the subject (portfolio selection) to decide the interaction between the group members and the exchange of attitudes and ideas regarding the criteria. The sequence of influence and effectiveness of the main criteria with DEMATEL has shown that the profitability criterion interacts most with other criteria. The criteria of managing methods and operations (MPO), market, risk, and growth are ranked next in terms of interaction with other criteria. This study concludes that, given the model's appropriate and reliable validity in choosing the optimal stock portfolio, portfolio managers in companies, investment funds, and capital owners are recommended to use the model to select stocks in the Toronto Stock Exchange optimally.

Submitted 26 April, 2023; originally announced April 2023.
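
For readers unfamiliar with the Gordon model that the abstract above builds on: in its standard constant-growth form, a share's value is next year's expected dividend divided by the gap between the discount rate and the dividend growth rate, P0 = D1 / (r - g), valid only when r > g. The sketch below computes that textbook formula. It does not reproduce the paper's fuzzy multi-criteria extension, and the function name `gordon_value` and the sample figures are assumptions for illustration.

```python
def gordon_value(next_dividend: float, discount_rate: float, growth_rate: float) -> float:
    """Standard Gordon growth valuation: P0 = D1 / (r - g), defined only for r > g."""
    if discount_rate <= growth_rate:
        raise ValueError("discount_rate must exceed growth_rate for the model to hold")
    return next_dividend / (discount_rate - growth_rate)


# Hypothetical inputs: a 2.00 dividend expected next year, 8% discount rate, 3% growth.
value = gordon_value(next_dividend=2.00, discount_rate=0.08, growth_rate=0.03)
print(round(value, 2))
```

The three inputs are the same quantities the abstract lists as weighted factors (dividends, discount rate, dividend growth rate), which is why small changes in r - g move the valuation sharply.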

arXiv:2211.02052 [pdf, ps, other]

Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration

Authors: Masood S. Mortazavi, Tiancheng Qin, Ning Yan

Abstract: Given an environment (e.g., a simulator) for evaluating samples in a specified design space and a set of weighted evaluation metrics -- one can use Theta-Resonance, a single-step Markov Decision Process (MDP), to train an intelligent agent producing progressively more optimal samples. In Theta-Resonance, a neural network consumes a constant input tensor and produces a policy as a set of conditional probability density functions (PDFs) for sampling each design dimension. We specialize existing policy gradient algorithms in deep reinforcement learning (D-RL) in order to use evaluation feedback (in terms of cost, penalty or reward) to update our policy network with robust algorithmic stability and minimal design evaluations. We study multiple neural architectures (for our policy network) within the context of a simple SoC design space and propose a method of constructing synthetic space exploration problems to compare and improve design space exploration (DSE) algorithms. Although we only present categorical design spaces, we also outline how to use Theta-Resonance in order to explore continuous and mixed continuous-discrete design spaces.

Submitted 17 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

ACM Class: A.1; C.3; C.4; G.3; H.1; I.2; I.6; J.6
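
The single-step setup in the Theta-Resonance abstract above (sample a design from the policy, evaluate it once, update the policy from that feedback) can be illustrated with a plain REINFORCE update over categorical design dimensions. This is a heavily simplified sketch, not the paper's method: because the network input is constant, the network is replaced here by a table of trainable logits, the dimensions are sampled independently rather than conditionally, and `train_policy`, `toy_cost`, the learning rate, and the moving-average baseline are all assumptions for the example.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one design dimension."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(
    cost_fn: Callable[[List[int]], float],
    sizes: List[int],
    steps: int = 3000,
    lr: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Single-step REINFORCE: each episode is one sampled design and one evaluation."""
    rng = random.Random(seed)
    logits = [[0.0] * n for n in sizes]  # one categorical distribution per dimension
    baseline = None  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        design = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        reward = -cost_fn(design)  # lower cost means higher reward
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for row, p, choice in zip(logits, probs, design):
            for k in range(len(row)):
                # Gradient of log pi(choice) with respect to logit k.
                grad = (1.0 if k == choice else 0.0) - p[k]
                row[k] += lr * advantage * grad
    # Report the most probable option in each dimension.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical three-dimensional categorical design space with a known optimum.
target = [2, 0, 3]


def toy_cost(design: List[int]) -> float:
    """Distance of the sampled design from the hidden optimum."""
    return float(sum(abs(c - t) for c, t in zip(design, target)))


best = train_policy(toy_cost, sizes=[4, 3, 5])
print(best, toy_cost(best))
```

On this separable toy cost the learned design is expected to land at or near `target`, though the outcome of a stochastic run can vary with the seed and hyperparameters.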

arXiv:2103.16083 [pdf, other]

Fully Convolutional Scene Graph Generation

Authors: Hengyue Liu, Ning Yan, Masood S. Mortazavi, Bir Bhanu

Abstract: This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously. Most of the scene graph generation frameworks use a pre-trained two-stage object detector, like Faster R-CNN, and build scene graphs using bounding box features. Such a pipeline usually has a large number of parameters and low inference speed. Unlike these approaches, FCSGG is a conceptually elegant and efficient bottom-up approach that encodes objects as bounding box center points, and relationships as 2D vector fields called Relation Affinity Fields (RAFs). RAFs encode both semantic and spatial features, and explicitly represent the relationship between a pair of objects by the integral on a sub-region that points from subject to object. FCSGG only utilizes visual features and still generates strong results for scene graph generation. Comprehensive experiments on the Visual Genome dataset demonstrate the efficacy, efficiency, and generalizability of the proposed method. FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.

Submitted 30 March, 2021; originally announced March 2021.

Comments: CVPR 2021 Oral

arXiv:2010.15288 [pdf, other]

doi 10.21437/Interspeech.2020

Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks

Authors: Masood S. Mortazavi

Abstract: Semantically-aligned $(speech, image)$ datasets can be used to explore "visually-grounded speech". In a majority of existing investigations, features of an image signal are extracted using neural networks "pre-trained" on other tasks (e.g., classification on ImageNet). In still others, pre-trained networks are used to extract audio features prior to semantic embedding. Without "transfer learning" through pre-trained initialization or pre-trained feature extraction, previous results have tended to show low rates of recall in $speech \rightarrow image$ and $image \rightarrow speech$ queries. Choosing appropriate neural architectures for encoders in the speech and image branches and using large datasets, one can obtain competitive recall rates without any reliance on any pre-trained initialization or feature extraction: $(speech,image)$ semantic alignment and $speech \rightarrow image$ and $image \rightarrow speech$ retrieval are canonical tasks worthy of independent investigation of their own and allow one to explore other questions---e.g., the size of the audio embedder can be reduced significantly with little loss of recall rates in $speech \rightarrow image$ and $image \rightarrow speech$ queries.

Submitted 28 October, 2020; originally announced October 2020.

MSC Class: 68T01; 68T05; 68T07; 68T10; 62P15 ACM Class: I.2; I.2.0; I.2.6; I.2.7; I.2.11; I.5; I.5.1; I.5.2; I.5.4; I.4.10; H.5.1; H.5.2; H.3.3

Journal ref: Proceedings of INTERSPEECH 2020

arXiv:2003.03877 [pdf, other]

FoCL: Feature-Oriented Continual Learning for Generative Models

Authors: Qicheng Lao, Mehrzad Mortazavi, Marzieh Tahaei, Francis Dutil, Thomas Fevens, Mohammad Havaei

Abstract: In this paper, we propose a general framework in continual learning for generative models: Feature-oriented Continual Learning (FoCL). Unlike previous works that aim to solve the catastrophic forgetting problem by introducing regularization in the parameter space or image space, FoCL imposes regularization in the feature space. We show in our experiments that FoCL has faster adaptation to distributional changes in sequentially arriving tasks, and achieves the state-of-the-art performance for generative models in task incremental learning. We discuss choices of combined regularization spaces towards different use case scenarios for boosted performance, e.g., tasks that have high variability in the background. Finally, we introduce a forgetfulness measure that fairly evaluates the degree to which a model suffers from forgetting. Interestingly, the analysis of our proposed forgetfulness score also implies that FoCL tends to have a mitigated forgetting for future tasks.

Submitted 8 March, 2020; originally announced March 2020.

arXiv:2003.02314 [pdf, other]

The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study

Authors: Masood S. Mortazavi, Ning Yan

Abstract: In-painting networks use existing pixels to generate appropriate pixels to fill "holes" placed on parts of an image. A 2-D in-painting network's input usually consists of (1) a three-channel 2-D image, and (2) an additional channel for the "holes" to be in-painted in that image. In this paper, we study the robustness of a given in-painting neural network against variations in hole geometry distributions. We observe that the robustness of an in-painting network is dependent on the probability distribution function (PDF) of the hole geometry presented to it during its training even if the underlying image dataset used (in training and testing) does not alter. We develop an experimental methodology for testing and evaluating relative robustness of in-painting networks against four different kinds of hole geometry PDFs. We examine a number of hypotheses regarding (1) the natural bias of in-painting networks to the hole distribution used for their training, (2) the underlying dataset's ability to differentiate relative robustness as hole distributions vary in a train-test (cross-comparison) grid, and (3) the impact of the directional distribution of edges in the holes and in the image dataset. We present results for L1, PSNR and SSIM quality metrics and develop a specific measure of relative in-painting robustness to be used in cross-comparison grids based on these quality metrics. (One can incorporate other quality metrics in this relative measure.) The empirical work reported here is an initial step in a broader and deeper investigation of "filling the blank" neural networks' sensitivity, robustness and regularization with respect to hole "geometry" PDFs, and it suggests further research in this domain.

Submitted 4 March, 2020; originally announced March 2020.

arXiv:1906.02848 [pdf]

Si-based GeSn photodetectors towards mid-infrared imaging applications

Authors: Huong Tran, Thach Pham, Joe Margetis, Yiyin Zhou, Wei Dou, Perry C. Grant, Joshua M. Grant, Sattar Alkabi, Greg Sun, Richard A. Soref, John Tolle, Yong-Hang Zhang, Wei Du, Baohua Li, Mansour Mortazavi, Shui-Qing Yu

Abstract: This paper reports a comprehensive study of Si-based GeSn mid-infrared photodetectors, which includes: 1) the demonstration of a set of photoconductors with Sn compositions ranging from 10.5% to 22.3%, showing the cut-off wavelength has been extended to 3.65 μm. The measured maximum D* of 1.1x10^10 cm Hz^(1/2) W^(-1) is comparable to that of commercial extended-InGaAs detectors; 2) the development of a surface passivation technique on a photodiode based on in-depth analysis of the dark current mechanism, effectively reducing the dark current. Moreover, mid-infrared images were obtained using GeSn photodetectors, showing image quality comparable to that acquired using commercial PbSe detectors.

Submitted 6 June, 2019; originally announced June 2019.

Comments: 25 pages, 8 figures

arXiv:1810.02523 [pdf]

UHV-CVD Growth of High Quality GeSn Using SnCl4: From Growth Optimization to Prototype Devices

Authors: P. C. Grant, W. Dou, B. Alharthi, J. M. Grant, H. Tran, G. Abernathy, A. Mosleh, W. Du, B. Li, M. Mortazavi, H. A. Naseem, S. Q. Yu

Abstract: The persistent interest in the epitaxy of the group IV alloy GeSn is mainly driven by the demand for an efficient light source that could be monolithically integrated on Si for mid-infrared Si photonics. For chemical vapor deposition of GeSn, the exploration of the parameter window is difficult from the beginning due to its non-equilibrium growth condition. In this work, we demonstrated an effective pathway to achieve high quality GeSn with high Sn incorporation. The GeSn films were grown on Ge-buffered Si via ultra-high vacuum chemical vapor deposition using GeH4 and SnCl4 as precursor gases. The influence of both SnCl4 flow fraction and growth temperature on the Sn incorporation and material quality was investigated. The key to achieving effective Sn incorporation and high material quality is to explore the proper parameter match between SnCl4 supply and growth temperature, which is also called the optimized growth regime. The Sn precipitation is significantly suppressed in the optimized growth regime, leading to more Sn incorporation into Ge and enhanced material quality. The prototype GeSn photoconductors were fabricated with typical samples, showing promising device applications towards mid-infrared optoelectronics.

Submitted 5 October, 2018; originally announced October 2018.

arXiv:1708.05927 [pdf]

Si-based GeSn lasers with wavelength coverage of 2 to 3 μm and operating temperatures up to 180 K

Authors: Joe Margetis, Sattar Al-Kabi, Wei Du, Wei Dou, Yiyin Zhou, Thach Pham, Perry Grant, Seyed Ghetmiri, Aboozar Mosleh, Baohua Li, Jifeng Liu, Greg Sun, Richard Soref, John Tolle, Mansour Mortazavi, Shui-Qing Yu

Abstract: A Si-based monolithic laser is highly desirable for full integration of Si-photonics. Lasing from direct bandgap group-IV GeSn alloy has opened a completely new avenue from the traditional III-V integration approach. We demonstrated optically pumped GeSn lasers on Si with broad wavelength coverage from 2 to 3 μm. The GeSn alloys were grown using newly developed approaches with an industry standard chemical vapor deposition reactor and low-cost commercially available precursors. The achieved maximum Sn composition of 17.5% exceeded the generally acknowledged Sn incorporation limits for using similar deposition chemistries. The highest lasing temperature was measured as 180 K with the active layer thickness as thin as 260 nm. The unprecedented lasing performance is mainly due to the unique growth approaches, which offer high-quality epitaxial materials. The results reported in this work show a major advance towards Si-based mid-infrared laser sources for integrated photonics.

Submitted 19 August, 2017; originally announced August 2017.

Comments: 34 pages, 12 figures

MSC Class: 00A79

arXiv:1605.06341 [pdf, other]

Quantum Tunneling of Thermal Protons Through Pristine Graphene

Authors: Igor Poltavsky, Limin Zheng, Majid Mortazavi, Alexandre Tkatchenko

Abstract: Atomically thin two-dimensional materials such as graphene and hexagonal boron nitride have recently been found to exhibit appreciable permeability to thermal protons, making these materials emerging candidates for separation technologies [S. Hu et al., Nature 516, 227 (2014); M. Lozada-Hidalgo et al., Science 351, 68 (2016).]. These remarkable findings remain unexplained by density-functional electronic structure calculations, which instead yield barriers that exceed by 1.0 eV those found in experiments. Here we resolve this puzzle by demonstrating that the proton transfer through pristine graphene is driven by quantum nuclear effects, which substantially reduce the transport barrier by up to 1.4 eV compared to the results of classical molecular dynamics (MD). Our Feynman-Kac path-integral MD simulations unambiguously reveal the quantum tunneling mechanism of strongly interacting hydrogen ions through two-dimensional materials. In addition, we predict a strong isotope effect of 1 eV on the transport barrier for graphene in vacuum and at room temperature. These findings not only shed light on the graphene permeability to protons and deuterons, but also offer new insights for controlling the underlying quantum ion transport mechanisms in nanostructured separation membranes.

Submitted 12 April, 2017; v1 submitted 20 May, 2016; originally announced May 2016.

arXiv:1601.05448 [pdf, other]

Parallel variable-density particle-laden turbulence simulation

Authors: Hadi Pouransari, Milad Mortazavi, Ali Mani

Abstract: We have developed a fully parallel C++/MPI based simulation code for variable-density particle-laden turbulent flows. The fluid is represented through a uniform Eulerian staggered grid, while particles are modeled using a Lagrangian point-particle framework. Spatial discretization is second-order accurate, and time integration has a fourth-order accuracy. Two-way coupling of the particles with the background flow is considered in both momentum and energy equations. The code is fully modular and abstracted, and can easily be extended or modified. We have considered two different boundary conditions. We have also developed a novel parallel linear solver for the variable density Poisson equation that arises in the calculation.

Submitted 20 January, 2016; originally announced January 2016.

Comments: In 2015, Annual Research Briefs, Center for Turbulence Research, Stanford University

arXiv:1406.7474 [pdf]

On the Optimization of non-Dense Metabolic Networks in non-Equilibrium State Utilizing 2D-Lattice Simulation

Authors: Erfan Khaji, Mahsa Mortazavi

Abstract: Modeling and optimization of metabolic networks has been one of the hottest topics in computational systems biology within recent years. However, the complexity and uncertainty of these networks, in addition to the lack of necessary data, have resulted in more effort to design and use more capable models that fit realistic conditions. In this paper, instead of optimizing networks in equilibrium condition, the optimization of dynamic networks in non-equilibrium states including a low number of molecules has been studied using a 2-D lattice simulation. A prototyped network has been simulated with such an approach, and has been optimized using Swarm Particle Algorithm, the results of which are presented in addition to the relevant plots.

Submitted 29 June, 2014; originally announced June 2014.

Comments: 10 Figures, 9 Pages

arXiv:1204.2673 [pdf, ps, other]

doi 10.1088/1751-8113/45/24/244011

A new class of $f$-deformed charge coherent states and their nonclassical properties

Authors: M Mortazavi, M K Tavassoly

Abstract: Two-mode charge (pair) coherent states have been introduced previously by using $<η|$ representation. In the present paper we reobtain these states by a rather different method. Then, using the nonlinear coherent states approach and based on a simple manner by which the representation of two-mode charge coherent states is introduced, we generalize the bosonic creation and annihilation operators to the $f$-deformed ladder operators and construct a new class of $f$-deformed charge coherent states. Unlike the (linear) pair coherent states, our presented structure has the potential to generate a large class of pair coherent states with various nonclassicality signs and physical properties which are of interest. To this end, we use a few well-known nonlinearity functions associated with particular quantum systems as some physical appearances of our presented formalism. After introducing the explicit form of the above correlated states in two-mode Fock-space, several nonclassicality features of the corresponding states (as well as the two-mode linear charge coherent states) are numerically investigated by calculating quadrature squeezing, Mandel parameter, second-order correlation function, second-order correlation function between the two modes and Cauchy-Schwarz inequality. Also, the oscillatory behaviour of the photon count and the quasi-probability (Husimi) function of the associated states will be discussed.

Submitted 12 April, 2012; originally announced April 2012.

Comments: 22 pages, Accepted for J. Phys A: Math. Theor. Special Issue on Coherent States

Showing 1–24 of 24 results for author: Mortazavi, M