Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs' perception and cognition capabilities in biological image understanding. To address this gap, we introduce μ-Bench, an expert-curated benchmark encompassing 22 biomedical tasks across various scientific disciplines (biology, pathology), microscopy modalities (electron, fluorescence, light), scales (subcellular, cellular, tissue), and organisms in both normal and abnormal states. We evaluate state-of-the-art biomedical, pathology, and general VLMs on μ-Bench and find that: i) current models struggle on all categories, even for basic tasks such as distinguishing microscopy modalities; ii) current specialist models fine-tuned on biomedical data often perform worse than generalist models; iii) fine-tuning in specific microscopy domains can cause catastrophic forgetting, eroding prior biomedical knowledge encoded in their base model; and iv) weight interpolation between fine-tuned and pre-trained models offers one solution to forgetting and improves general performance across biomedical tasks. We release μ-Bench under a permissive license to accelerate the research and development of microscopy foundation models.
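The weight interpolation mentioned in finding iv) above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain linear interpolation, checkpoints stored as dictionaries of flat parameter lists, and a hypothetical mixing coefficient `alpha`; the abstract does not specify any of these details.

```python
from typing import Dict, List

# Assumption: a checkpoint is a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(pretrained: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * pretrained + alpha * finetuned, parameter by parameter.

    alpha = 0 recovers the pre-trained model, alpha = 1 the fine-tuned one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, base in pretrained.items():
        tuned = finetuned[name]
        if len(base) != len(tuned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]
    return merged


# Toy checkpoints (hypothetical values, for illustration only).
pretrained = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
```

In practice, `alpha` would presumably be chosen on held-out data to balance the specialized task against the general biomedical tasks.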
Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP) due to their impressive performance on various tasks. However, expensive training as well as inference remains a significant impediment to their widespread applicability. While enforcing sparsity at various levels of the model architecture has found promise in addressing scaling and efficiency issues, it remains unclear how sparsity affects network topology. Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology. Specifically, we exploit mechanisms seen in biological networks, such as preferential attachment and redundant synapse pruning, and show that principled, model-agnostic sparsity approaches are performant and efficient across diverse NLP tasks, spanning both classification (such as natural language inference) and generation (summarization, machine translation), even though optimizing performance is not our sole objective. Our approach, NeuroPrune, is competitive with (or sometimes superior to) baselines on performance and can be up to 10x faster in terms of training time for a given level of sparsity, simultaneously exhibiting measurable improvements in inference time in many cases.
Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 8-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar
Hyewon Jeong, Sarah Jabbour, Yuzhe Yang, Rahul Thapa, Hussein Mozannar, William Jongwon Han, Nikita Mehandru, Michael Wornow, Vladislav Lialin, Xin Liu, Alejandro Lozano, Jiacheng Zhu, Rafal Dariusz Kocielnik, Keith Harrigian, Haoran Zhang, Edward Lee, Milos Vukadinovic, Aparna Balagopalan, Vincent Jeanselme, Katherine Matton, et al (23)
Mar 06 2024
cs.LG arXiv:2403.01628v2
The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2022. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.
Protein function annotation is an important yet challenging task in biology. Recent deep learning advancements show significant potential for accurate function prediction by learning from protein sequences and structures. Nevertheless, these predictor-based methods often overlook the modeling of protein similarity, an idea commonly employed in traditional approaches using sequence or structure retrieval tools. To fill this gap, we first study the effect of inter-protein similarity modeling by benchmarking retriever-based methods against predictors on protein function annotation tasks. Our results show that retrievers can match or outperform predictors without large-scale pre-training. Building on these insights, we introduce a novel variational pseudo-likelihood framework, ProtIR, designed to improve function predictors by incorporating inter-protein similarity modeling. This framework iteratively refines knowledge between a function predictor and retriever, thereby combining the strengths of both predictors and retrievers. ProtIR showcases around 10% improvement over vanilla predictor-based methods. Besides, it achieves performance on par with protein language model-based methods, yet without the need for massive pre-training, highlighting the efficacy of our framework. Code will be released upon acceptance.
Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.
Protein language models are a powerful tool for learning protein representations through pre-training on vast protein sequence datasets. However, traditional protein language models lack explicit structural supervision, despite its relevance to protein function. To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language models without requiring explicit protein structures as input. We evaluate the impact of this structure-informed training on downstream protein function prediction tasks. Experimental results reveal consistent improvements in function annotation accuracy for EC number and GO term prediction. Performance on mutant datasets, however, varies based on the relationship between targeted properties and protein structures. This underscores the importance of considering this relationship when applying structure-aware training to protein function prediction tasks. Code and model weights are available at https://github.com/DeepGraphLearning/esm-s.
This article proposes a generative neural network architecture for spatially consistent air-to-ground channel modeling. The approach considers the trajectories of uncrewed aerial vehicles along typical urban paths, capturing spatial dependencies within received signal strength (RSS) sequences from multiple cellular base stations (gNBs). Through the incorporation of conditioning data, the model accurately discriminates between gNBs and drives the correlation matrix distance between real and generated sequences to minimal values. This enables evaluating performance and mobility management metrics with spatially (and by extension temporally) consistent RSS values, rather than independent snapshots. For some tasks underpinned by these metrics, say handovers, consistency is essential.
We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.
Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered by stringent benchmark and task definitions, which effectively prohibit the use of foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient for evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these models. We therefore introduce a new benchmark that includes five real-world application-driven datasets, covering challenging domains such as aerial and surgical images, and establish baselines. We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects. FOMO has ~3x unknown object mAP compared to baselines on our benchmark. However, our results indicate significant room for improvement - suggesting a great research opportunity in further scaling object detection methods to real-world domains. Our code and benchmark are available at https://orrzohar.github.io/projects/fomo/.
The improvements in received signal power brought about by a reflective intelligent surface (RIS) might be overstated if background propagation mechanisms such as reflections, scattering, and diffraction are ignored. This paper addresses this issue for non-line-of-sight indoor settings, contrasting the energy conveyed by an RIS with the energy already reaching the receiver through environmental reflections. And, to prevent artifacts, such naturally occurring reflections are not modeled via approximate methods, but rather through a rigorous physics-based formulation. It is found that the environment contributes a level of energy commensurate with that of an ideal RIS of considerable size; to have substantial impact, an actual RIS would have to generously exceed this size.
The rapid expansion of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and we report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.
With increasing frequencies, bandwidths, and array apertures, the phenomenon of beam squint arises as a serious impairment to beamforming. Fully digital arrays with true time delay per antenna element are a potential solution, but they require downconversion at each element. This paper shows that hybrid arrays can perform essentially as well as digital arrays once the number of radio-frequency chains exceeds a certain threshold that is far below the number of elements. The result is robust, holding also for suboptimum but highly appealing beamspace architectures.
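To make the beam squint named in the abstract above concrete, here is a minimal sketch under textbook assumptions that the abstract does not itself state: a uniform linear array steered with frequency-flat phase shifters designed at the center frequency, for which the beam at frequency f points at the angle θ satisfying sin θ = (f_c / f) sin θ_0. The function name and the numerical values are hypothetical.

```python
import math


def squinted_angle_deg(steer_deg: float, f_center_hz: float, f_hz: float) -> float:
    """Pointing angle (degrees from broadside) at frequency f_hz.

    Assumes phase shifters set for steer_deg at f_center_hz, so that
    sin(theta) = (f_center_hz / f_hz) * sin(theta_0).
    """
    s = (f_center_hz / f_hz) * math.sin(math.radians(steer_deg))
    if abs(s) > 1.0:
        raise ValueError("no real pointing angle at this frequency")
    return math.degrees(math.asin(s))


# Hypothetical example: 100 GHz carrier, beam steered to 30 degrees,
# evaluated at the center and at the two edges of a 20 GHz band.
center = squinted_angle_deg(30.0, 100e9, 100e9)
upper_edge = squinted_angle_deg(30.0, 100e9, 110e9)
lower_edge = squinted_angle_deg(30.0, 100e9, 90e9)
```

Under these assumptions, the beam drifts toward broadside above the center frequency and away from it below. True time delays remove this frequency dependence, which is why the abstract presents them as a potential solution.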
This article puts the spotlight on the receiver front-end (RFE), an integral part of any wireless device that information theory typically idealizes into a mere addition of noise. While this idealization was sound in the past, as operating frequencies, bandwidths, and antenna counts rise, a soaring amount of power is required for the RFE to behave accordingly. Containing this surge in power expenditure exposes a harsher behavior on the part of the RFE (more noise, nonlinearities, and coarse quantization), setting up a tradeoff between the spectral efficiency under such nonidealities and the efficiency in the use of energy by the RFE. With the urge for radically better power consumptions and energy efficiencies in 6G, this emerges as an issue on which information theory can cast light at a fundamental level. More broadly, this article advocates the interest of having information theory embrace the device power consumption in its analyses. In turn, this calls for new models and abstractions such as the ones herein put together for the RFE, and for a more holistic perspective.
Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, et al (10)
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.
This paper provides a deterministic channel model for a scenario where wireless connectivity is established through a reflection off a smooth planar surface of an infinite extent. The developed model is rigorously built upon the physics of wave propagation and is as precise as tight are the unboundedness and smoothness assumptions on the surface. This model allows establishing how line-of-sight multiantenna communication is altered by a reflection off an electrically large surface, a situation of high interest for mmWave and terahertz frequencies.
Jun 27 2023
cs.CY arXiv:2306.13685v1
Purpose - The purpose of this study is to develop a game-based mobile application to help learners practice mathematical patterns and structures. Method - The study followed a mixed-method research design and prototyping methodology to guide the study in developing the mobile application. An instrument based on the Octalysis framework was developed as an evaluation tool for the study. Results - The study developed a mobile application based on the Octalysis framework. The application has fully achieved all its intended features based on the rating provided by the students and IT experts. Conclusion - The study successfully developed a mobile learning application for mathematical patterns and structures. By incorporating GBL principles and the Octalysis framework, the app achieved its intended features and received positive evaluations from students and IT experts. This highlights the potential of the app in promoting mathematical learning. Recommendations - This study recommends that the application be further enhanced to include other topics. Incorporating other game-based principles and approaches like timed questions and the difficulty level is also worth pursuing. Actual testing for end-users is also needed to verify the application's effectiveness. Practical Implications - Successful development of a game-based mobile app for practicing mathematical patterns and structures can transform education technology by engaging learners and enhancing their experience. This study provides valuable insights for future researchers developing similar applications, highlighting the potential to revolutionize traditional approaches and create an interactive learning environment for improving mathematical abilities.
Wireless communication technology has progressed dramatically over the past 25 years, in terms of societal adoption as well as technical sophistication. In 1998, mobile phones were still in the process of becoming compact and affordable devices that could be widely utilized in both developed and developing countries. There were "only" 300 million mobile subscribers in the world [1]. Cellular networks were among the first privatized telecommunication markets, and competition turned the devices into fashion accessories with attractive designs that could be individualized. The service was circumscribed to telephony and text messaging, but it was groundbreaking in that, for the first time, telecommunication was between people rather than locations. Wireless networks have changed dramatically over the past few decades, enabling this revolution in service provisioning and making it possible to accommodate the ensuing dramatic growth in traffic. There are many contributing components, including new air interfaces for faster transmission, channel coding for enhanced reliability, improved source compression to remove redundancies, and leaner protocols to reduce overheads. Signal processing is at the core of these improvements, but nowhere has it played a bigger role than in the development of multiantenna communication. This article tells the story of how major signal processing advances have transformed the early multiantenna concepts into mainstream technology over the past 25 years. The story therefore begins somewhat arbitrarily in 1998. A broad account of the state-of-the-art signal processing techniques for wireless systems by 1998 can be found in [2], and its contrast with recent textbooks such as [3]-[5] reveals the dramatic leap forward that has taken place in the interim.
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods, which leverage 3D structural information with graph neural networks and geometric pre-training, show potential in function prediction tasks but still suffer from the limited number of available structures. To bridge this gap, our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM (ESM-2) with distinct structure encoders (GVP, GearNet, CDConv). We introduce three representation fusion strategies and explore different pre-training techniques. Our method achieves significant improvements over existing sequence- and structure-based methods, setting a new state of the art for function annotation. This study underscores several important design choices for fusing protein sequence and structure information. Our implementation is available at https://github.com/DeepGraphLearning/ESM-GearNet.
Jan 31 2023
cs.LG arXiv:2301.12068v2
Self-supervised pre-training methods on proteins have recently gained attention, with most approaches focusing on either protein sequences or structures, neglecting the exploration of their joint distribution, which is crucial for a comprehensive understanding of protein functions by integrating co-evolutionary information and structural characteristics. In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from the perturbed ones along the joint diffusion trajectory, which acquires the joint distribution of sequences and structures. Considering the essential protein conformational variations, we enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein. SiamDiff attains this goal by maximizing the mutual information between representations of diffusion trajectories of structurally-correlated conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom- and residue-level structure-based protein understanding tasks. Experimental results show that the performance of DiffPreT is consistently competitive on all tasks, and SiamDiff achieves new state-of-the-art performance, considering the mean ranks on all tasks. Our implementation is available at https://github.com/DeepGraphLearning/SiamDiff.
An alternative derivation is provided for the degrees of freedom (DOF) formula on line-of-sight (LOS) channels via Landau's eigenvalue theorem for bandlimited signals. Compared to other approaches, Landau's theorem provides a general framework to compute the DOF in arbitrary environments; this framework is herein specialized to LOS propagation. The development shows how the spatially bandlimited nature of the channel relates to its geometry under the paraxial approximation that applies to most LOS settings of interest.
Inverse protein folding, the process of designing sequences that fold into a specific 3D structure, is crucial in bio-engineering and drug discovery. Traditional methods rely on experimentally resolved structures, but these cover only a small fraction of protein sequences. Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences. However, these models are too slow for integration into the optimization loop of inverse folding models during training. To address this, we propose using knowledge distillation on folding model confidence metrics, such as pTM or pLDDT scores, to create a faster and end-to-end differentiable distilled model. This model can then be used as a structure consistency regularizer in training the inverse folding model. Our technique is versatile and can be applied to other design tasks, such as sequence-based protein infilling. Experimental results show that our method outperforms non-regularized baselines, yielding up to 3% improvement in sequence recovery and up to 45% improvement in protein diversity while maintaining structural consistency in generated sequences. Code is available at https://github.com/IBM/AFDistill
5G millimeter-wave (mmWave) cellular networks are in the early phase of commercial deployments and present a unique opportunity for robust, high-data-rate communication to unmanned aerial vehicles (UAVs). A fundamental question is whether and how mmWave networks designed for terrestrial users should be modified to serve UAVs. The paper invokes realistic cell layouts, antenna patterns, and channel models trained from extensive ray tracing data to assess the performance of various network alternatives. Importantly, the study considers the addition of dedicated uptilted rooftop-mounted cells for aerial coverage, as well as novel spectrum sharing modes between terrestrial and aerial network operators. The effects of power control and of multiuser multiple-input multiple-output are also studied.
We provide a deterministic channel model for a scenario where wireless connectivity is established through a reflection from a planar smooth surface of an infinite extent. The developed model is rigorously built upon the physics of wave propagation, and is as precise as tight are the unboundedness and smoothness assumptions on the surface. This model allows establishing that line-of-sight spatial multiplexing can take place via reflection off an electrically large surface, a situation of high interest for mmWave and terahertz frequencies.
Mar 14 2022
cs.LG arXiv:2203.06125v5
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available in smaller numbers only, has not been explored for protein property prediction, though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We pretrain the protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods, while using much less pretraining data. Our implementation is available at https://github.com/DeepGraphLearning/GearNet.
We introduce a new class of auto-encoders for directed graphs, motivated by a direct extension of the Weisfeiler-Leman algorithm to pairs of node labels. The proposed model learns pairs of interpretable latent representations for the nodes of directed graphs, and uses parameterized graph convolutional network (GCN) layers for its encoder and an asymmetric inner product decoder. Parameters in the encoder control the weighting of representations exchanged between neighboring nodes. We demonstrate the ability of the proposed model to learn meaningful latent embeddings and achieve superior performance on the directed link prediction task on several popular network datasets.
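The asymmetric inner product decoder mentioned in the abstract above can be sketched as follows. This is an illustration rather than the paper's implementation: it assumes each node carries a source embedding and a target embedding (one reading of the "pairs of latent representations"), and that the score for edge i -> j is a sigmoid of their inner product. The names and toy values are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_edge_probabilities(
    source: List[List[float]], target: List[List[float]]
) -> List[List[float]]:
    """probs[i][j] = sigmoid(<source[i], target[j]>) for the directed edge i -> j."""
    return [
        [sigmoid(sum(a * b for a, b in zip(s_i, t_j))) for t_j in target]
        for s_i in source
    ]


# Toy embeddings for two nodes (hypothetical values): node 0 is a strong
# sender, node 1 a strong receiver.
source = [[2.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [2.0, 0.0]]
probs = decode_edge_probabilities(source, target)
```

Because the score for i -> j uses the source embedding of i and the target embedding of j, swapping the roles generally yields a different value. Here edge 0 -> 1 scores high while edge 1 -> 0 sits at 0.5, which a symmetric decoder with one embedding per node could not express.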
Challenges in the field of retinal prostheses motivate the development of retinal models that accurately simulate Retinal Ganglion Cell (RGC) responses. The goal of retinal prostheses is to enable blind individuals to solve complex, real-life visual tasks. In this paper, we introduce the functional assessment (FA) of retinal models, which describes the concept of evaluating the performance of retinal models on visual understanding tasks. We present a machine learning method for FA: we feed traditional machine learning classifiers with RGC responses generated by retinal models, to solve object and digit recognition tasks (CIFAR-10, MNIST, Fashion MNIST, Imagenette). We examined critical FA aspects, including how the performance of FA depends on the task, how to optimally feed RGC responses to the classifiers, and how the number of output neurons correlates with the model's accuracy. To increase the number of output neurons, we manipulated input images by splitting them and then feeding the pieces to the retinal model, and we found that image splitting does not significantly improve the model's accuracy. We also show that differences in the structure of datasets result in largely divergent performance of the retinal model (MNIST and Fashion MNIST exceeded 80% accuracy, while CIFAR-10 and Imagenette achieved ~40%). Furthermore, retinal models which perform better in standard evaluation, i.e. more accurately predict RGC response, perform better in FA as well. However, unlike standard evaluation, FA results can be straightforwardly interpreted in the context of comparing the quality of visual perception.
Computational protein design, i.e. inferring novel and diverse protein sequences consistent with a given structure, remains a major unsolved challenge. Recently, deep generative models that learn from sequences alone or from sequences and structures jointly have shown impressive performance on this task. However, those models appear limited in terms of modeling structural constraints, capturing enough sequence diversity, or both. Here we consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold, while enforcing structure-to-sequence (and vice versa) consistency. We benchmark these models on the task of computational design of antibody sequences, which demand designing sequences with high diversity for functional implication. The Fold2Seq framework outperforms the two other baselines in terms of diversity of the designed sequences, while maintaining the typical fold.
This paper tackles the problem of single-user multiple-input multiple-output communication with 1-bit digital-to-analog and analog-to-digital converters. With the information-theoretic capacity as benchmark, the complementary strategies of beamforming and equiprobable signaling are contrasted in the regimes of operational interest, and the ensuing spectral efficiencies are characterized. Various canonical channel types are considered, with emphasis on line-of-sight settings under both spherical and planar wavefronts, respectively representative of short and long transmission ranges at mmWave and terahertz frequencies. In all cases, a judicious combination of beamforming and equiprobable signaling is shown to operate within a modest gap from capacity.
What will the future of UAV cellular communications be? In this tutorial article, we address such a compelling yet difficult question by embarking on a journey from 5G to 6G and sharing a large number of realistic case studies supported by original results. We start by overviewing the status quo on UAV communications from an industrial standpoint, providing fresh updates from the 3GPP and detailing new 5G NR features in support of aerial devices. We then show the potential and the limitations of such features. In particular, we demonstrate how sub-6 GHz massive MIMO can successfully tackle cell selection and interference challenges, we showcase encouraging mmWave coverage evaluations in both urban and suburban/rural settings, and we examine the peculiarities of direct device-to-device communications in the sky. Moving on, we sneak a peek at next-generation UAV communications, listing some of the use cases envisioned for the 2030s. We identify the most promising 6G enablers for UAV communication, those expected to take the performance and reliability to the next level. For each of these disruptive new paradigms (non-terrestrial networks, cell-free architectures, artificial intelligence, reconfigurable intelligent surfaces, and THz communications), we gauge the prospective benefits for UAVs and discuss the main technological hurdles that stand in the way. All along, we distil our numerous findings into essential takeaways, and we identify key open problems worthy of further study.
A relentless trend in wireless communications is the hunger for bandwidth, and fresh bandwidth is only to be found at ever-higher frequencies. While 5G systems are seizing the mmWave band, the attention of researchers is shifting already to the terahertz range. In that distant land of tiny wavelengths, antenna arrays can serve for more than power-enhancing beamforming. Defying lower-frequency wisdom, spatial multiplexing becomes feasible even in line-of-sight conditions. This paper reviews the underpinnings of this phenomenon, and it surveys recent results on the ensuing information-theoretic capacity. Reconfigurable array architectures are put forth that can closely approach such capacity, practical challenges are discussed, and supporting experimental evidence is presented.
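The line-of-sight multiplexing effect can be checked numerically with a small sketch. It assumes two parallel, broadside uniform linear arrays, unit-magnitude channel entries (pathloss differences across the array are ignored), and the classical spacing rule d = sqrt(wavelength * distance / n); the carrier frequency, range, and array size are illustrative choices, not values taken from the paper.

```python
import cmath
import math


def los_channel(n, spacing, distance, wavelength):
    """LOS channel between two parallel broadside n-element ULAs.

    Entries carry only the phase of the exact (spherical-wavefront)
    propagation distance between each transmit/receive antenna pair.
    """
    pos = [(k - (n - 1) / 2) * spacing for k in range(n)]
    return [
        [cmath.exp(-2j * math.pi * math.hypot(distance, xr - xt) / wavelength)
         for xt in pos]
        for xr in pos
    ]


def gram(h):
    """Gram matrix H^H H; a scaled identity means orthogonal columns."""
    n = len(h)
    return [
        [sum(h[m][a].conjugate() * h[m][b] for m in range(n)) for b in range(n)]
        for a in range(n)
    ]


wavelength = 1e-3   # roughly 300 GHz (assumed)
distance = 10.0     # metres (assumed)
n = 4               # antennas per array (assumed)

# Wide spacing chosen by the rule above: columns become (nearly) orthogonal.
wide = gram(los_channel(n, math.sqrt(wavelength * distance / n), distance, wavelength))
# Half-wavelength spacing: columns are nearly identical, so the channel is close to rank one.
narrow = gram(los_channel(n, wavelength / 2, distance, wavelength))
```

In this setup the wide spacing is 5 cm, which is practical only because the wavelength is so small; at lower frequencies the same rule would call for impractically large arrays.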
We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. Not only that, we present two important update rules beyond the well-known standard methods as a byproduct of our approach: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole family of ProxGen enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also empirically show the superiority of proximal methods compared to subgradient-based approaches via extensive experiments. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
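To make the idea of a preconditioned proximal step concrete, here is a minimal sketch for the convex special case of an $\ell_1$ regularizer with a diagonal positive preconditioner, where the proximal mapping reduces to coordinate-wise soft-thresholding with a per-coordinate threshold. This is a generic textbook update under those assumptions, not the ProxGen algorithm itself; the function name and the numerical values are hypothetical.

```python
import math


def prox_step_l1_diag(theta, grad, precond, lr, lam):
    """One preconditioned proximal gradient step for lam * ||theta||_1.

    Solves, coordinate by coordinate,
        min_x  g*x + lam*|x| + (c / (2*lr)) * (x - theta)^2,
    whose solution is soft-thresholding of z = theta - lr*g/c
    at level lr*lam/c (c > 0 is the diagonal preconditioner entry).
    """
    updated = []
    for th, g, c in zip(theta, grad, precond):
        z = th - lr * g / c            # preconditioned gradient step
        threshold = lr * lam / c       # threshold shrinks where c is large
        updated.append(math.copysign(max(abs(z) - threshold, 0.0), z))
    return updated


# Illustrative values: three parameters with different preconditioner entries.
theta_next = prox_step_l1_diag(
    theta=[1.0, -0.2, 0.01],
    grad=[0.5, -0.1, 0.0],
    precond=[1.0, 2.0, 4.0],
    lr=0.1,
    lam=1.0,
)
```

The third coordinate is set exactly to zero, which is the behaviour that distinguishes proximal updates from subgradient steps, where parameters typically only hover near zero.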
Recent theoretical works based on the neural tangent kernel (NTK) have shed light on the optimization and generalization of over-parameterized networks, and have partially bridged the gap between their practical success and classical learning theory. In particular, using the NTK-based approach, the following three representative results were obtained: (1) A training error bound was derived to show that networks can fit any finite training sample perfectly by reflecting a tighter characterization of training speed depending on the data complexity. (2) A generalization error bound invariant of network size was derived by using a data-dependent complexity measure (CMD). It follows from this CMD bound that networks can generalize arbitrary smooth functions. (3) A simple and analytic kernel function was derived as indeed equivalent to a fully-trained network. This kernel outperforms its corresponding network and the existing gold standard, Random Forests, in few shot learning. For all of these results to hold, the network scaling factor $\kappa$ should decrease w.r.t. sample size $n$. In this case of decreasing $\kappa$, however, we prove that the aforementioned results are surprisingly erroneous. This is because the output value of the trained network decreases to zero when $\kappa$ decreases w.r.t. $n$. To solve this problem, we tighten key bounds by essentially removing $\kappa$-affected values. Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
This paper establishes an upper bound on the capacity of line-of-sight multiantenna channels over all possible antenna arrangements and shows that uniform linear arrays (ULAs) with an SNR-dependent rotation of transmitter or receiver can closely approach such capacity---and in fact achieve it at low and high SNR, and asymptotically in the numbers of antennas. Then, as an alternative to mechanically rotating ULAs, we propose to electronically select among multiple ULAs having a radial disposition at either transmitter or receiver, and we bound the shortfall from capacity as a function of the number of such ULAs. With only three ULAs, properly angled, 96% of the capacity can be achieved. Finally, we further introduce reduced-complexity precoders and linear receivers that capitalize on the structure of the channels spawned by these configurable ULA architectures.
Adaptive gradient approaches that automatically adjust the learning rate on a per-feature basis have been very popular for training deep networks. This rich class of algorithms includes Adagrad, RMSprop, Adam, and recent extensions. All these algorithms have adopted diagonal matrix adaptation, due to the prohibitive computational burden of manipulating full matrices in high dimensions. In this paper, we show that block-diagonal matrix adaptation can be a practical and powerful solution that can effectively utilize structural characteristics of deep learning architectures, and significantly improve convergence and out-of-sample generalization. We present a general framework with block-diagonal matrix updates via coordinate grouping, which includes counterparts of the aforementioned algorithms, and we prove their convergence in non-convex optimization, highlighting benefits compared to diagonal versions. In addition, we propose an efficient spectrum-clipping scheme that benefits from the superior generalization performance of SGD. Extensive experiments reveal that block-diagonal approaches achieve state-of-the-art results on several deep learning tasks, and can outperform adaptive diagonal methods, vanilla SGD, as well as a modified version of full-matrix adaptation proposed very recently.
This paper presents analytical expressions for the signal-to-interference ratio (SIR) and the spectral efficiency in macrocellular networks with massive MIMO conjugate beamforming, both with a uniform and a channel-dependent power allocation. These expressions, which apply to very general network geometries, are asymptotic in the strength of the shadowing. Through Monte-Carlo simulation, we verify their accuracy for relevant network topologies and shadowing strengths. Also, since the analysis does not include pilot contamination, we further gauge through Monte-Carlo simulation the deviation that this phenomenon causes with respect to our results, and hence the scope of the analysis.
CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is an Extreme Value Theory (EVT) based robustness score for large-scale deep neural networks (DNNs). In this paper, we propose two extensions of this robustness score. First, we provide a new formal robustness guarantee for classifier functions that are twice differentiable. We apply extreme value theory to the new formal robustness guarantee, and the estimated robustness is called the second-order CLEVER score. Second, we discuss how to handle gradient masking, a common defensive technique, using CLEVER with Backward Pass Differentiable Approximation (BPDA). With BPDA applied, CLEVER can evaluate the intrinsic robustness of neural networks of a broader class -- networks with non-differentiable input transformations. We demonstrate the effectiveness of CLEVER with BPDA in experiments on a 121-layer DenseNet model trained on the ImageNet dataset.
We consider multi-response and multitask regression models, where the parameter matrix to be estimated is expected to have an unknown grouping structure. The groupings can be along tasks, or features, or both, the last one indicating a bi-cluster or "checkerboard" structure. Discovering this grouping structure along with parameter inference makes sense in several applications, such as multi-response Genome-Wide Association Studies (GWAS). This additional structure not only can be leveraged for more accurate parameter estimation, but also provides valuable information on the underlying data mechanisms (e.g. relationships among genotypes and phenotypes in GWAS). In this paper, we propose two formulations to simultaneously learn the parameter matrix and its group structures, based on convex regularization penalties. We present optimization approaches to solve the resulting problems and provide numerical convergence guarantees. Our approaches are validated on extensive simulations and real datasets concerning phenotypes and genotypes of plant varieties.
Jul 27 2017
cs.CL arXiv:1707.08290v1
Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang's estimator. The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types. We justify the convenience of the algorithm by means of an analysis of the statistical properties of texts from more than 1000 languages. Our work opens up various possibilities for future research.
In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, the neuronal-birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, higher error causing an increase in the birth rate. Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-the-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements.
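The way group sparsity can delete dictionary elements is easy to see in a small sketch. The standard proximal operator of an l1/l2 penalty shrinks each dictionary column toward zero by a fixed amount of its norm and zeroes out columns whose norm falls below the threshold; such columns can then be dropped. The function name, the threshold, and the toy dictionary are assumptions for illustration, and the paper's actual block-coordinate update may differ in detail.

```python
import math


def shrink_and_prune(columns, threshold):
    """Apply group soft-thresholding to each dictionary column.

    A column d is rescaled by max(0, 1 - threshold / ||d||_2); columns that
    are shrunk all the way to zero are removed (the "death" of a unit).
    """
    survivors = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        if scale > 0.0:
            survivors.append([scale * v for v in col])
    return survivors


# Toy dictionary with three elements stored as columns (illustrative values).
dictionary = [[3.0, 4.0], [0.1, 0.0], [0.0, 2.0]]
pruned = shrink_and_prune(dictionary, threshold=0.5)
```

Here the weak second element (norm 0.1) is eliminated, while the other two are only mildly shrunk, so the dictionary size adapts to the data rather than being fixed in advance.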
This paper shows how the application of stochastic geometry to the analysis of wireless networks is greatly facilitated by (i) a clear separation of time scales, (ii) the abstraction of small-scale effects via ergodicity, and (iii) an interference model that reflects the receiver's lack of knowledge of how each individual interference term is faded. These procedures render the analysis both more manageable and more precise, as well as more amenable to the incorporation of subsequent features. In particular, the paper presents analytical characterizations of the ergodic spectral efficiency of cellular networks with single-user multiple-input multiple-output (MIMO) and sectorization. These characterizations, in the form of easy-to-evaluate expressions, encompass the coverage, the distribution of spectral efficiency over the network locations, and the average thereof.
This paper investigates the feasibility of mmWave frequencies for personal networks of wireless wearable devices in enclosed settings (e.g., commuter trains, subways, airplanes, airports, or offices). At these frequencies, specular reflections off surfaces are expected to contribute intended signal power and, simultaneously, to aggravate the interference at the receivers. Meanwhile, blockages by obstacles and people---including the individuals wearing the devices---are expected to shield receivers from interference. With the aid of stochastic geometry and random shape theory, we assess the interplay of surface reflections and blockages for dense deployments of wearable networks equipped with directional antenna arrays in relevant indoor settings.
Innovation is among the key factors driving a country's economic and social growth. But what are the factors that make a country innovative? How do they differ across different parts of the world and different stages of development? In this work done in collaboration with the World Economic Forum (WEF), we analyze the scores obtained through executive opinion surveys that constitute the WEF's Global Competitiveness Index in conjunction with other country-level metrics and indicators to identify actionable levers of innovation. The findings can help country leaders and organizations shape the policies to drive developmental activities and increase the capacity of innovation.
This paper investigates the design of precoders for single-user multiple-input multiple-output (MIMO) channels, and in particular for finite-alphabet signals. Based on an asymptotic expression for the mutual information of channels exhibiting line-of-sight components and rather general antenna correlations, precoding structures that decompose the general channel into a set of parallel subchannel pairs are proposed. Then, a low-complexity iterative algorithm is devised to maximize the sum mutual information of all pairs. The proposed algorithm significantly reduces the computational load of existing approaches with only minimal loss in performance. The complexity savings increase with the number of transmit antennas and with the cardinality of the signal alphabet, making it possible to support values thereof that were unmanageable with existing solutions. Most importantly, the proposed solution does not require instantaneous channel state information (CSI) at the transmitter, but only statistical CSI.
We consider the problem of removing and replacing clouds in satellite image sequences, which has a wide range of applications in remote sensing. Our approach first detects and removes the cloud-contaminated part of the image sequences. It then recovers the missing scenes from the clean parts using the proposed "TECROMAC" (TEmporally Contiguous RObust MAtrix Completion) objective. The objective function balances temporal smoothness with a low-rank solution while staying close to the original observations. The matrix whose rows are pixels and whose columns are the days corresponding to the images has low rank because the pixels reflect land types such as vegetation, roads, and lakes, and there are relatively few variations as a result. We provide efficient optimization algorithms for TECROMAC, so we can exploit images containing millions of pixels. Empirical results on real satellite image sequences, as well as simulated data, demonstrate that our approach is able to recover underlying images from heavily cloud-contaminated observations.
In this paper, we investigate the design of multiple-input multiple-output single-user precoders for finite-alphabet signals under the premise of statistical channel-state information at the transmitter. Based on an asymptotic expression for the mutual information of channels exhibiting antenna correlations, we propose a low-complexity iterative algorithm that reduces the computational load of existing approaches by orders of magnitude with only minimal losses in performance. The savings increase with the number of transmit antennas and with the cardinality of the signal alphabet, making it possible to support values thereof that were unwieldy in existing solutions.
We propose a general matrix-valued multiple kernel learning framework for high-dimensional nonlinear multivariate regression problems. This framework allows a broad class of mixed norm regularizers, including those that induce sparsity, to be imposed on a dictionary of vector-valued Reproducing Kernel Hilbert Spaces. We develop a highly scalable and eigendecomposition-free algorithm that orchestrates two inexact solvers for simultaneously learning both the input and output components of separable matrix-valued kernels. As a key application enabled by our framework, we show how high-dimensional causal inference tasks can be naturally cast as sparse function estimation problems, leading to novel nonlinear extensions of a class of Graphical Granger Causality techniques. Our algorithmic developments and extensive empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds.
This paper presents a framework that enables characterizing analytically the spectral efficiency achievable by D2D (device-to-device) communication integrated with a cellular network. This framework is based on a stochastic geometry formulation with a novel approach to the modeling of interference and with the added possibility of incorporating exclusion regions to protect cellular receivers from excessive interference from active D2D transmitters. To illustrate the potential of the framework, a number of examples are provided. These examples confirm the potential of D2D communication in situations of strong traffic locality as well as the effectiveness of properly sized exclusion regions.
What will 5G be? What it will not be is an incremental advance on 4G. The previous four generations of cellular technology have each been a major paradigm shift that has broken backwards compatibility. And indeed, 5G will need to be a paradigm shift that includes very high carrier frequencies with massive bandwidths, extreme base station and device densities and unprecedented numbers of antennas. But unlike the previous four generations, it will also be highly integrative: tying any new 5G air interface and spectrum together with LTE and WiFi to provide universal high-rate coverage and a seamless user experience. To support this, the core network will also have to reach unprecedented levels of flexibility and intelligence, spectrum regulation will need to be rethought and improved, and energy and cost efficiencies will become even more critical considerations. This paper discusses all of these topics, identifying key challenges for future research and preliminary 5G standardization activities, while providing a comprehensive overview of the current literature, and in particular of the papers appearing in this special issue.
This paper characterizes the performance of coordinated beamforming with dynamic clustering. A downlink model based on stochastic geometry is put forth to analyze the performance of such a base station (BS) coordination strategy. Analytical expressions for the complementary cumulative distribution function (CCDF) of the instantaneous signal-to-interference ratio (SIR) are derived in terms of relevant system parameters, chiefly the number of BSs forming the coordination clusters, the number of antennas per BS, and the pathloss exponent. Utilizing this CCDF, with pilot overheads further incorporated into the analysis, we formulate the optimization of the BS coordination clusters for a given fading coherence. Our results indicate that (i) coordinated beamforming is most beneficial to users that are in the outer part of their cells yet in the inner part of their coordination cluster, and that (ii) the optimal cluster cardinality for the typical user is small and it scales with the fading coherence. Simulation results verify the exactness of the SIR distributions derived for stochastic geometries, which are further compared with the corresponding distributions for deterministic grid networks.
We consider new formulations and methods for sparse quantile regression in the high-dimensional setting. Quantile regression plays an important role in many applications, including outlier-robust exploratory analysis in gene selection. In addition, the sparsity consideration in quantile regression enables the exploration of the entire conditional distribution of the response variable given the predictors and therefore yields a more comprehensive view of the important predictors. We propose a generalized OMP algorithm for variable selection, taking the misfit loss to be either the traditional quantile loss or a smooth version we call quantile Huber, and compare the resulting greedy approaches with convex sparsity-regularized formulations. We apply a recently proposed interior point methodology to efficiently solve all convex formulations as well as convex subproblems in the generalized OMP setting, provide theoretical guarantees of consistent estimation, and demonstrate the performance of our approach using empirical studies of simulated and genomic datasets.
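For readers unfamiliar with the traditional quantile loss mentioned above, the sketch below implements it (it is also known as the pinball or check loss) and shows that the constant minimizing it over a sample is an empirical quantile, which is why it is robust to outliers. Only the traditional loss is shown; the smoothed quantile Huber variant is not reproduced here because its exact form is not given in the abstract. The data values and helper names are illustrative.

```python
def quantile_loss(residual, tau):
    """Traditional quantile (pinball) loss for a residual r = y - prediction.

    Positive residuals are weighted by tau, negative ones by (1 - tau).
    """
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(data, prediction, tau):
    """Sum of quantile losses when a single constant predicts every sample."""
    return sum(quantile_loss(y - prediction, tau) for y in data)


# Sample with one large outlier (illustrative values).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Search over the sample points for the best constant predictor.
best_median = min(data, key=lambda c: total_loss(data, c, 0.5))
best_q25 = min(data, key=lambda c: total_loss(data, c, 0.25))
```

The outlier at 100 barely affects either minimizer, in contrast to a squared loss, whose minimizer (the mean, 22 here) would be pulled far from the bulk of the data.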
New research directions will lead to fundamental changes in the design of future 5th generation (5G) cellular networks. This paper describes five technologies that could lead to both architectural and component disruptive design changes: device-centric architectures, millimeter wave, massive MIMO, smarter devices, and native support for machine-to-machine communication. The key ideas for each technology are described, along with their potential impact on 5G and the research challenges that remain.
This paper presents a new connection between the generalized Marcum-Q function and the confluent hypergeometric function of two variables, $\Phi_3$. This result is then applied to the closed-form characterization of the bivariate Nakagami-m distribution and of the distribution of the minimum eigenvalue of correlated non-central Wishart matrices, both important in communication theory. New expressions for the corresponding cumulative distributions are obtained and a number of communication-theoretic problems involving them are pointed out.
Interference alignment (IA) is a cooperative transmission strategy that, under some conditions, achieves the interference channel's maximum number of degrees of freedom. Realizing IA gains, however, is contingent upon providing transmitters with sufficiently accurate channel knowledge. In this paper, we study the performance of IA in multiple-input multiple-output systems where channel knowledge is acquired through training and analog feedback. We design the training and feedback system to maximize IA's effective sum-rate: a non-asymptotic performance metric that accounts for estimation error, training and feedback overhead, and channel selectivity. We characterize effective sum-rate with overhead in relation to various parameters such as signal-to-noise ratio, Doppler spread, and feedback channel quality. A main insight from our analysis is that, by properly designing the CSI acquisition process, IA can provide good sum-rate performance in a very wide range of fading scenarios. Another observation from our work is that such overhead-aware analysis can help solve a number of practical network design problems. To demonstrate the concept of overhead-aware network design, we consider the example problem of finding the optimal number of cooperative IA users based on signal power and mobility.
Cooperation is viewed as a key ingredient for interference management in wireless systems. This paper shows that cooperation has fundamental limitations. The main result is that even full cooperation between transmitters cannot in general change an interference-limited network to a noise-limited network. The key idea is that there exists a spectral efficiency upper bound that is independent of the transmit power. First, a spectral efficiency upper bound is established for systems that rely on pilot-assisted channel estimation; in this framework, cooperation is shown to be possible only within clusters of limited size, which are subject to out-of-cluster interference whose power scales with that of the in-cluster signals. Second, an upper bound is also shown to exist when cooperation is through noncoherent communication; thus, the spectral efficiency limitation is not a by-product of the reliance on pilot-assisted channel estimation. Consequently, existing literature that routinely assumes the high-power spectral efficiency scales with the log of the transmit power provides only a partial characterization. The complete characterization proposed in this paper subdivides the high-power regime into a degrees-of-freedom regime, where the scaling with the log of the transmit power holds approximately, and a saturation regime, where the spectral efficiency hits a ceiling that is independent of the power. Using a cellular system as an example, it is demonstrated that the spectral efficiency saturates at power levels of operational relevance.
We present a method to compute, quickly and efficiently, the mutual information achieved by an IID (independent identically distributed) complex Gaussian input on a block Rayleigh-faded channel without side information at the receiver. The method accommodates both scalar and MIMO (multiple-input multiple-output) settings. Operationally, the mutual information thus computed represents the highest spectral efficiency that can be attained using standard Gaussian codebooks. Examples are provided that illustrate the loss in spectral efficiency caused by fast fading and how that loss is amplified by the use of multiple transmit antennas. These examples are further enriched by comparisons with the channel capacity under perfect channel-state information at the receiver, and with the spectral efficiency attained by pilot-based transmission.
Discrete tomography deals with reconstructing finite spatial objects from lower dimensional projections and has applications for example in timetable design. In this paper we consider the problem of reconstructing a tile packing from its row and column projections. It consists of disjoint copies of a fixed tile, all contained in some rectangular grid. The projections tell how many cells are covered by a tile in each row and column. How difficult is it to construct a tile packing satisfying given projections? It was known to be solvable by a greedy algorithm for bars (tiles of width or height 1), and NP-hardness results were known for some specific tiles. This paper shows that the problem is NP-hard whenever the tile is not a bar.
The spectral efficiency achievable with joint processing of pilot and data symbol observations is compared with that achievable through the conventional (separate) approach of first estimating the channel on the basis of the pilot symbols alone, and subsequently detecting the data symbols. Studied on the basis of a mutual information lower bound, joint processing is found to provide a non-negligible advantage relative to separate processing, particularly for fast fading. It is shown that, regardless of the fading rate, only a very small number of pilot symbols (at most one per transmit antenna and per channel coherence interval) should be transmitted if joint processing is allowed.
The optimization of the pilot overhead in single-user wireless fading channels is investigated, and the dependence of this overhead on various system parameters of interest (e.g., fading rate, signal-to-noise ratio) is quantified. The achievable pilot-based spectral efficiency is expanded with respect to the fading rate about the no-fading point, which leads to an accurate order expansion for the pilot overhead. This expansion identifies that the pilot overhead, as well as the spectral efficiency penalty with respect to a reference system with genie-aided CSI (channel state information) at the receiver, depend on the square root of the normalized Doppler frequency. Furthermore, it is shown that the widely-used block fading model is only a special case of more accurate continuous fading models in terms of the achievable pilot-based spectral efficiency, and that the overhead optimization for multiantenna systems is effectively the same as for single-antenna systems with the normalized Doppler frequency multiplied by the number of transmit antennas.
A contemporary perspective on the tradeoff between transmit antenna diversity and spatial multiplexing is provided. It is argued that, in the context of most modern wireless systems and for the operating points of interest, transmission techniques that utilize all available spatial degrees of freedom for multiplexing outperform techniques that explicitly sacrifice spatial multiplexing for diversity. In the context of such systems, therefore, there essentially is no decision to be made between transmit antenna diversity and spatial multiplexing in MIMO communication. Reaching this conclusion, however, requires that the channel and some key system features be adequately modeled and that suitable performance metrics be adopted; failure to do so may bring about starkly different conclusions. As a specific example, this contrast is illustrated using the 3GPP Long-Term Evolution system design.