-
A Simplified Positional Cell Type Visualization using Spatially Aggregated Clusters
Authors:
Lee Mason,
Jonas Almeida
Abstract:
We introduce a novel method for overlaying cell type proportion data onto tissue images. This approach preserves spatial context while avoiding visual clutter or excessively obscuring the underlying slide. Our proposed technique involves clustering the data and aggregating neighboring points of the same cluster into polygons.
Submitted 20 September, 2024;
originally announced October 2024.
-
Crown-Like Structures in Breast Adipose Tissue: Finding a 'Needle-in-a-Haystack' using Artificial Intelligence and Collaborative Active Learning on the Web
Authors:
Praphulla MS Bhawsar,
Cody Ramin,
Petra Lenz,
Máire A Duggan,
Alexandra R Harris,
Brittany Jenkins,
Renata Cora,
Mustapha Abubakar,
Gretchen Gierach,
Joel Saltz,
Jonas S Almeida
Abstract:
Crown-like structures (CLS) in breast adipose tissue are formed as a result of macrophages clustering around necrotic adipocytes in specific patterns. As a histologic marker of local inflammation, CLS could have potential diagnostic utility as a biomarker for breast cancer risk. However, given the scale of whole slide images and the rarity of CLS (a few cells in an entire tissue sample), microscope-based manual identification is a challenge for the pathologist. In this report, we describe an artificial intelligence pipeline to solve this needle-in-a-haystack problem. We developed a zero-cost, zero-footprint web platform to enable remote operation on digital whole slide imaging data directly in the web browser, supporting collaborative annotation of the data by multiple experts. The annotated images then allow for incremental training and fine tuning of deep neural networks via active learning. The platform is reusable and requires no backend or installations, thus ensuring the data remains secure and private under the governance of the end user. Using this platform, we iteratively trained a CLS identification model, evaluating the performance after each round and adding examples to the training data to overcome failure cases. The resulting model, with an AUC of 0.90, shows promise as a first-pass screening tool to detect CLS in breast adipose tissue, considerably reducing the workload of the pathologist.
Platform available at: https://episphere.github.io/path
Submitted 12 September, 2024;
originally announced September 2024.
-
A Topic-wise Exploration of the Telegram Group-verse
Authors:
Alessandro Perlo,
Giordano Paoletti,
Nikhil Jha,
Luca Vassio,
Jussara Almeida,
Marco Mellia
Abstract:
Although currently one of the most popular instant messaging apps worldwide, Telegram has been largely understudied in the past years. In this paper, we aim to address this gap by presenting an analysis of publicly accessible groups covering discussions encompassing different topics, as diverse as Education, Erotic, Politics, and Cryptocurrencies. We engineer and offer an open-source tool to automate the collection of messages from Telegram groups, a non-straightforward problem. We use it to collect more than 50 million messages from 669 groups. Here, we present a first-of-its-kind, per-topic analysis, contrasting the characteristics of the messages sent on the platform from different angles -- the language, the presence of bots, the type and volume of shared media content. Our results confirm some anecdotal evidence, e.g., clues that Telegram is used to share possibly illicit content, and unveil some unexpected findings, e.g., the different sharing patterns of video and stickers in groups of different topics. While preliminary, we hope that our work paves the road for several avenues of future research on the understudied Telegram platform.
Submitted 4 September, 2024;
originally announced September 2024.
-
Residual-based Adaptive Huber Loss (RAHL) -- Design of an improved Huber loss for CQI prediction in 5G networks
Authors:
Mina Kaviani,
Jurandy Almeida,
Fabio L. Verdi
Abstract:
The Channel Quality Indicator (CQI) plays a pivotal role in 5G networks, optimizing infrastructure dynamically to ensure high Quality of Service (QoS). Recent research has focused on improving CQI estimation in 5G networks using machine learning. In this field, the selection of the proper loss function is critical for training an accurate model. Two commonly used loss functions are Mean Squared Error (MSE) and Mean Absolute Error (MAE). Roughly speaking, MSE puts more weight on outliers, MAE on the majority. Here, we argue that the Huber loss function is more suitable for CQI prediction, since it combines the benefits of both MSE and MAE. To achieve this, the Huber loss transitions smoothly between MSE and MAE, controlled by a user-defined hyperparameter called delta. However, finding the right balance between sensitivity to small errors (MSE) and robustness to outliers (MAE) by manually choosing the optimal delta is challenging. To address this issue, we propose a novel loss function, named Residual-based Adaptive Huber Loss (RAHL). In RAHL, a learnable residual is added to the delta, enabling the model to adapt based on the distribution of errors in the data. Our approach effectively balances model robustness against outliers while preserving inlier data precision. The widely recognized Long Short-Term Memory (LSTM) model is employed in conjunction with RAHL, showcasing significantly improved results compared to the aforementioned loss functions. The obtained results affirm the superiority of RAHL, offering a promising avenue for enhanced CQI prediction in 5G networks.
Submitted 26 August, 2024;
originally announced August 2024.
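The abstract above describes the Huber loss as a delta-controlled blend of MSE and MAE, with RAHL adding a learnable residual to delta. The following is a minimal sketch of that idea, not the authors' implementation: `huber` is the standard textbook definition, while `adaptive_huber`, `base_delta`, `learned_offset`, and the small positive floor on delta are hypothetical names and choices introduced here for illustration. No training loop is shown; in practice the offset would be updated by the optimizer alongside the model weights.

```python
def huber(residual: float, delta: float) -> float:
    """Standard Huber loss for a single residual.

    Quadratic (MSE-like) when |residual| <= delta,
    linear (MAE-like) beyond that threshold.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def adaptive_huber(residual: float, base_delta: float, learned_offset: float) -> float:
    """Hypothetical adaptive variant: effective delta = base_delta + learned_offset.

    The floor of 1e-6 is an assumption made here to keep delta positive;
    the abstract does not say how (or whether) delta is constrained.
    """
    effective_delta = max(base_delta + learned_offset, 1e-6)
    return huber(residual, effective_delta)


# Small residuals fall in the quadratic region, large ones in the linear region.
small = huber(0.5, delta=1.0)
large = huber(3.0, delta=1.0)
# Raising delta via the offset moves the same residual closer to the quadratic region.
shifted = adaptive_huber(3.0, base_delta=1.0, learned_offset=1.0)
```

A larger effective delta treats more residuals quadratically, so the loss behaves more like MSE; a smaller one behaves more like MAE.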
-
Towards Foundation Models for the Industrial Forecasting of Chemical Kinetics
Authors:
Imran Nasim,
Joaõ Lucas de Sousa Almeida
Abstract:
Scientific Machine Learning is transforming traditional engineering industries by enhancing the efficiency of existing technologies and accelerating innovation, particularly in modeling chemical reactions. Despite recent advancements, solving stiff chemically reacting problems within computational fluid dynamics remains a significant challenge. In this study, we propose a novel approach utilizing a multi-layer-perceptron mixer architecture (MLP-Mixer) to model the time-series of stiff chemical kinetics. We evaluate this method using the ROBER system, a benchmark model in chemical kinetics, to compare its performance with traditional numerical techniques. This study provides insight into the industrial utility of the recently developed MLP-Mixer architecture to model chemical kinetics and provides motivation for such neural architecture to be used as a base for time-series foundation models.
Submitted 20 August, 2024;
originally announced August 2024.
-
Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks
Authors:
Lucas Félix,
Washington Cunha,
Jussara Almeida
Abstract:
In today's digital era, the use of Social Networks (SNs) and Location-Based SNs (LBSNs) has become integral for travelers seeking Points of Interest (POI) and sharing travel experiences. This trend is supported by the fact that a significant majority of American travelers utilize SNs during their trips. However, the abundance of information available on these platforms presents a challenge in identifying the best options. To address this issue, Recommender Systems (RS) are commonly employed to suggest POIs based on user history, with the integration of contextual information enhancing the quality of recommendations. Notably, incorporating user travel purpose, which is often overlooked but holds potential in characterizing travelers' behavior, can lead to more tailored recommendations. In this study, we propose a model to predict whether a trip is leisure or work-related, utilizing state-of-the-art Automatic Text Classification (ATC) models such as BERT, RoBERTa, and BART to enhance the understanding of user travel purposes and improve recommendation accuracy in specific travel scenarios.
Submitted 12 August, 2024;
originally announced August 2024.
-
Demystifying Spatial Dependence: Interactive Visualizations for Interpreting Local Spatial Autocorrelation
Authors:
Lee Mason,
Blanaid Hicks,
Jonas Almeida
Abstract:
The Local Moran's I statistic is a valuable tool for identifying localized patterns of spatial autocorrelation. Understanding these patterns is crucial in spatial analysis, but interpreting the statistic can be difficult. To simplify this process, we introduce three novel visualizations that enhance the interpretation of Local Moran's I results. These visualizations can be interactively linked to one another, and to established visualizations, to offer a more holistic exploration of the results. We provide a JavaScript library with implementations of these new visual elements, along with a web dashboard that demonstrates their integrated use.
Submitted 5 August, 2024;
originally announced August 2024.
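For readers unfamiliar with the statistic named in the abstract above, here is a minimal sketch of Local Moran's I in its common form, I_i = (z_i / m2) * sum_j w_ij z_j, with z the mean-centered values and m2 their second moment. This is not the authors' JavaScript library; the function name `local_morans_i`, the adjacency-list input, and the choice of row-standardized weights (each neighbor weighted 1/k) are assumptions made for illustration.

```python
from statistics import fmean


def local_morans_i(values, neighbors):
    """Local Moran's I for each location.

    values:    list of observations, one per location.
    neighbors: list of neighbor-index lists; neighbors[i] holds the
               locations adjacent to i. Weights are row-standardized,
               so the spatial lag is the mean of the neighbors' z-scores.
    """
    n = len(values)
    mean = fmean(values)
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    if m2 == 0:
        raise ValueError("values have zero variance")
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        # Locations without neighbors get a spatial lag of zero.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four locations on a line: two low values next to two high values.
values = [1.0, 1.0, 3.0, 3.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
scores = local_morans_i(values, neighbors)
```

In this toy example the two end locations sit next to similar values and receive positive scores, while the two middle locations have one similar and one dissimilar neighbor, so their scores cancel to zero.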
-
TMA-Grid: An open-source, zero-footprint web application for FAIR Tissue MicroArray De-arraying
Authors:
Aaron Ge,
Monjoy Saha,
Maire A. Duggan,
Petra Lenz,
Mustapha Abubakar,
Montserrat García-Closas,
Jeya Balasubramanian,
Jonas S. Almeida,
Praphulla MS Bhawsar
Abstract:
Background:
Tissue Microarrays (TMAs) significantly increase analytical efficiency in histopathology and large-scale epidemiologic studies by allowing multiple tissue cores to be scanned on a single slide. The individual cores can be digitally extracted and then linked to metadata for analysis in a process known as de-arraying. However, TMAs often contain core misalignments and artifacts due to assembly errors, which can adversely affect the reliability of the extracted cores during the de-arraying process. Moreover, conventional approaches for TMA de-arraying rely on desktop solutions. Therefore, a robust yet flexible de-arraying method is crucial to account for these inaccuracies and ensure effective downstream analyses.
Results:
We developed TMA-Grid, an in-browser, zero-footprint, interactive web application for TMA de-arraying. This web application integrates a convolutional neural network for precise tissue segmentation and a grid estimation algorithm to match each identified core to its expected location. The application emphasizes interactivity, allowing users to easily adjust segmentation and gridding results. Operating entirely in the web-browser, TMA-Grid eliminates the need for downloads or installations and ensures data privacy. Adhering to FAIR principles (Findable, Accessible, Interoperable, and Reusable), the application and its components are designed for seamless integration into TMA research workflows.
Conclusions:
TMA-Grid provides a robust, user-friendly solution for TMA de-arraying on the web. As an open, freely accessible platform, it lays the foundation for collaborative analyses of TMAs and similar histopathology imaging data. Availability: Web application: https://episphere.github.io/tma-grid Code: https://github.com/episphere/tma-grid Tutorial: https://youtu.be/miajqyw4BVk
Submitted 30 July, 2024;
originally announced July 2024.
-
The stellar distribution in ultra-faint dwarf galaxies suggests deviations from the collision-less cold dark matter paradigm
Authors:
Jorge Sanchez Almeida,
Ignacio Trujillo,
Angel R. Plastino
Abstract:
Unraveling the nature of dark matter (DM) stands as a primary objective in modern physics. Here we present evidence suggesting deviations from the collisionless Cold DM (CDM) paradigm. It arises from the radial distribution of stars in six Ultra Faint Dwarf (UFD) galaxies measured with the Hubble Space Telescope (HST). After a trivial renormalization in size and central density, the six UFDs show the same stellar distribution, which happens to have a central plateau or core. Assuming spherical symmetry and isotropic velocities, the Eddington inversion method proves the observed distribution to be inconsistent with potentials characteristic of CDM particles. Under such assumptions, the observed innermost slope of the stellar profile rules out the UFDs residing in a CDM potential at a > 97% confidence level. The extremely low stellar mass of these galaxies, 10**3-10**4 Msun, prevents stellar feedback from modifying the shape of a CDM potential. Other conceivable explanations for the observed cores, like deviations from spherical symmetry and isotropy, tidal forces, and the exact form of the CDM potential used, are disfavored by simulations and/or observations. Thus, the evidence suggests that collisions among DM particles or other alternatives to CDM are likely shaping these galaxies. Many of these alternatives produce cored gravitational potentials, shown here to be consistent with the observed stellar distribution.
Submitted 23 July, 2024;
originally announced July 2024.
-
Application of the Eddington inversion method to constrain the dark matter halo of galaxies using only observed surface brightness profiles
Authors:
Jorge Sanchez Almeida,
Angel R. Plastino,
Ignacio Trujillo
Abstract:
Context:
The halos of low-mass galaxies may allow us to constrain the nature of dark matter (DM), but the kinematic measurements to diagnose the required properties are technically extremely challenging. However, the photometry of these systems is doable.
Aims:
Using only stellar photometry, constrain key properties of the DM haloes in low-mass galaxies.
Methods:
Unphysical pairs of DM gravitational potentials and starlight distributions can be identified if the pair requires a distribution function f that is negative somewhere in the phase space. We use the classical Eddington inversion method (EIM) to compute f for a battery of DM gravitational potentials and around 100 observed low-mass galaxies with Mstar between 10**6 and 10**8 Msun. The battery includes NFW potentials (expected from cold DM) and potentials stemming from cored mass distributions (expected in many alternatives to cold DM). The method assumes spherical symmetry and isotropic velocity distribution and requires fitting the observed profiles with analytic functions, for which we use polytropes (with zero inner slope, a.k.a. core) and profiles with variable inner and outer slopes. The validity of all these assumptions is analyzed.
Results:
In general, the polytropes fit well the observed starlight profiles. If they were the correct fits (which could be the case) then all galaxies are inconsistent with NFW-like potentials. Alternatively, when the inner slope is allowed to vary for fitting, between 40% and 70% of the galaxies are consistent with cores in the stellar mass distribution and thus inconsistent with NFW-like potentials.
Conclusions:
Even though the stellar mass of the observed galaxies is still not low enough to constrain the nature of DM, this work shows the practical feasibility of the EIM technique to infer DM properties only from photometry.
Submitted 23 July, 2024;
originally announced July 2024.
-
Hierarchical Homogeneity-Based Superpixel Segmentation: Application to Hyperspectral Image Analysis
Authors:
Luciano Carvalho Ayres,
Sérgio José Melo de Almeida,
José Carlos Moreira Bermudez,
Ricardo Augusto Borsoi
Abstract:
Hyperspectral image (HI) analysis approaches have recently become increasingly complex and sophisticated. The combination of spectral-spatial information and superpixel techniques has addressed some hyperspectral data issues, such as the higher spatial variability of spectral signatures and dimensionality of the data. However, most existing superpixel approaches do not account for specific HI characteristics resulting from its high spectral dimension. In this work, we propose a multiscale superpixel method that is computationally efficient for processing hyperspectral data. The Simple Linear Iterative Clustering (SLIC) oversegmentation algorithm, on which the technique is based, has been extended hierarchically. Using a novel robust homogeneity test, the proposed hierarchical approach leads to superpixels of variable sizes but with higher spectral homogeneity when compared to the classical SLIC segmentation. For validation, the proposed homogeneity-based hierarchical method was applied as a preprocessing step in the spectral unmixing and classification tasks carried out using, respectively, the Multiscale sparse Unmixing Algorithm (MUA) and the CNN-Enhanced Graph Convolutional Network (CEGCN) methods. Simulation results with both synthetic and real data show that the technique is competitive with state-of-the-art solutions.
Submitted 21 July, 2024;
originally announced July 2024.
-
FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
Authors:
Aaron Ge,
Jeya Balasubramanian,
Xueyao Wu,
Peter Kraft,
Jonas S. Almeida
Abstract:
Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-side deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-side imputation models generalizable across any genotyping chip and genomic region. This approach enhances patient privacy by performing imputation directly on edge devices. As a case study, we focus on PRS313, a polygenic risk score comprising 313 SNPs used for breast cancer risk prediction. Utilizing consumer genetic panels such as 23andMe, our model democratizes access to personalized genetic insights by allowing 23andMe users to obtain their PRS313 score. We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe. Our linear regression model achieved an R^2 of 0.86, compared to 0.33 without imputation and 0.28 with simple imputation (substituting missing SNPs with the minor allele frequency). These findings suggest that popular SNP analysis libraries could benefit from integrating linear regression models for genotype imputation, providing a viable and lightweight alternative to reference-based imputation.
Submitted 12 July, 2024;
originally announced July 2024.
-
Finding Fake News Websites in the Wild
Authors:
Leandro Araujo,
Joao M. M. Couto,
Luiz Felipe Nery,
Isadora C. Rodrigues,
Jussara M. Almeida,
Julio C. S. Reis,
Fabricio Benevenuto
Abstract:
The battle against the spread of misinformation on the Internet is a daunting task faced by modern society. Fake news content is primarily distributed through digital platforms, with websites dedicated to producing and disseminating such content playing a pivotal role in this complex ecosystem. Therefore, these websites are of great interest to misinformation researchers. However, obtaining a comprehensive list of websites labeled as producers and/or spreaders of misinformation can be challenging, particularly in developing countries. In this study, we propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content, which are closely linked to users who share confirmed instances of fake news on social media. We validate our approach on Twitter by examining various execution modes and contexts. Our findings demonstrate the effectiveness of the proposed methodology in identifying misinformation websites, which can aid in gaining a better understanding of this phenomenon and enabling competent entities to tackle the problem in various areas of society.
Submitted 15 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Transferable-guided Attention Is All You Need for Video Domain Adaptation
Authors:
André Sacilotti,
Samuel Felipe dos Santos,
Nicu Sebe,
Jurandy Almeida
Abstract:
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains not well explored compared to image-based UDA techniques. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video UDA has been little explored. Our key idea is to use transformer layers as a feature encoder and incorporate spatial and temporal transferability relationships into the attention mechanism. A Transferable-guided Attention (TransferAttn) framework is then developed to exploit the capacity of the transformer to adapt cross-domain knowledge across different backbones. To improve the transferability of ViT, we introduce a novel and effective module, named Domain Transferable-guided Attention Block (DTAB). DTAB compels ViT to focus on the spatio-temporal transferability relationship among video frames by changing the self-attention mechanism to a transferability attention mechanism. Extensive experiments were conducted on UCF-HMDB, Kinetics-Gameplay, and Kinetics-NEC Drone datasets, with different backbones, like ResNet101, I3D, and STAM, to verify the effectiveness of TransferAttn compared with state-of-the-art approaches. Also, we demonstrate that DTAB yields performance gains when applied to other state-of-the-art transformer-based UDA methods from both video and image domains. Our code is available at https://github.com/Andre-Sacilotti/transferattn-project-code.
Submitted 17 September, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Einasto gravitational potentials have difficulty to hold spherically symmetric stellar systems with cores
Authors:
Jorge Sanchez Almeida
Abstract:
It was known that an ideal spherically symmetric stellar system with isotropic velocities and an inner core cannot reside in a Navarro, Frenk, and White (NFW) gravitational potential. The incompatibility can be pinned down to the radial gradient of the NFW potential in the very center of the system, which differs from zero. The gradient is identically zero in an Einasto potential, also an alternative representation of the dark matter (DM) halos created by the kind of cold DM (CDM) defining the current cosmological model. Here we show that, despite the inner gradient being zero, stellar cores are also inconsistent with Einasto potentials. This result may have implications to constrain the nature of DM through interpreting the stellar cores often observed in dwarf galaxies.
Submitted 19 June, 2024;
originally announced June 2024.
-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and it makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Deep HST imaging favors the bulgeless edge-on galaxy explanation for the hypothetical stellar wake created by a runaway supermassive black hole
Authors:
Mireia Montes,
Jorge Sánchez Almeida,
Ignacio Trujillo
Abstract:
A long linear structure recently discovered could be the stellar wake produced by the passage of a runaway supermassive black hole (SMBH) or, alternatively, a bulgeless edge-on galaxy. We report on new very deep HST imaging that seems to be in tension with the SMBH runaway scenario but is consistent with the bulgeless edge-on galaxy scenario. The new observations were aimed at detecting two key features expected in the SMBH scenario, namely, the bow shock formed where the SMBH meets the surrounding medium, and a counter stellar wake created by another binary SMBH hypothesized as part of the ejection mechanism. Neither of these two features appears to be present in the new images, as would be expected in the edge-on galaxy scenario.
Submitted 31 May, 2024;
originally announced June 2024.
-
Using Neural Implicit Flow To Represent Latent Dynamics Of Canonical Systems
Authors:
Imran Nasim,
João Lucas de Sousa Almeida
Abstract:
The recently introduced class of architectures known as Neural Operators has emerged as highly versatile tools applicable to a wide range of tasks in the field of Scientific Machine Learning (SciML), including data representation and forecasting. In this study, we investigate the capabilities of Neural Implicit Flow (NIF), a recently developed mesh-agnostic neural operator, for representing the latent dynamics of canonical systems such as the Kuramoto-Sivashinsky (KS), forced Korteweg-de Vries (fKdV), and Sine-Gordon (SG) equations, as well as for extracting dynamically relevant information from them. Finally we assess the applicability of NIF as a dimensionality reduction algorithm and conduct a comparative analysis with another widely recognized family of neural operators, known as Deep Operator Networks (DeepONets).
Submitted 26 April, 2024;
originally announced April 2024.
-
High Harmonic Tracking of Ultrafast Electron Dynamics across the Mott to Charge Density Wave Phase Transition
Authors:
Marlena Dziurawiec,
Jessica O. de Almeida,
Mohit Lal Bera,
Marcin Płodzień,
Maciej Lewenstein,
Tobias Grass,
Ravindra W. Chhajlany,
Maciej M. Maśka,
Utso Bhattacharya
Abstract:
Different insulator phases compete with each other in strongly correlated materials with simultaneous local and non-local interactions. It is known that the homogeneous Mott insulator converts into a charge density wave (CDW) phase when the non-local interactions are increased, but there is ongoing debate on whether and in which parameter regimes this transition is of first order, or of second order with an intermediate bond-order wave phase. Here we show that strong-field optics applied to an extended Fermi-Hubbard system can serve as a powerful tool to reveal the nature of the quantum phase transition. Specifically, we show that in the strongly interacting regime characteristic excitations such as excitons, biexcitons, excitonic strings, and charge droplets can be tracked by the non-linear optical response to an ultrafast and intense laser pulse. Subcycle analysis of high harmonic spectra unravels the ultrafast dynamics of these increasingly complex objects, which partially escape the scrutiny of linear optics. Their appearance in the high harmonic spectrum provides striking evidence of a first-order transition into the CDW phase, and makes a strong case for using strong-field optics as a powerful tool to reveal the nature of quasiparticles in strongly correlated matter, and to track the electron dynamics during a first-order quantum phase transition.
Submitted 22 April, 2024;
originally announced April 2024.
-
ClusterRadar: an Interactive Web-Tool for the Multi-Method Exploration of Spatial Clusters Over Time
Authors:
Lee Mason,
Blánaid Hicks,
Jonas S. Almeida
Abstract:
Spatial cluster analysis, the detection of localized patterns of similarity in geospatial data, has a wide range of applications for scientific discovery and practical decision making. One way to detect spatial clusters is by using local indicators of spatial association, such as Local Moran's I or Getis-Ord Gi*. However, different indicators tend to produce substantially different results due to their distinct operational characteristics. Choosing a suitable method or comparing results from multiple methods is a complex task. Furthermore, spatial clusters are dynamic and it is often useful to track their evolution over time, which adds a further layer of complexity. ClusterRadar is a web-tool designed to address these analytical challenges. The tool allows users to easily perform spatial clustering and analyze the results in an interactive environment, uniquely prioritizing temporal analysis and the comparison of multiple methods. The tool's interactive dashboard presents several visualizations, each offering a distinct perspective on the temporal and methodological aspects of the spatial clustering results. ClusterRadar has several features designed to maximize its utility to a broad user-base, including support for various geospatial formats, and a fully in-browser execution environment to preserve the privacy of sensitive data. Feedback from a varied set of researchers suggests ClusterRadar's potential for enhancing the temporal analysis of spatial clusters.
Submitted 8 April, 2024;
originally announced April 2024.
-
Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning
Authors:
Martim Afonso,
Praphulla M. S. Bhawsar,
Monjoy Saha,
Jonas S. Almeida,
Arlindo L. Oliveira
Abstract:
Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. Not only are medical diagnostics recorded at the specimen level; the detection of oncogene mutation is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This poses a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications where cellular morphology is resolved.
Submitted 11 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Multifield curved solid: Early dark energy and perturbation instabilities
Authors:
Juan P. Beltrán Almeida,
Alejandro Guarnizo,
Thiago S. Pereira,
César A. Valenzuela-Toledo
Abstract:
We introduce a multifield dark energy model with a nonflat field-space metric, in which one field is dynamical while the others have constant spatial gradients. The model is predictive at the background level, leading to an early dark energy component at high redshifts and a suppressed fraction of late-time anisotropy. Both features have simple expressions in terms of the curvature scale of the field-space, and correspond to stable points in the phase space of possible solutions. Because of the coupling between time and space-dependent scalar fields, vector field perturbations develop tachyonic instabilities at scales below the Hubble radius, thus being potentially observable in the number count of galaxies. Overall, the presence of a nontrivial field-space curvature also leads to the appearance of instabilities on scalar perturbations, which can impact the matter density distribution at large scales.
Submitted 30 August, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
More fundamental than the fundamental metallicity relation: The effect of the stellar metallicity on the gas-phase mass-metallicity and gravitational potential-metallicity relations
Authors:
Laura Sánchez-Menguiano,
Sebastián F. Sánchez,
Jorge Sánchez Almeida,
Casiana Muñoz-Tuñón
Abstract:
Context. One of the most fundamental scaling relations in galaxies is observed between metallicity and stellar mass -- the mass-metallicity relation (MZR) -- although recently a stronger dependence of the gas-phase metallicity with the galactic gravitational potential ($Φ\rm ZR$) has been reported. Further dependences of metallicity on other galaxy properties have been revealed, with the star formation rate (SFR) being one of the most studied and debated secondary parameters in the relation (the so-called fundamental metallicity relation). Aims. In this work we explore the dependence of the gas-phase metallicity residuals from the MZR and $Φ\rm ZR$ on different galaxy properties in the search for the most fundamental scaling relation in galaxies. Methods. We applied a random forest regressor algorithm on a sample of 3430 nearby star-forming galaxies from the SDSS-IV MaNGA survey. Using this technique, we explored the effect of 147 additional parameters on the global oxygen abundance residuals obtained after subtracting the MZR. Alternatively, we followed a similar approach with the metallicity residuals from the $Φ\rm ZR$. Results. The stellar metallicity of the galaxy is revealed as the secondary parameter in both the MZR and the $Φ\rm ZR$, ahead of the SFR. This parameter reduces the scatter in the relations $\sim 10-15\%$. We find the 3D relation between gravitational potential, gas metallicity, and stellar metallicity to be the most fundamental metallicity relation observed in galaxies.
Submitted 2 February, 2024;
originally announced February 2024.
-
Enabling Seamless Data Security, Consensus, and Trading in Vehicular Networks
Authors:
Emanuel Vieira,
João Almeida,
Joaquim Ferreira,
Paulo C. Bartolomeu
Abstract:
Cooperative driving is an emerging paradigm to enhance the safety and efficiency of autonomous vehicles. To ensure successful cooperation, road users must reach a consensus for making collective decisions, while recording vehicular data to analyze and address failures related to such agreements. This data has the potential to provide valuable insights into various vehicular events, while also potentially improving accountability measures. Furthermore, vehicles may benefit from the ability to negotiate and trade services among themselves, adding value to the cooperative driving framework. However, the majority of proposed systems aiming to ensure data security, consensus, or service trading, lack efficient and thoroughly validated mechanisms that consider the distinctive characteristics of vehicular networks. These limitations are amplified by a dependency on the centralized support provided by the infrastructure. Furthermore, corresponding mechanisms must diligently address security concerns, especially regarding potential malicious or misbehaving nodes, while also considering inherent constraints of the wireless medium. We introduce the Verifiable Event Extension (VEE), an applicational extension designed for Intelligent Transportation System (ITS) messages. The VEE operates seamlessly with any existing standardized vehicular communications protocol, addressing crucial aspects of data security, consensus, and trading with minimal overhead. To achieve this, we employ blockchain techniques, Byzantine fault tolerance (BFT) consensus protocols, and cryptocurrency-based mechanics. To assess our proposal's feasibility and lightweight nature, we employed a hardware-in-the-loop setup for analysis. Experimental results demonstrate the viability and efficiency of the VEE extension in overcoming the challenges posed by the distributed and opportunistic nature of wireless vehicular communications.
Submitted 24 January, 2024;
originally announced January 2024.
-
A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm
Authors:
Luciano Carvalho Ayres,
Ricardo Augusto Borsoi,
José Carlos Moreira Bermudez,
Sérgio José Melo de Almeida
Abstract:
In hyperspectral sparse unmixing, a successful approach employs spectral bundles to address the variability of the endmembers in the spatial domain. However, the regularization penalties usually employed aggregate substantial computational complexity, and the solutions are very noise-sensitive. We generalize a multiscale spatial regularization approach to solve the unmixing problem by incorporating group sparsity-inducing mixed norms. Then, we propose a noise-robust method that can take advantage of the bundle structure to deal with endmember variability while ensuring inter- and intra-class sparsity in abundance estimation with reasonable computational cost. We also present a general heuristic to select the \emph{most representative} abundance estimation over multiple runs of the unmixing process, yielding a solution that is robust and highly reproducible. Experiments illustrate the robustness and consistency of the results when compared to related methods.
Submitted 23 January, 2024;
originally announced January 2024.
-
Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks
Authors:
Giordano Paoletti,
Luca Gioacchini,
Marco Mellia,
Luca Vassio,
Jussara M. Almeida
Abstract:
In dynamic complex networks, entities interact and form network communities that evolve over time. Among the many static Community Detection (CD) solutions, the modularity-based Louvain, or Greedy Modularity Algorithm (GMA), is widely employed in real-world applications due to its intuitiveness and scalability. Nevertheless, addressing CD in dynamic graphs remains an open problem, since the evolution of the network connections may poison the identification of communities, which may be evolving at a slower pace. Hence, naively applying GMA to successive network snapshots may lead to temporal inconsistencies in the communities. Two evolutionary adaptations of GMA, sGMA and $α$GMA, have been proposed to tackle this problem. Yet, evaluating the performance of these methods and understanding to which scenarios each one is better suited is challenging because of the lack of a comprehensive set of metrics and a consistent ground truth. To address these challenges, we propose (i) a benchmarking framework for evolutionary CD algorithms in dynamic networks and (ii) a generalised modularity-based approach (NeGMA). Our framework allows us to generate synthetic community-structured graphs and design evolving scenarios with nine basic graph transformations occurring at different rates. We evaluate performance through three metrics we define, i.e. Correctness, Delay, and Stability. Our findings reveal that $α$GMA is well-suited for detecting intermittent transformations, but struggles with abrupt changes; sGMA achieves superior stability, but fails to detect emerging communities; and NeGMA appears a well-balanced solution, excelling in responsiveness and instantaneous transformations detection.
Submitted 11 January, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Testing the Segment Anything Model on radiology data
Authors:
José Guilherme de Almeida,
Nuno M. Rodrigues,
Sara Silva,
Nickolas Papanikolaou
Abstract:
Deep learning models trained with large amounts of data have become a recent and effective approach to predictive problem solving -- these have become known as "foundation models" as they can be used as fundamental tools for other applications. While the paramount examples of image classification (earlier) and large language models (more recently) led the way, the Segment Anything Model (SAM) was recently proposed and stands as the first foundation model for image segmentation, trained on over 10 million images and with recourse to over 1 billion masks. However, the question remains -- what are the limits of this foundation? Given that magnetic resonance imaging (MRI) stands as an important method of diagnosis, we sought to understand whether SAM could be used for a few tasks of zero-shot segmentation using MRI data. Particularly, we wanted to know if selecting masks from the pool of SAM predictions could lead to good segmentations.
Here, we provide a critical assessment of the performance of SAM on magnetic resonance imaging data. We show that, while acceptable in a very limited set of cases, the overall trend implies that these models are insufficient for MRI segmentation across the whole volume, but can provide good segmentations in a few, specific slices. More importantly, we note that while foundation models trained on natural images are set to become key aspects of predictive modelling, they may prove ineffective when used on other imaging modalities.
Submitted 16 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Stellar mass is not the best predictor of galaxy metallicity. The gravitational potential-metallicity relation $Φ\rm ZR$
Authors:
Laura Sánchez-Menguiano,
Jorge Sánchez Almeida,
Sebastián F. Sánchez,
Casiana Muñoz-Tuñón
Abstract:
Interpreting the scaling relations followed by galaxies is a fundamental tool for assessing how well we understand galaxy formation and evolution. Several scaling relations involving the galaxy metallicity have been discovered through the years, the foremost of which is the scaling with stellar mass. This so-called mass-metallicity relation is thought to be fundamental and has been subject to many studies in the literature. We study the dependence of the gas-phase metallicity on many different galaxy properties to assess which of them determines the metallicity of a galaxy. We applied a random forest regressor algorithm on a sample of more than 3000 nearby galaxies from the SDSS-IV MaNGA survey. Using this machine-learning technique, we explored the effect of 148 parameters on the global oxygen abundance as an indicator of the gas metallicity. $M_{\rm \star}$/$R_e$, as a proxy for the baryonic gravitational potential of the galaxy, is found to be the primary factor determining the average gas-phase metallicity of the galaxy ($Z_g$). It outweighs stellar mass. A subsequent analysis provides the strongest dependence of $Z_g$ on $M_\star / R_e^{\,0.6}$. We argue that this parameter traces the total gravitational potential, and the exponent $α\simeq 0.6$ accounts for the inclusion of the dark matter component. Our results reveal the importance of the relation between the total gravitational potential of the galaxy and the gas metallicity. This relation is tighter and likely more primordial than the widely known mass-metallicity relation.
Submitted 4 December, 2023;
originally announced December 2023.
-
The WHaD diagram: Classifying the ionizing source with one single emission line
Authors:
S. F. Sánchez,
A. Z. Lugo-Aranda,
J. Sánchez Almeida,
J. K. Barrera-Ballesteros,
O. Gonzalez-Martín,
S. Salim,
C. J. Agostino
Abstract:
The usual approach to classifying the ionizing source using optical spectroscopy is based on diagnostic diagrams that compare the relative strength of pairs of collisional metallic lines (e.g., [O iii] and [N ii]) with respect to recombination hydrogen lines (e.g., Hβ and Hα). Despite being accepted as the standard procedure, it presents known problems, including confusion regimes and/or limitations related to the required signal-to-noise of the involved emission lines. These problems affect not only our intrinsic understanding of the interstellar medium and its properties, but also fundamental galaxy properties, such as the star-formation rate and the oxygen abundance, and key questions such as the fraction of active galactic nuclei, among several others. We explore the existing alternatives in the literature to minimize the confusion among different ionizing sources and propose a new simple diagram that uses the equivalent width and the velocity dispersion from one single emission line, Hα, to classify the ionizing sources. We use aperture-limited and spatially resolved spectroscopic data in the nearby Universe ($z \sim 0.01$) to demonstrate that the new diagram, which we call WHaD, segregates the different ionizing sources more efficiently than previously adopted procedures. A new set of regions is defined in this diagram to select between different ionizing sources. The newly proposed diagram is well placed to determine the ionizing source when only Hα is available, or when the signal-to-noise of the emission lines involved in the classical diagnostic diagrams (e.g., Hβ) is insufficient.
Submitted 17 November, 2023;
originally announced November 2023.
-
A hybrid meta-heuristic for the generation of feasible large-scale course timetables using instance decomposition
Authors:
João Almeida,
José Rui Figueira,
Alexandre P. Francisco,
Daniel Santos
Abstract:
This study introduces a hybrid meta-heuristic for generating feasible course timetables in large-scale scenarios. We conducted tests using our university's instances. The current commercial software often struggles to meet constraints and takes hours to find satisfactory solutions. Our methodology combines adaptive large neighbourhood search, guided local search, variable neighbourhood search, and an innovative instance decomposition technique. Constraint violations from various groups are treated as objective functions to minimize. The search focuses on time slots with the most violations, and if no improvements are observed after a certain number of iterations, the most challenging constraint groups receive new weights to guide the search towards non-dominated solutions, even if the total sum of violations increases. In cases where this approach fails, a shaking phase is employed. The decomposition mechanism works by iteratively introducing curricula to the problem and finding new feasible solutions while considering an expanding set of lectures. Assignments from each iteration can be adjusted in subsequent iterations. Our methodology is tested on real-world instances from our university and random subdivisions. For subdivisions with 400 curricula timetables, decomposition reduced solution times by up to 27%. In real-world instances with 1,288 curricula timetables, the reduction was 18%. Clustering curricula with more common lectures and professors during increments improved solution times by 18% compared to random increments. Using our methodology, viable solutions for real-world instances are found in an average of 21 minutes, whereas the commercial software takes several hours.
Submitted 31 October, 2023;
originally announced October 2023.
-
Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models
Authors:
Jeya Balaji Balasubramanian,
Parichoy Pal Choudhury,
Srijon Mukhopadhyay,
Thomas Ahearn,
Nilanjan Chatterjee,
Montserrat García-Closas,
Jonas S. Almeida
Abstract:
Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models, face serious limitations in portability and privacy due to their need for circulating user data in remote servers for operation. Our objective was to overcome these limitations.
Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE) then compiled it to WebAssembly (Wasm-iCARE): a portable web module, which operates entirely within the privacy of the user's device.
Results: We showcase the portability and privacy of Wasm-iCARE through two applications: one for researchers to statistically validate risk models, and one to deliver them to end-users. Both applications run entirely on the client-side, requiring no downloads or installations, and keep user data on-device during risk calculation.
Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.
Submitted 13 October, 2023;
originally announced October 2023.
-
Unraveling Multifractality and Mobility Edges in Quasiperiodic Aubry-André-Harper Chains through High-Harmonic Generation
Authors:
Marlena Dziurawiec,
Jessica O. de Almeida,
Mohit Lal Bera,
Marcin Płodzień,
Maciej M. Maśka,
Maciej Lewenstein,
Tobias Grass,
Utso Bhattacharya
Abstract:
Quasicrystals are fascinating and important because of their unconventional atomic arrangements, which challenge traditional notions of crystalline structures. Unlike regular crystals, they lack translational symmetry and generate unique mechanical, thermal, and electrical properties, holding promise for numerous applications. In order to probe the electronic properties of quasicrystals, tools beyond linear response transport measurements are needed, since all spectral regions can be affected by the non-periodic geometry. Here we show that high-harmonic spectroscopy offers an advanced avenue for this goal. Focusing on the quasiperiodic 1D Aubry-André-Harper (AAH) model, we leverage high-harmonic spectroscopy to delve into their intricate characteristics: By carefully analyzing emitted harmonic intensities, we extract the multifractal spectrum -- an essential indicator of the spatial distribution of electronic states in quasicrystals. Additionally, we address the detection of mobility edges, vital energy thresholds that demarcate localized and extended eigenstates within generalized AAH models. The precise identification of these mobility edges sheds light on the metal-insulator transition and the behavior of electronic states near these boundaries. Merging high-harmonic spectroscopy with the AAH model provides a powerful framework for understanding the interplay between localization and extended states in quasicrystals for an extremely wide energy range not captured within linear response studies, thereby offering valuable insights for guiding future experimental investigations.
Submitted 4 October, 2023;
originally announced October 2023.
-
Budget-Aware Pruning: Handling Multiple Domains with Less Parameters
Authors:
Samuel Felipe dos Santos,
Rodrigo Berriel,
Thiago Oliveira-Santos,
Nicu Sebe,
Jurandy Almeida
Abstract:
Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant number of parameters. Such requirements hinder its use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized into a single domain or task, requiring them to learn and store new parameters for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by learning a single model capable of performing well in multiple domains. Nevertheless, the models are usually larger than the baseline for a single domain. This work tackles both of these problems: our objective is to prune models capable of handling multiple domains according to a user-defined budget, making them more computationally affordable while keeping a similar classification performance. We achieve this by encouraging all domains to use a similar subset of filters from the baseline model, up to the amount defined by the user's budget. Then, filters that are not used by any domain are pruned from the network. The proposed approach innovates by better adapting to resource-limited devices while being one of the few works that handles multiple domains at test time with fewer parameters and lower computational complexity than the baseline model for a single domain.
Submitted 3 July, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
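The pruning step described in the abstract above (keep only the filters that at least one domain uses, then check the result against a user budget) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary per-domain masks, and the interpretation of the budget as a fraction of the baseline filters are all assumptions made for illustration.

```python
from typing import Dict, List


def shared_filter_mask(domain_masks: Dict[str, List[int]]) -> List[int]:
    """Return 1 for every filter used by at least one domain, else 0.

    domain_masks maps a (hypothetical) domain name to a binary mask with
    one entry per filter of the baseline model.
    """
    n_filters = len(next(iter(domain_masks.values())))
    for name, mask in domain_masks.items():
        if len(mask) != n_filters:
            raise ValueError(f"mask for domain {name!r} has the wrong length")
    # A filter survives if any domain's mask selects it (union of masks).
    return [
        int(any(mask[i] for mask in domain_masks.values()))
        for i in range(n_filters)
    ]


def prune(filters: List[str], keep_mask: List[int]) -> List[str]:
    """Drop every filter whose entry in keep_mask is 0."""
    return [f for f, keep in zip(filters, keep_mask) if keep]


def within_budget(keep_mask: List[int], budget: float) -> bool:
    """Assumed budget semantics: maximum fraction of baseline filters kept."""
    return sum(keep_mask) <= budget * len(keep_mask)


if __name__ == "__main__":
    masks = {"domain_a": [1, 0, 1, 0], "domain_b": [1, 0, 0, 0]}
    keep = shared_filter_mask(masks)
    print(keep)
    print(prune(["f0", "f1", "f2", "f3"], keep))
    print(within_budget(keep, budget=0.5))
```

The sketch only shows why encouraging domains to share filters matters: the fewer distinct filters the union of masks selects, the more can be pruned.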
-
CNNs for JPEGs: A Study in Computational Cost
Authors:
Samuel Felipe dos Santos,
Nicu Sebe,
Jurandy Almeida
Abstract:
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining the state of the art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from the RGB pixels. However, most image data are usually available in a compressed format, of which JPEG is the most widely used for transmission and storage purposes, demanding a preliminary decoding process that has a high computational load and memory usage. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. Those methods usually extract a frequency domain representation of the image, like DCT, by a partial decoding, and then adapt typical CNN architectures to work with them. One limitation of these current works is that, in order to accommodate the frequency domain data, the modifications made to the original model significantly increase its number of parameters and computational complexity. On one hand, the methods have faster preprocessing, since the cost of fully decoding the images is avoided, but on the other hand, the cost of passing the images through the model is increased, mitigating the possible upside of accelerating the method. In this paper, we propose a further study of the computational cost of deep models designed for the frequency domain, evaluating the cost of decoding and passing the images through the network. We also propose handcrafted and data-driven techniques for reducing the computational complexity and the number of parameters for these models in order to keep them similar to their RGB baselines, leading to efficient models with a better trade-off between computational cost and accuracy.
Submitted 22 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation
Authors:
Lucas Fernando Alvarenga e Silva,
Nicu Sebe,
Jurandy Almeida
Abstract:
Convolutional Neural Networks (CNNs) have brought revolutionary advances to many research areas due to their capacity to learn from raw data. However, when those methods are applied to non-controllable environments, many different factors can degrade the model's expected performance, such as unlabeled datasets with different levels of domain shift and category shift. Particularly, when both issues occur at the same time, we tackle this challenging setup as an Open Set Domain Adaptation (OSDA) problem. In general, existing OSDA approaches focus their efforts only on aligning known classes or, if they already extract possible negative instances, use them as a new category learned with supervision during the course of training. We propose a novel way to improve OSDA approaches by extracting a high-confidence set of unknown instances and using it as a hard constraint to tighten the classification boundaries of OSDA methods. Specifically, we adopt a new loss constraint evaluated in three different ways: (1) directly with the pristine negative instances; (2) with randomly transformed negatives using data augmentation techniques; and (3) with synthetically generated negatives containing adversarial features. We assessed all approaches in an extensive set of experiments based on OVANet, where we could observe consistent improvements for two public benchmarks, the Office-31 and Office-Home datasets, yielding absolute gains of up to 1.3% for both Accuracy and H-Score on Office-31 and 5.8% for Accuracy and 4.7% for H-Score on Office-Home.
Submitted 16 September, 2023;
originally announced September 2023.
-
Supermassive black hole wake or bulgeless edge-on galaxy? II: Order-of-magnitude analysis of the two physical scenarios
Authors:
J. Sanchez Almeida
Abstract:
-- Context. A recently discovered thin long object aligned with a nearby galaxy could be the stellar wake induced by the passage of a supermassive black hole (SMBH) kicked out from the nearby galaxy by the slingshot effect of a three-body encounter of SMBHs. Alternatively, the object could be a bulgeless edge-on galaxy coincidentally aligned with a second nearby companion. In contrast with the latter, the SMBH interpretation requires a number of unlikely events to happen simultaneously. -- Aims. We aim to assign a probability of occurrence to the two competing scenarios. -- Methods. The probability that the SMBH passage leaves a trace of stars is factorized as the product of the probabilities of all the independent events required for this to happen (PSMBH). Then, each factor is estimated individually. The same exercise is repeated with the edge-on galaxy interpretation (Pgalax). -- Results. Our estimate yields log(Pgalax/PSMBH) ≃ 11.4 ± 1.6, where the error is evaluated considering that both Pgalax and PSMBH are products of a large number of random independent variables. Based on the estimated probabilities, PSMBH < 6 x 10**-17 and Pgalax > 1.4 x 10**-5, we determined the number of objects to be expected in various existing, ongoing, and forthcoming surveys, as well as among all observable galaxies (i.e., when observing between 10**6 and 2 x 10**12 galaxies). In the edge-on galaxy scenario, there are always objects to be detected, whereas in the SMBH scenario, the expectation is always compatible with zero. -- Conclusions. Despite the appeal of the runaway SMBH explanation, arguments based on Occam's razor clearly favor the bulgeless edge-on galaxy interpretation. Our work does not rule out the existence of runaway SMBHs leaving stellar trails. It indicates that the vD23 object is more likely to be a bulgeless edge-on galaxy.
Submitted 5 September, 2023;
originally announced September 2023.
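The factorization described in the abstract above, where a scenario's probability is a product of independent factors, becomes a sum in log space. The sketch below shows that arithmetic only. It assumes each factor is given as a log10 estimate with a 1-sigma uncertainty and that independent uncertainties add in quadrature. That error model is a standard assumption, not necessarily the one used in the paper, and the numbers in the usage section are placeholders, not values from the paper.

```python
import math
from typing import List, Tuple

# Each factor is (log10 of the probability, 1-sigma uncertainty in dex).
Factor = Tuple[float, float]


def combine_log10(factors: List[Factor]) -> Factor:
    """log10 of a product of independent factors, with quadrature error."""
    total = sum(mean for mean, _ in factors)
    sigma = math.sqrt(sum(err * err for _, err in factors))
    return total, sigma


def log10_ratio(numerator: Factor, denominator: Factor) -> Factor:
    """log10(P_num / P_den), assuming the two estimates are independent."""
    diff = numerator[0] - denominator[0]
    sigma = math.hypot(numerator[1], denominator[1])
    return diff, sigma


if __name__ == "__main__":
    # Placeholder factors for two hypothetical scenarios.
    scenario_a = combine_log10([(-1.0, 0.3), (-2.0, 0.4)])
    scenario_b = combine_log10([(-5.0, 0.6), (-7.0, 0.8)])
    print(scenario_a, scenario_b, log10_ratio(scenario_a, scenario_b))
```

Working in log space keeps very small products such as 10**-17 from underflowing and makes the relative weight of each factor easy to read.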
-
Helping Fact-Checkers Identify Fake News Stories Shared through Images on WhatsApp
Authors:
Julio C. S. Reis,
Philipe Melo,
Fabiano Belém,
Fabricio Murai,
Jussara M. Almeida,
Fabricio Benevenuto
Abstract:
WhatsApp has introduced a novel avenue for smartphone users to engage with and disseminate news stories. The convenience of forming interest-based groups and seamlessly sharing content has rendered WhatsApp susceptible to the exploitation of misinformation campaigns. While the process of fact-checking remains a potent tool in identifying fabricated news, its efficacy falters in the face of the unprecedented deluge of information generated on the Internet today. In this work, we explore automatic ranking-based strategies to propose a "fakeness score" model as a means to help fact-checking agencies identify fake news stories shared through images on WhatsApp. Based on the results, we design a tool and integrate it into a real system that has been used extensively for monitoring content during the 2018 Brazilian general election. Our experimental evaluation shows that this tool can reduce by up to 40% the amount of effort required to identify 80% of the fake news in the data when compared to current mechanisms practiced by the fact-checking agencies for the selection of news stories to be checked.
Submitted 28 August, 2023;
originally announced August 2023.
-
The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions
Authors:
Jun Ma,
Ronald Xie,
Shamini Ayyadhury,
Cheng Ge,
Anubha Gupta,
Ritu Gupta,
Song Gu,
Yao Zhang,
Gihun Lee,
Joonkee Kim,
Wei Lou,
Haofeng Li,
Eric Upschulte,
Timo Dickscheid,
José Guilherme de Almeida,
Yixin Wang,
Lin Han,
Xin Yang,
Marco Labagnara,
Vojislav Gligorovski,
Maxime Scheder,
Sahand Jamal Rahi,
Carly Kempster,
Alice Pollitt,
Leon Espinosa
, et al. (15 additional authors not shown)
Abstract:
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.
Submitted 1 April, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
mSigSDK -- private, at scale, computation of mutation signatures
Authors:
Aaron Ge,
Yasmmin Côrtes Martins,
Tongwu Zhang,
Kailing Chen,
Maria Teresa Landi,
Brian Park,
Jeya Balasubramanian,
Jonas S Almeida
Abstract:
In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed a Software Development Kit (SDK) called mSigSDK to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on its FAIR extensibility as components of other researchers' computational constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).
Submitted 19 January, 2024; v1 submitted 5 August, 2023;
originally announced August 2023.
-
Towards Mobility Data Science (Vision Paper)
Authors:
Mohamed Mokbel,
Mahmoud Sakr,
Li Xiong,
Andreas Züfle,
Jussara Almeida,
Taylor Anderson,
Walid Aref,
Gennady Andrienko,
Natalia Andrienko,
Yang Cao,
Sanjay Chawla,
Reynold Cheng,
Panos Chrysanthis,
Xiqi Fei,
Gabriel Ghinita,
Anita Graser,
Dimitrios Gunopulos,
Christian Jensen,
Joon-Seok Kim,
Kyoung-Sook Kim,
Peer Kröger,
John Krumm,
Johannes Lauer,
Amr Magdy,
Mario Nascimento
, et al. (23 additional authors not shown)
Abstract:
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.
Submitted 7 March, 2024; v1 submitted 21 June, 2023;
originally announced July 2023.
-
An explainable model to support the decision about the therapy protocol for AML
Authors:
Jade M. Almeida,
Giovanna A. Castro,
João A. Machado-Neto,
Tiago A. Almeida
Abstract:
Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support the specialists' decision about the appropriate therapy, patients with AML receive a prognosis according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as the heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. In addition to the prediction model being explainable, the results obtained are promising and indicate that it is possible to use it to support the specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.
Submitted 15 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Can cuspy dark matter dominated halos hold cored stellar mass distributions?
Authors:
Jorge Sanchez Almeida,
Angel R. Plastino,
Ignacio Trujillo
Abstract:
According to the current concordance cosmological model, the dark matter (DM) particles are collision-less and produce self-gravitating structures with a central cusp which, generally, is not observed. The observed density tends to a central plateau or core, explained within the cosmological model through the gravitational feedback of baryons on DM. This mechanism becomes inefficient when decreasing the galaxy stellar mass so that in the low-mass regime (Mstar << 10**6 Msun) the energy provided by the baryons is insufficient to modify cusps into cores. Thus, if cores exist in these galaxies they have to reflect departures from the collision-less nature of DM. Measuring the DM mass distribution in these faint galaxies is extremely challenging, however, their stellar mass distribution can be characterized through deep photometry. Here we provide a way of using only the stellar mass distribution to constrain the underlying DM distribution. The so-called Eddington inversion method allows us to discard pairs of stellar distributions and DM potentials requiring (unphysical) negative distribution functions in the phase space. In particular, cored stellar density profiles are incompatible with the Navarro, Frenk, and White (NFW) potential expected from collision-less DM if the velocity distribution is isotropic and the system spherically symmetric. Through a case-by-case analysis, we are able to relax these assumptions to consider anisotropic velocity distributions and systems which do not have exact cores. In general, stellar distributions with radially biased orbits are difficult to reconcile with NFW-like potentials, and cores in the baryon distribution tend to require cores in the DM distribution.
Submitted 3 July, 2023;
originally announced July 2023.
-
Impact of User Privacy and Mobility on Edge Offloading
Authors:
João Paulo Esper,
Nadjib Achir,
Kleber Vieira Cardoso,
Jussara M. Almeida
Abstract:
Offloading highly demanding applications to the edge provides better quality of experience (QoE) for users with limited hardware devices. However, to maintain a competitive QoE, infrastructure and service providers must adapt to users' different mobility patterns, which can be challenging, especially for location-based services (LBS). Another issue that needs to be tackled is the increasing demand for user privacy protection. With less (accurate) information regarding user location, preferences, and usage patterns, forecasting the performance of offloading mechanisms becomes even more challenging. This work discusses the impacts of users' privacy and mobility when offloading to the edge. Different privacy and mobility scenarios are simulated and discussed to shed light on the trade-offs (e.g., privacy protection at the cost of increased latency) among privacy protection, mobility, and offloading performance.
Submitted 27 June, 2023;
originally announced June 2023.
-
A Machine Learning Pressure Emulator for Hydrogen Embrittlement
Authors:
Minh Triet Chau,
João Lucas de Sousa Almeida,
Elie Alhajjar,
Alberto Costa Nogueira Junior
Abstract:
A recent alternative for hydrogen transportation is blending it into natural gas pipelines as a mixture with natural gas. However, hydrogen embrittlement of the pipe material is a major concern for scientists and gas installation designers seeking to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite their high-fidelity results, the current PDE-based simulators are time-consuming and computationally demanding. Using simulation data, we train an ML model to predict the pressure on the pipelines' inner walls, which is a first step for pipeline system surveillance. We found that the physics-based method outperformed the purely data-driven method and satisfies the physical constraints of the gas flow system.
Submitted 22 June, 2023;
originally announced June 2023.
-
A FAIR platform for reproducing mutational signature detection on tumor sequencing data
Authors:
Aaron Ge,
Tongwu Zhang,
Clara Bodelon,
Montserrat Garcia-Closas,
Jonas Almeida,
Jeya Balasubramanian
Abstract:
This paper presents a portable, privacy-preserving, in-browser platform for the reproducible assessment of mutational signature detection methods from sparse sequencing data generated by targeted gene panels. The platform aims to address the reproducibility challenges in mutational signature research by adhering to the FAIR principles, making it findable, accessible, interoperable, and reusable. Our approach focuses on the detection of specific mutational signatures, such as SBS3, which have been linked to specific mutagenic processes. The platform relies on publicly available data, simulation, downsampling techniques, and machine learning algorithms to generate training data and labels and to train and evaluate models. The key achievement of our platform is its transparency, reusability, and privacy preservation, enabling researchers and clinicians to analyze mutational signatures with the guarantee that no data circulates outside the client machine.
Submitted 2 June, 2023;
originally announced June 2023.
-
Productive Crop Field Detection: A New Dataset and Deep Learning Benchmark Results
Authors:
Eduardo Nascimento,
John Just,
Jurandy Almeida,
Tiago Almeida
Abstract:
In precision agriculture, detecting productive crop fields is an essential practice that allows the farmer to evaluate operating performance separately and compare different seed varieties, pesticides, and fertilizers. However, manually identifying productive fields is often a time-consuming and error-prone task. Previous studies explore different methods to detect crop fields using advanced machine learning algorithms, but they often lack good quality labeled data. In this context, we propose a high-quality dataset generated by machine operation combined with Sentinel-2 images tracked over time. As far as we know, it is the first one to overcome the lack of labeled samples by using this technique. In sequence, we apply a semi-supervised classification of unlabeled data and state-of-the-art supervised and self-supervised deep learning methods to detect productive crop fields automatically. Finally, the results demonstrate high accuracy in Positive Unlabeled learning, which perfectly fits the problem where we have high confidence in the positive samples. Best performances have been found in Triplet Loss Siamese given the existence of an accurate dataset and Contrastive Learning considering situations where we do not have a comprehensive labeled dataset available.
Submitted 25 July, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Topological phase detection through high-harmonic spectroscopy in extended Su-Schrieffer-Heeger chains
Authors:
Mohit Lal Bera,
Jessica O. de Almeida,
Marlena Dziurawiec,
Marcin Płodzień,
Maciej M. Maśka,
Maciej Lewenstein,
Tobias Grass,
Utso Bhattacharya
Abstract:
Su-Schrieffer-Heeger (SSH) chains are paradigmatic examples of 1D topological insulators hosting zero-energy edge modes when the bulk of the system has a non-zero topological winding invariant. Recently, high-harmonic spectroscopy has been suggested as a tool for detecting the topological phase. Specifically, it has been shown that when the SSH chain is coupled to an external laser field of a frequency much smaller than the band gap, the emitted light at harmonic frequencies strongly differs between the trivial and the topological phase. However, it remains unclear whether various non-trivial topological phases -- differing in the number of edge states -- can also be distinguished by the high harmonic generation (HHG). In this paper, we investigate this problem by studying an extended version of the SSH chain with extended-range hoppings, resulting in a topological model with different topological phases. We explicitly show that HHG spectra are a sensitive and suitable tool for distinguishing topological phases when there is more than one topological phase. We also propose a quantitative scheme based on tuning the filling of the system to precisely locate the number of edge modes in each topological phase of this chain.
Submitted 3 May, 2023;
originally announced May 2023.
-
Super-massive black hole wake or bulgeless edge-on galaxy?
Authors:
Jorge Sanchez Almeida,
Mireia Montes,
Ignacio Trujillo
Abstract:
van Dokkum et al. (2023) reported the serendipitous discovery of a thin linear object interpreted as the trail of star-forming regions left behind by a runaway supermassive black hole (SMBH) kicked out from the center of a galaxy. Despite the undeniable interest in the idea, the actual physical interpretation is not devoid of difficulty. The wake of a SMBH produces only small perturbations on the external medium, which has to be in exceptional physical conditions to collapse gravitationally and form a long (40 kpc) massive (3e9 Msun) stellar trace in only 39 Myr. Here we offer a more conventional explanation: the stellar trail is a bulgeless galaxy viewed edge-on. This interpretation is supported by the fact that its position-velocity curve resembles a rotation curve which, together with its stellar mass, puts the object right on top of the Tully-Fisher relation characteristic of disk galaxies. Moreover, the rotation curve (Vmax ~ 110 km/s), stellar mass, extension, width (z0 ~ 1.2 kpc), and surface brightness profile of the object are very much like those of IC5249, a well-known local bulgeless edge-on galaxy. These observational facts are difficult to interpret within the SMBH wake scenario. We discuss in detail the pros and cons of the two options.
Submitted 24 April, 2023;
originally announced April 2023.
-
Halcyon -- A Pathology Imaging and Feature analysis and Management System
Authors:
Erich Bremer,
Tammy DiPrima,
Joseph Balsamo,
Jonas Almeida,
Rajarsi Gupta,
Joel Saltz
Abstract:
Halcyon is a new pathology imaging analysis and feature management system based on W3C linked-data open standards and is designed to scale to support the voluminous production of features from deep-learning feature pipelines. Halcyon supports multiple users through a web-based UX, with access to all user data over a standards-based web API that allows integration with other processes and software systems. Identity management and data security are also provided.
Submitted 7 April, 2023;
originally announced April 2023.
-
Comparison of confidence regions for quantum state tomography
Authors:
Jessica O. de Almeida,
Matthias Kleinmann,
Gael Sentís
Abstract:
The quantum state associated with an unknown experimental preparation procedure can be determined by performing quantum state tomography. If the statistical uncertainty in the data dominates over other experimental errors, then a tomographic reconstruction procedure must express this uncertainty. A rigorous way to accomplish this is via statistical confidence regions in state space. Naturally, the size of this region decreases as the number of samples increases, but it also depends critically on the method used to construct the region. We compare recent methods for constructing confidence regions as well as a reference method based on a Gaussian approximation. For the comparison, we propose an operational measure and find that there is a significant difference between methods, but which method is preferable can depend on the details of the state preparation scenario.
Submitted 14 November, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.