subscribe to arXiv mailings

arXiv:2401.06790 [pdf, other]

Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions

Authors: Daniel de S. Moraes, Pedro T. C. Santos, Polyana B. da Costa, Matheus A. S. Pinto, Ivan de J. P. Pinto, Álvaro M. G. da Veiga, Sergio Colcher, Antonio J. G. Busson, Rafael H. Rocha, Rennan Gaio, Rafael Miceli, Gabriela Tourinho, Marcos Rabaioli, Leandro Santos, Fellipe Marques, David Favaro

Abstract: This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot promp… ▽ More This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. The taxonomies' expansion with LLMs also showed exciting results for parent node prediction, with an f1-score above 70% in our taxonomies. △ Less

Submitted 11 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.07730 [pdf, other]

doi 10.5753/bwaif.2023.229322

Hierarchical Classification of Financial Transactions Through Context-Fusion of Transformer-based Embeddings and Taxonomy-aware Attention Layer

Authors: Antonio J. G. Busson, Rafael Rocha, Rennan Gaio, Rafael Miceli, Ivan Pereira, Daniel de S. Moraes, Sérgio Colcher, Alvaro Veiga, Bruno Rizzi, Francisco Evangelista, Leandro Santos, Fellipe Marques, Marcos Rabaioli, Diego Feldberg, Debora Mattos, João Pasqua, Diogo Dias

Abstract: This work proposes the Two-headed DragoNet, a Transformer-based model for hierarchical multi-label classification of financial transactions. Our model is based on a stack of Transformers encoder layers that generate contextual embeddings from two short textual descriptors (merchant name and business activity), followed by a Context Fusion layer and two output heads that classify transactions accor… ▽ More This work proposes the Two-headed DragoNet, a Transformer-based model for hierarchical multi-label classification of financial transactions. Our model is based on a stack of Transformers encoder layers that generate contextual embeddings from two short textual descriptors (merchant name and business activity), followed by a Context Fusion layer and two output heads that classify transactions according to a hierarchical two-level taxonomy (macro and micro categories). Finally, our proposed Taxonomy-aware Attention Layer corrects predictions that break categorical hierarchy rules defined in the given taxonomy. Our proposal outperforms classical machine learning methods in experiments of macro-category classification by achieving an F1-score of 93\% on a card dataset and 95% on a current account dataset. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2110.01425 [pdf, other]

Building a Noisy Audio Dataset to Evaluate Machine Learning Approaches for Automatic Speech Recognition Systems

Authors: Julio Cesar Duarte, Sérgio Colcher

Abstract: Automatic speech recognition systems are part of people's daily lives, embedded in personal assistants and mobile phones, helping as a facilitator for human-machine interaction while allowing access to information in a practically intuitive way. Such systems are usually implemented using machine learning techniques, especially with deep neural networks. Even with its high performance in the task o… ▽ More Automatic speech recognition systems are part of people's daily lives, embedded in personal assistants and mobile phones, helping as a facilitator for human-machine interaction while allowing access to information in a practically intuitive way. Such systems are usually implemented using machine learning techniques, especially with deep neural networks. Even with its high performance in the task of transcribing text from speech, few works address the issue of its recognition in noisy environments and, usually, the datasets used do not contain noisy audio examples, while only mitigating this issue using data augmentation techniques. This work aims to present the process of building a dataset of noisy audios, in a specific case of degenerated audios due to interference, commonly present in radio transmissions. Additionally, we present initial results of a classifier that uses such data for evaluation, indicating the benefits of using this dataset in the recognizer's training process. Such recognizer achieves an average result of 0.4116 in terms of character error rate in the noisy set (SNR = 30). △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: Tech report series Monografias em Ciência da Computação, september, 2021, Dep. Informática PUC-Rio, RJ, BRAZIL, ISSN 0103-9741

Report number: MCC no. 05/2021

arXiv:2106.08269 [pdf, other]

Generating Data Augmentation samples for Semantic Segmentation of Salt Bodies in a Synthetic Seismic Image Dataset

Authors: Luis Felipe Henriques, Sérgio Colcher, Ruy Luiz Milidiú, André Bulcão, Pablo Barros

Abstract: Nowadays, subsurface salt body localization and delineation, also called semantic segmentation of salt bodies, are among the most challenging geophysicist tasks. Thus, identifying large salt bodies is notoriously tricky and is crucial for identifying hydrocarbon reservoirs and drill path planning. This work proposes a Data Augmentation method based on training two generative models to augment the… ▽ More Nowadays, subsurface salt body localization and delineation, also called semantic segmentation of salt bodies, are among the most challenging geophysicist tasks. Thus, identifying large salt bodies is notoriously tricky and is crucial for identifying hydrocarbon reservoirs and drill path planning. This work proposes a Data Augmentation method based on training two generative models to augment the number of samples in a seismic image dataset for the semantic segmentation of salt bodies. Our method uses deep learning models to generate pairs of seismic image patches and their respective salt masks for the Data Augmentation. The first model is a Variational Autoencoder and is responsible for generating patches of salt body masks. The second is a Conditional Normalizing Flow model, which receives the generated masks as inputs and generates the associated seismic image patches. We evaluate the proposed method by comparing the performance of ten distinct state-of-the-art models for semantic segmentation, trained with and without the generated augmentations, in a dataset from two synthetic seismic images. The proposed methodology yields an average improvement of 8.57% in the IoU metric across all compared models. The best result is achieved by a DeeplabV3+ model variant, which presents an IoU score of 95.17% when trained with our augmentations. Additionally, our proposal outperformed six selected data augmentation methods, and the most significant improvement in the comparison, of 9.77%, is achieved by composing our DA with augmentations from an elastic transformation. At last, we show that the proposed method is adaptable for a larger context size by achieving results comparable to the obtained on the smaller context size. △ Less

Submitted 17 June, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2011.14870 [pdf, other]

Prior Flow Variational Autoencoder: A density estimation model for Non-Intrusive Load Monitoring

Authors: Luis Felipe M. O. Henriques, Eduardo Morgan, Sergio Colcher, Ruy Luiz Milidiú

Abstract: Non-Intrusive Load Monitoring (NILM) is a computational technique to estimate the power loads' appliance-by-appliance from the whole consumption measured by a single meter. In this paper, we propose a conditional density estimation model, based on deep neural networks, that joins a Conditional Variational Autoencoder with a Conditional Invertible Normalizing Flow model to estimate the individual a… ▽ More Non-Intrusive Load Monitoring (NILM) is a computational technique to estimate the power loads' appliance-by-appliance from the whole consumption measured by a single meter. In this paper, we propose a conditional density estimation model, based on deep neural networks, that joins a Conditional Variational Autoencoder with a Conditional Invertible Normalizing Flow model to estimate the individual appliance's power demand. The resulting model is called Prior Flow Variational Autoencoder or, for simplicity PFVAE. Thus, instead of having one model per appliance, the resulting model is responsible for estimating the power demand, appliance-by-appliance, at once. We train and evaluate our proposed model in a publicly available dataset composed of power demand measures from a poultry feed factory located in Brazil. The proposed model's quality is evaluated by comparing the obtained normalized disaggregation error (NDE) and signal aggregated error (SAE) with the previous work values on the same dataset. Our proposal achieves highly competitive results, and for six of the eight machines belonging to the dataset, we observe consistent improvements that go from 28% up to 81% in NDE and from 27% up to 86% in SAE. △ Less

Submitted 27 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

arXiv:2010.11732 [pdf, other]

doi 10.1145/3428658.3430967

A Cluster-Matching-Based Method for Video Face Recognition

Authors: Paulo R C Mendes, Antonio J G Busson, Sérgio Colcher, Daniel Schwabe, Álan L V Guedes, Carlos Laufer

Abstract: Face recognition systems are present in many modern solutions and thousands of applications in our daily lives. However, current solutions are not easily scalable, especially when it comes to the addition of new targeted people. We propose a cluster-matching-based approach for face recognition in video. In our approach, we use unsupervised learning to cluster the faces present in both the dataset… ▽ More Face recognition systems are present in many modern solutions and thousands of applications in our daily lives. However, current solutions are not easily scalable, especially when it comes to the addition of new targeted people. We propose a cluster-matching-based approach for face recognition in video. In our approach, we use unsupervised learning to cluster the faces present in both the dataset and targeted videos selected for face recognition. Moreover, we design a cluster matching heuristic to associate clusters in both sets that is also capable of identifying when a face belongs to a non-registered person. Our method has achieved a recall of 99.435% and a precision of 99.131% in the task of video face recognition. Besides performing face recognition, it can also be used to determine the video segments where each person is present. △ Less

Submitted 19 October, 2020; originally announced October 2020.

Comments: 13 pages

arXiv:2010.05760 [pdf, other]

Video Quality Enhancement Using Deep Learning-Based Prediction Models for Quantized DCT Coefficients in MPEG I-frames

Authors: Antonio J G Busson, Paulo R C Mendes, Daniel de S Moraes, Álvaro M da Veiga, Álan L V Guedes, Sérgio Colcher

Abstract: Recent works have successfully applied some types of Convolutional Neural Networks (CNNs) to reduce the noticeable distortion resulting from the lossy JPEG/MPEG compression technique. Most of them are built upon the processing made on the spatial domain. In this work, we propose a MPEG video decoder that is purely based on the frequency-to-frequency domain: it reads the quantized DCT coefficients… ▽ More Recent works have successfully applied some types of Convolutional Neural Networks (CNNs) to reduce the noticeable distortion resulting from the lossy JPEG/MPEG compression technique. Most of them are built upon the processing made on the spatial domain. In this work, we propose a MPEG video decoder that is purely based on the frequency-to-frequency domain: it reads the quantized DCT coefficients received from a low-quality I-frames bitstream and, using a deep learning-based model, predicts the missing coefficients in order to recompose the same frames with enhanced quality. In experiments with a video dataset, our best model was able to improve from frames with quantized DCT coefficients corresponding to a Quality Factor (QF) of 10 to enhanced quality frames with QF slightly near to 20. △ Less

Submitted 9 October, 2020; originally announced October 2020.

arXiv:2010.04676 [pdf, other]

A Clustering-Based Method for Automatic Educational Video Recommendation Using Deep Face-Features of Lecturers

Authors: Paulo R. C. Mendes, Eduardo S. Vieira, Álan L. V. Guedes, Antonio J. G. Busson, Sérgio Colcher

Abstract: Discovering and accessing specific content within educational video bases is a challenging task, mainly because of the abundance of video content and its diversity. Recommender systems are often used to enhance the ability to find and select content. But, recommendation mechanisms, especially those based on textual information, exhibit some limitations, such as being error-prone to manually create… ▽ More Discovering and accessing specific content within educational video bases is a challenging task, mainly because of the abundance of video content and its diversity. Recommender systems are often used to enhance the ability to find and select content. But, recommendation mechanisms, especially those based on textual information, exhibit some limitations, such as being error-prone to manually created keywords or due to imprecise speech recognition. This paper presents a method for generating educational video recommendation using deep face-features of lecturers without identifying them. More precisely, we use an unsupervised face clustering mechanism to create relations among the videos based on the lecturer's presence. Then, for a selected educational video taken as a reference, we recommend the ones where the presence of the same lecturers is detected. Moreover, we rank these recommended videos based on the amount of time the referenced lecturers were present. For this task, we achieved a mAP value of 99.165%. △ Less

Submitted 9 October, 2020; originally announced October 2020.

arXiv:2005.03626 [pdf, other]

Seismic Shot Gather Noise Localization Using a Multi-Scale Feature-Fusion-Based Neural Network

Authors: Antonio José G. Busson, Sérgio Colcher, Ruy Luiz Milidiú, Bruno Pereira Dias, André Bulcão

Abstract: Deep learning-based models, such as convolutional neural networks, have advanced various segments of computer vision. However, this technology is rarely applied to seismic shot gather noise localization problem. This letter presents an investigation on the effectiveness of a multi-scale feature-fusion-based network for seismic shot-gather noise localization. Herein, we describe the following: (1)… ▽ More Deep learning-based models, such as convolutional neural networks, have advanced various segments of computer vision. However, this technology is rarely applied to seismic shot gather noise localization problem. This letter presents an investigation on the effectiveness of a multi-scale feature-fusion-based network for seismic shot-gather noise localization. Herein, we describe the following: (1) the construction of a real-world dataset of seismic noise localization based on 6,500 seismograms; (2) a multi-scale feature-fusion-based detector that uses the MobileNet combined with the Feature Pyramid Net as the backbone; and (3) the Single Shot multi-box detector for box classification/regression. Additionally, we propose the use of the Focal Loss function that improves the detector's prediction accuracy. The proposed detector achieves an AP@0.5 of 78.67\% in our empirical evaluation. △ Less

Submitted 7 May, 2020; originally announced May 2020.

arXiv:1912.01148 [pdf, other]

A Deep Convolutional Network for Seismic Shot-Gather Image Quality Classification

Authors: Eduardo Betine Bucker, Antonio José Grandson Busson, Ruy Luiz Milidiú, Sérgio Colcher, Bruno Pereira Dias, André Bulcão

Abstract: Deep Learning-based models such as Convolutional Neural Networks, have led to significant advancements in several areas of computing applications. Seismogram quality assurance is a relevant Geophysics task, since in the early stages of seismic processing, we are required to identify and fix noisy sail lines. In this work, we introduce a real-world seismogram quality classification dataset based on… ▽ More Deep Learning-based models such as Convolutional Neural Networks, have led to significant advancements in several areas of computing applications. Seismogram quality assurance is a relevant Geophysics task, since in the early stages of seismic processing, we are required to identify and fix noisy sail lines. In this work, we introduce a real-world seismogram quality classification dataset based on 6,613 examples, manually labeled by human experts as good, bad or ugly, according to their noise intensity. This dataset is used to train a CNN classifier for seismic shot-gathers quality prediction. In our empirical evaluation, we observe an F1-score of 93.56% in the test set. △ Less

Submitted 2 December, 2019; originally announced December 2019.

arXiv:1911.03974 [pdf, other]

A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes

Authors: Pedro V. A. de Freitas, Paulo R. C. Mendes, Gabriel N. P. dos Santos, Antonio José G. Busson, Álan Livio Guedes, Sérgio Colcher, Ruy Luiz Milidiú

Abstract: Due to the extensive use of video-sharing platforms and services for their storage, the amount of such media on the internet has become massive. This volume of data makes it difficult to control the kind of content that may be present in such video files. One of the main concerns regarding the video content is if it has an inappropriate subject matter, such as nudity, violence, or other potentiall… ▽ More Due to the extensive use of video-sharing platforms and services for their storage, the amount of such media on the internet has become massive. This volume of data makes it difficult to control the kind of content that may be present in such video files. One of the main concerns regarding the video content is if it has an inappropriate subject matter, such as nudity, violence, or other potentially disturbing content. More than telling if a video is either appropriate or inappropriate, it is also important to identify which parts of it contain such content, for preserving parts that would be discarded in a simple broad analysis. In this work, we present a multimodal~(using audio and image features) architecture based on Convolutional Neural Networks (CNNs) for detecting inappropriate scenes in video files. In the task of classifying video files, our model achieved 98.95\% and 98.94\% of F1-score for the appropriate and inappropriate classes, respectively. We also present a censoring tool that automatically censors inappropriate segments of a video file. △ Less

Submitted 10 November, 2019; originally announced November 2019.

arXiv:1811.04193 [pdf, other]

A Ginga-enabled Digital Radio Mondiale Broadcasting chain: Signaling and Definitions

Authors: Rafael Diniz, Alan L. V. Guedes, Sergio Colcher

Abstract: ISDB-T International standard is currently adopted by most Latin America countries and is already installed in most TV sets sold in recent years in the region. To support interactive applications in Digital TV receivers, ISDB-T defines the middleware Ginga. Similar to Digital TV, Digital Radio standards also provide the means to carry interactive applications; however, their specifications for int… ▽ More ISDB-T International standard is currently adopted by most Latin America countries and is already installed in most TV sets sold in recent years in the region. To support interactive applications in Digital TV receivers, ISDB-T defines the middleware Ginga. Similar to Digital TV, Digital Radio standards also provide the means to carry interactive applications; however, their specifications for interactive applications are usually more restricted than the ones used in Digital TV. Also, interactive applications for Digital TV and Digital Radio are usually incompatible. Motivated by such observations, this report considers the importance of interactive applications for both TV and Radio Broadcasting and the advantages of using the same middleware and languages specification for Digital TV and Radio. More specifically, it establishes the signaling and definitions on how to transport and execute Ginga-NCL and Ginga-HTML5 applications over DRM (Digital Radio Mondiale) transmission. Ministry of Science, Technology, Innovation and Communication of Brazil is carrying trials with Digital Radio Mondiale standard in order to define the reference model of the Brazilian Digital Radio System (Portuguese: Sistema Brasileiro de Rádio Digital - SBRD). △ Less

Submitted 12 June, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: 15 pages

Report number: ISSN 0103-9741 ISSN 0103-9741

Showing 1–12 of 12 results for author: Colcher, S