Showing 1–50 of 62 results for author: Ogawa, T

  1. arXiv:2410.07563  [pdf, other]

    cs.CL cs.AI cs.LG

    PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

    Authors: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase

    Abstract: We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performan…

    Submitted 9 October, 2024; originally announced October 2024.

  2. arXiv:2409.01534  [pdf, other]

    cs.CV cs.AI cs.MM

    Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

    Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to the complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multi…

    Submitted 2 September, 2024; originally announced September 2024.

  3. arXiv:2409.00919  [pdf, other]

    cs.SD cs.AI eess.AS

    MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT

    Authors: Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. The main theme of symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two signif…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted to the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

  4. arXiv:2408.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation Based on Diffusion Model

    Authors: Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative m…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

  5. Tool Shape Optimization through Backpropagation of Neural Network

    Authors: Kento Kawaharazuka, Toru Ogawa, Cota Nabeshima

    Abstract: When executing a certain task, human beings can choose or make an appropriate tool to achieve the task. This research especially addresses the optimization of tool shape for robotic tool-use. We propose a method in which a robot obtains an optimized tool shape, tool trajectory, or both, depending on a given task. The feature of our method is that a transition of the task state when the robot moves…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS2020

  6. Dynamic Task Control Method of a Flexible Manipulator Using a Deep Recurrent Neural Network

    Authors: Kento Kawaharazuka, Toru Ogawa, Cota Nabeshima

    Abstract: The flexible body has advantages over the rigid body in terms of environmental contact thanks to its underactuation. On the other hand, when applying conventional control methods to realize dynamic tasks with the flexible body, there are two difficulties: accurate modeling of the flexible body and the derivation of intermediate postures to achieve the tasks. Learning-based methods are considered t…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS2019

  7. arXiv:2407.05814  [pdf, other]

    cs.CV cs.AI cs.MM

    Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

    Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic sign…

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2406.18836  [pdf, other]

    cs.CV cs.IR

    Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs

    Authors: Huaying Zhang, Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel zero-shot composed image retrieval (CIR) method considering the query-target relationship by masked image-text pairs. The objective of CIR is to retrieve the target image using a query image and a query text. Existing methods use a textual inversion network to convert the query image into a pseudo word to compose the image and text and use a pre-trained visual-language…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper in IEEE ICIP 2024

  9. arXiv:2406.13316  [pdf, other]

    cs.CV cs.MM

    Reinforcing Pre-trained Models Using Counterfactual Images

    Authors: Xiang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. Deep learning classification models are often trained using datasets that mirror real-world scenarios. In this training process, because learning is based solely on correlations with labels, there is a risk that models may learn spurious relationships, such as an overreli…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  10. arXiv:2406.01033  [pdf]

    cs.CV cs.LG cs.MM

    Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement

    Authors: Yung-Hui Lin, Yu-Wen Chang, Huang-Chia Shih, Takahiro Ogawa

    Abstract: Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequ…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures, 5 tables

  11. arXiv:2404.17732  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation: Balancing Global Structure and Local Details

    Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: In this paper, we propose a new dataset distillation method that considers balancing global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the required dataset when training models. The conventional dataset distillation methods face the problem of long redeployment time and poor…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by the 1st CVPR Workshop on Dataset Distillation

  12. arXiv:2403.18258  [pdf, other]

    cs.CV cs.AI

    Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach

    Authors: Taro Togo, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is one of the hot topics in the field of computer vision, and this is considered one of the crucial tasks in society, specifically the continual learning of generative models. The…

    Submitted 27 March, 2024; originally announced March 2024.

  13. arXiv:2402.09677  [pdf, other]

    cs.CV

    Prompt-based Personalized Federated Learning for Medical Visual Question Answering

    Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train personalized transformer-based VQA models for each client. To address the high computational complexity of client…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP2024

  14. arXiv:2401.15863  [pdf, other]

    cs.CV cs.AI cs.LG

    Importance-Aware Adaptive Dataset Distillation

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model t…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Published as a journal paper in Elsevier Neural Networks

  15. arXiv:2310.08277  [pdf, other]

    eess.AS cs.SD

    A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

    Authors: Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Abstract: We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting. This is achieved by integrating two modules into an SE model: 1) an internal separation module that does both speaker counting and separation; and 2) a TSE module that extrac…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures, 2 tables, accepted by ASRU2023

  16. arXiv:2309.10524  [pdf, other]

    eess.AS cs.CL cs.SD

    Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: We propose to utilize an instruction-tuned large language model (LLM) for guiding the text generation process in automatic speech recognition (ASR). Modern large language models (LLMs) are adept at performing various text generation tasks through zero-shot learning, prompted with instructions designed for specific objectives. This paper explores the potential of LLMs to derive linguistic informati…

    Submitted 30 September, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2025

  17. arXiv:2309.04654  [pdf, other]

    cs.SD eess.AS

    Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

    Authors: Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latency, which hurts the streaming performance. In the Mask-CTC framework, an encoder network is trained to learn the feature representation that anticipate…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted to EUSIPCO 2023

  18. arXiv:2309.00376  [pdf, other]

    eess.AS cs.SD

    Remixing-based Unsupervised Source Separation from Scratch

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: We propose an unsupervised approach for training separation models from scratch using RemixIT and Self-Remixing, which are recently proposed self-supervised learning methods for refining pre-trained models. They first separate mixtures with a teacher model and create pseudo-mixtures by shuffling and remixing the separated signals. A student model is then trained to separate the pseudo-mixtures usi…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: Interspeech 2023, 5 pages, 2 figures, 2 tables

  19. arXiv:2307.02799  [pdf, other]

    eess.IV cs.LG

    Few-shot Personalized Saliency Prediction Based on Inter-personnel Gaze Patterns

    Authors: Yuya Moroto, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents few-shot personalized saliency prediction based on inter-personnel gaze patterns. In contrast to general saliency maps, personalized saliency maps (PSMs) have great potential since PSMs indicate the person-specific visual attention useful for obtaining individual visual preferences. The PSM prediction is needed for acquiring the PSMs for unseen images, but its prediction i…

    Submitted 3 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures

  20. arXiv:2303.06806  [pdf, other]

    eess.AS cs.CL cs.SD

    Neural Diarization with Non-autoregressive Intermediate Attractors

    Authors: Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

    Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency betw…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  21. arXiv:2303.04388  [pdf, other]

    cs.CV

    Interpretable Visual Question Answering Referring to Outside Knowledge

    Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: We present a novel multimodal interpretable VQA model that can answer the question more accurately and generate diverse explanations. Although researchers have proposed several methods that can generate human-readable and fine-grained natural language sentences to explain a model's decision, these methods have focused solely on the information in the image. Ideally, the model should refer to vario…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Under review

  22. arXiv:2302.08493  [pdf, other]

    cs.CV cs.HC eess.IV

    Deep Multi-stream Network for Video-based Calving Sign Detection

    Authors: Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

    Abstract: We have designed a deep multi-stream network for automatically detecting calving signs from video. Calving sign detection from a camera, which is a non-contact sensor, is expected to enable more efficient livestock management. As large-scale, well-developed data cannot generally be assumed when establishing calving detection systems, the basis for making the prediction needs to be presented to far…

    Submitted 10 January, 2023; originally announced February 2023.

  23. arXiv:2301.03926  [pdf, other]

    cs.HC cs.CV eess.IV

    Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

    Authors: Ryosuke Hyodo, Susumu Saito, Teppei Nakano, Makoto Akabane, Ryoichi Kasuga, Tetsuji Ogawa

    Abstract: Through a user study in the field of livestock farming, we verify the effectiveness of an XAI framework for video surveillance systems. The systems can be made interpretable by incorporating experts' decision-making processes. AI systems are becoming increasingly common in real-world applications, especially in fields related to human decision-making, and their interpretability is necessary. However…

    Submitted 10 January, 2023; originally announced January 2023.

  24. arXiv:2212.09281  [pdf, other]

    eess.IV cs.CV

    Boosting Automatic COVID-19 Detection Performance with Self-Supervised Learning and Batch Knowledge Ensembling

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Problem: Detecting COVID-19 from chest X-Ray (CXR) images has become one of the fastest and easiest methods for detecting COVID-19. However, the existing methods usually use supervised transfer learning from natural images as a pretraining process. These methods do not consider the unique features of COVID-19 and the similar features between COVID-19 and other pneumonia. Aim: In this paper, we wan…

    Submitted 30 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Published as a journal paper at Elsevier CIBM

  25. arXiv:2212.09276  [pdf, other]

    eess.IV cs.CV cs.LG

    COVID-19 Detection Based on Self-Supervised Transfer Learning Using Chest X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Purpose: Considering several patients screened due to the COVID-19 pandemic, computer-aided detection has strong potential in assisting clinical workflow efficiency and reducing the incidence of infections among radiologists and healthcare providers. Since many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Therefore, ches…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Published as a journal paper at Springer IJCARS

  26. Union-set Multi-source Model Adaptation for Semantic Segmentation

    Authors: Zongyao Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper solves a generalized version of the problem of multi-source model adaptation for semantic segmentation. Model adaptation is proposed as a new domain adaptation problem which requires access to a pre-trained model instead of data for the source domain. A general multi-source setting of model adaptation assumes strictly that each source domain shares a common label space with the target d…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by ECCV2022

  27. arXiv:2211.10194  [pdf, other]

    eess.AS cs.SD

    Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: We present Self-Remixing, a novel self-supervised speech separation method, which refines a pre-trained separation model in an unsupervised manner. The proposed method consists of a shuffler module and a solver module, and they grow together through separation and remixing processes. Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing th…

    Submitted 1 September, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP2023, 5 pages, 2 figures, 2 tables

  28. arXiv:2211.00858  [pdf, other]

    cs.SD eess.AS

    Conversation-oriented ASR with multi-look-ahead CBS architecture

    Authors: Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

    Abstract: During conversations, humans are capable of inferring the intention of the speaker at any point of the speech to prepare the following action promptly. Such ability is also the key for conversational systems to achieve rhythmic and natural conversation. To perform this, the automatic speech recognition (ASR) used for transcribing the speech in real-time must achieve high accuracy without delay. In…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  29. arXiv:2211.00795  [pdf, other]

    eess.AS cs.CL cs.SD

    InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision. Momentum PL (MPL) trains a connectionist temporal classification (CTC)-based model on unlabeled data by continuously generating pseudo-labels on the fly and improving their quality. In contrast to autoregressive formulati…

    Submitted 16 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP2023

  30. arXiv:2211.00792  [pdf, other]

    eess.AS cs.CL cs.SD

    BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging…

    Submitted 16 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP2023

  31. arXiv:2211.00313  [pdf, other]

    cs.CV cs.LG eess.IV

    RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representations from X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: In this study, we propose a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representations from X-ray images. Our method adopts a new masking strategy that utilizes organ mask information to identify valid regions for learning more meaningful representations. We conduct quantitative evaluations on an open lung X-ray image dataset as well as masking ratio hy…

    Submitted 17 August, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by ECCV 2024 Workshop on Human-inspired Computer Vision

  32. arXiv:2210.16663  [pdf, other]

    eess.AS cs.CL

    BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

    Authors: Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the conditional independence assumptions used in conventional CTC and incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding. BERT-CTC attends to the full contexts of the…

    Submitted 19 April, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: v1: Accepted to Findings of EMNLP2022, v2: Minor corrections and clearer derivation of Eq. (21)

  33. Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Dataset complexity assessment aims to predict classification performance on a dataset with complexity calculation before training a classifier, which can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different dat…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at Springer MTAP

  34. Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Background and objective: Sharing of medical data is required to enable the cross-agency flow of healthcare information and construct high-accuracy computer-aided diagnosis systems. However, the large sizes of medical datasets, the massive amount of memory of saved deep convolutional neural network (DCNN) models, and patients' privacy protection are problems that can lead to inefficient medical da…

    Submitted 1 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at Elsevier CMPB

  35. arXiv:2209.14609  [pdf, other]

    cs.CV cs.AI cs.LG

    Dataset Distillation Using Parameter Pruning

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.

    Submitted 20 August, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at IEICE Trans. Fund

  36. arXiv:2209.14603  [pdf, other]

    cs.CR cs.CV cs.LG eess.IV

    Dataset Distillation for Medical Dataset Sharing

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Sharing medical datasets between hospitals is challenging because of the privacy-protection problem and the massive cost of transmitting and storing many high-resolution medical images. However, dataset distillation can synthesize a small dataset such that models trained on it achieve comparable performance with the original large dataset, which shows potential for solving the existing medical sha…

    Submitted 23 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted by AAAI-23 Workshop on Representation Learning for Responsible Human-Centric AI

  37. arXiv:2209.07007  [pdf, other]

    cs.LG cs.CV

    Gromov-Wasserstein Autoencoders

    Authors: Nao Nakagawa, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Variational Autoencoder (VAE)-based generative models offer flexible representation learning by incorporating meta-priors, general premises considered beneficial for downstream tasks. However, the incorporated meta-priors often involve ad-hoc model deviations from the original likelihood architecture, causing undesirable changes in their training. In this paper, we propose a novel representation l…

    Submitted 24 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 38 pages, 9 tables, 13 figures; accepted at ICLR2023

  38. TriBYOL: Triplet BYOL for Self-Supervised Representation Learning

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel self-supervised learning method for learning better representations with small batch sizes. Many self-supervised learning methods based on certain forms of the siamese network have emerged and received significant attention. However, these methods need to use large batch sizes to learn good representations and require heavy computational resources. We present a new trip…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Published as a conference paper at ICASSP 2022

  39. Self-Knowledge Distillation based Self-Supervised Learning for Covid-19 Detection from Chest X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: The global outbreak of the Coronavirus 2019 (COVID-19) has overloaded worldwide healthcare systems. Computer-aided diagnosis for COVID-19 fast detection and patient triage is becoming critical. This paper proposes a novel self-knowledge distillation based self-supervised learning method for COVID-19 detection from chest X-ray images. Our method can use self-knowledge of images based on similaritie…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Published as a conference paper at ICASSP 2022

  40. arXiv:2203.14080  [pdf, ps, other]

    eess.AS cs.SD

    Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner. Generative adversarial networks are known to be effective in constructing separation networks when the ground truth for the observed signal is inaccessible. Still, weak objectives aimed at distribution-to-distribution mapping make…

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP2022

  41. arXiv:2110.10402  [pdf, other]

    cs.SD cs.LG eess.AS

    An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

    Authors: Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has been shown to be effective in streaming ASR. However, in order to maintai…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted to APSIPA 2021

  42. arXiv:2110.04109  [pdf, other]

    eess.AS cs.CL

    Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

    Authors: Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: In end-to-end automatic speech recognition (ASR), a model is expected to implicitly learn representations suitable for recognizing a word-level sequence. However, the huge abstraction gap between input acoustic signals and output linguistic tokens makes it challenging for a model to learn the representations. In this work, to promote the word-level representation learning in end-to-end ASR, we pro…

    Submitted 8 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP2022

  43. arXiv:2104.02864  [pdf, other]

    cs.CV

    Self-Supervised Learning for Gastritis Detection with Gastric X-ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Purpose: Manual annotation of gastric X-ray images by doctors for gastritis detection is time-consuming and expensive. To solve this, a self-supervised learning method is developed in this study. The effectiveness of the proposed self-supervised learning method in gastritis detection is verified using a few annotated gastric X-ray images. Methods: In this study, we develop a novel method that can…

    Submitted 27 March, 2023; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Published as a journal paper at Springer IJCARS

  44. Soft-Label Anonymous Gastric X-ray Image Distillation

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents a soft-label anonymous gastric X-ray image distillation method based on a gradient descent approach. The sharing of medical data is demanded to construct high-accuracy computer-aided diagnosis (CAD) systems. However, the large size of the medical dataset and privacy protection are remaining problems in medical data sharing, which hindered the research of CAD systems. The idea o…

    Submitted 20 March, 2024; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: The first paper to explore real-world dataset distillation; Work was done in 2019 and published as a conference paper at ICIP 2020

  45. arXiv:2012.10999  [pdf, other]

    cs.HC

    Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing

    Authors: Masaya Morinaga, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    Abstract: Qualification tests in crowdsourcing are often used to pre-filter workers by measuring their ability in executing microtasks. While creating qualification tests for each task type is considered a common and reasonable approach, this study investigates the worker-filtering performance when the same qualification test is used across multiple types of tasks. On Amazon Mechanical Turk, we tested the…

    Submitted 20 December, 2020; originally announced December 2020.

  46. arXiv:2010.13270  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Improved Mask-CTC for Non-Autoregressive End-to-End ASR

    Authors: Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: For real-world deployment of automatic speech recognition (ASR), the system is desired to be capable of fast inference while relieving the requirement of computational resources. The recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC), Mask-CTC, fulfills this demand by generating tokens in a non-autoregressive fashion. While Mask-CTC achie…

    Submitted 16 February, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: Accepted to ICASSP2021

  47. arXiv:2005.08700  [pdf, other]

    eess.AS cs.SD

    Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

    Authors: Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually autoregressive: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations a…

    Submitted 17 August, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted to INTERSPEECH2020

  48. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

    Authors: Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta

    Abstract: Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the fra…

    Submitted 12 May, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: 10 pages, 8 figures

    ACM Class: I.4

    Journal ref: IEEE MultiMedia 2020

  49. arXiv:2001.07761  [pdf, other]

    cs.CV cs.LG eess.IV

    Block-wise Scrambled Image Recognition Using Adaptation Network

    Authors: Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

    Abstract: In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is propose…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: 6 pages, Artificial Intelligence of Things (AAAI-2020 WS)

  50. arXiv:1910.11534  [pdf, other]

    cs.CV

    Team PFDet's Methods for Open Images Challenge 2019

    Authors: Yusuke Niitani, Toru Ogawa, Shuji Suzuki, Takuya Akiba, Tommi Kerola, Kohei Ozaki, Shotaro Sano

    Abstract: We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.

    Submitted 25 October, 2019; originally announced October 2019.