Automated Detection of Algorithm Debt in Deep Learning Frameworks: An Empirical Study

Authors: Emmanuel Iko-Ojo Simon, Chirath Hettiarachchi, Alex Potanin, Hanna Suominen, Fatemeh Fard

Abstract: Context: Previous studies demonstrate that Machine or Deep Learning (ML/DL) models can detect Technical Debt (TD) from source code comments, known as Self-Admitted Technical Debt (SATD). Despite the importance of ML/DL in software development, few studies focus on automated detection of a newer SATD type: Algorithm Debt (AD). AD detection is important because it helps to identify TD early, facilitating research and learning and preventing the accumulation of issues related to model degradation and lack of scalability. Aim: Our goal is to improve the AD detection performance of various ML/DL models. Method: We will perform empirical studies using TF-IDF, Count Vectorizer, Hash Vectorizer, and TD-indicative words to identify features that improve AD detection, using ML/DL classifiers with different data featurisations. We will use an existing dataset curated from seven DL frameworks, in which comments were manually classified as AD, Compatibility, Defect, Design, Documentation, Requirement, and Test Debt. We will explore various word embedding methods to further enrich the features for ML models. These embeddings will come from DL-based models such as RoBERTa and ALBERTv2, and from large language models (LLMs): INSTRUCTOR and VOYAGE AI. We will enrich the dataset by incorporating AD-related terms, then train various ML/DL classifiers: Support Vector Machine, Logistic Regression, Random Forest, RoBERTa, and ALBERTv2.

Submitted 21 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: Accepted as Continuity Acceptance (CA) for a Stage 1 registration of the Registered Report Track at the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME 2024), Flagstaff, USA, October 6-11, 2024

ACM Class: D.2.7; K.6.3
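For the Algorithm Debt study above, the featurisation idea (TF-IDF weights enriched with TD-indicative words) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the IDF follows the smoothed formula ln((1 + n) / (1 + df)) + 1 that scikit-learn's TfidfVectorizer uses by default, and the AD_TERMS list and the extra "__ad_terms__" count feature are hypothetical choices made here for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical AD-indicative terms, for illustration only (not the paper's list).
AD_TERMS = {"algorithm", "slow", "approximation", "inefficient", "heuristic"}


def tokenize(text):
    """Lower-case a comment and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def fit_idf(comments):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term in the corpus."""
    n = len(comments)
    df = Counter()
    for comment in comments:
        df.update(set(tokenize(comment)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(comment, idf):
    """Raw term counts weighted by IDF, then L2-normalised; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(comment) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def features(comment, idf):
    """TF-IDF vector plus one extra feature counting AD-indicative terms."""
    vec = tfidf(comment, idf)
    vec["__ad_terms__"] = sum(1 for t in tokenize(comment) if t in AD_TERMS)
    return vec


comments = [
    "TODO: this algorithm is slow, use a better approximation",
    "FIXME: add a unit test for the parser",
    "update the docstring for this function",
]
idf = fit_idf(comments)
f0 = features(comments[0], idf)
```

The resulting sparse dictionaries could then be fed to any of the classifiers named in the abstract; keeping the word-list feature separate from the TF-IDF weights makes it easy to measure whether the enrichment helps.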

arXiv:2401.13825 [pdf, other]

RR Lyrae Stars Belonging to the Candidate Globular Cluster Patchick 99

Authors: Evan Butler, Andrea Kunder, Zdenek Prudil, Kevin R. Covey, Macy Ball, Carlos Campos, Kaylen Gollnick, Julio Olivares Carvajal, Joanne Hughes, Kathryn Devine, Christian I. Johnson, A. Katherina Vivas, Michael R. Rich, Meridith Joyce, Iulia T. Simon, Tommaso Marchetti, Andreas J. Koch-Hansen, William I. Clarkson, Rebekah Kuss

Abstract: Patchick 99 is a candidate globular cluster located in the direction of the Galactic bulge, with a proper motion almost identical to that of the field and extreme field-star contamination. A recent analysis suggests it is a low-luminosity globular cluster with a population of RR Lyrae stars. We present new spectra of stars in and around Patchick 99, specifically targeting the three RR Lyrae stars associated with the cluster as well as the other RR Lyrae stars in the field. A sample of 53 giant stars, selected from their proper motions and positions on the color-magnitude diagram (CMD), was also observed. The three RR Lyrae stars associated with the cluster have similar radial velocities and distances, and two of the targeted giants also have radial velocities in this regime and [Fe/H] metallicities that are slightly more metal-poor than the field. Therefore, if Patchick 99 is a bona fide globular cluster, it would have a radial velocity of -92 ± 10 km s⁻¹, a distance of 6.7 ± 0.4 kpc (as determined from the RR Lyrae stars), and an orbit that confines it to the inner bulge.

Submitted 25 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted to The Astrophysical Journal Letters. Replaced due to a typo in the title
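The Patchick 99 distance above comes from the RR Lyrae stars, which act as standard candles. As a rough sketch of the underlying relation, assuming an absolute magnitude M from some calibration and an extinction A along the line of sight (both inputs here are illustrative, not the paper's values), the distance modulus m - M - A = 5 log10(d) - 5 inverts to a distance in parsecs:

```python
import math


def distance_pc(m, M, extinction=0.0):
    """Distance in parsecs from the distance modulus m - M - A = 5 log10(d) - 5."""
    mu = m - M - extinction
    return 10 ** ((mu + 5) / 5)


def distance_modulus(d_pc):
    """Inverse relation: mu = 5 log10(d) - 5 for d in parsecs."""
    return 5 * math.log10(d_pc) - 5
```

For scale, 6.7 kpc corresponds to a distance modulus of about 14.13 mag. Toward the bulge, the extinction term is large and uncertain, which is one reason a per-star error budget matters when quoting ± 0.4 kpc.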

arXiv:2311.06453 [pdf, other]

DocGen: Generating Detailed Parameter Docstrings in Python

Authors: Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Melina Vidoni

Abstract: Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most developers would prefer a detailed account of each parameter in a function rather than a high-level summary. However, generating such a summary is too intricate for a single generative model to produce reliably due to the lack of high-quality training data. Thus, we propose a multi-step approach that combines multiple task-specific models, each adept at producing a specific section of a docstring. The combination of these models ensures the inclusion of each section in the final docstring. We compared the results of our approach with existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers, which show that our approach outperforms existing methods.

Submitted 17 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.
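The DocGen abstract above describes combining task-specific models, one per docstring section, so that every section appears in the output. A minimal sketch of that assembly step, assuming hypothetical `summarize` and `describe` callables as stand-ins for the trained models (the paper's actual models, prompts, and section set are not reproduced here):

```python
import ast
from typing import Callable, List

# Hypothetical stand-ins for task-specific models: each returns one section's text.
SummaryModel = Callable[[str], str]
ParamModel = Callable[[str, str], str]


def parameter_names(source: str) -> List[str]:
    """Return the positional parameter names of the first function in `source`."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    return [arg.arg for arg in func.args.args if arg.arg != "self"]


def build_docstring(source: str, summarize: SummaryModel, describe: ParamModel) -> str:
    """Call one model per section; empty outputs become placeholders so no section is lost."""
    lines = [summarize(source).strip() or "TODO: summary", "", "Args:"]
    for name in parameter_names(source):
        text = describe(source, name).strip() or "TODO: describe"
        lines.append(f"    {name}: {text}")
    return "\n".join(lines)


src = "def scale(values, factor=2.0):\n    return [v * factor for v in values]\n"
doc = build_docstring(
    src,
    summarize=lambda s: "Multiply every value by a constant factor.",
    describe=lambda s, p: "" if p == "factor" else "Input numbers.",
)
```

Here the parameter list comes from the parsed code rather than from a model, so each parameter gets its own entry even when a generator returns nothing.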

arXiv:2301.12662 [pdf, other]

SingSong: Generating musical accompaniments from singing

Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus of music audio to produce aligned pairs of vocals and instrumental sources. Then, we adapt AudioLM (Borsos et al., 2022), a state-of-the-art approach for unconditional audio generation, to be suitable for conditional "audio-to-audio" generation tasks, and train it on the source-separated (vocal, instrumental) pairs. In a pairwise comparison with the same vocal inputs, listeners expressed a significant preference for instrumentals generated by SingSong compared to those from a strong retrieval baseline. Sound examples are available at https://g.co/magenta/singsong

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2209.14458 [pdf, other]

The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling

Authors: Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel

Abstract: Data is the lifeblood of modern machine learning systems, including for those in Music Information Retrieval (MIR). However, MIR has long been mired by small datasets and unreliable labels. In this work, we propose to break this bottleneck using generative modeling. By pipelining a generative model of notes (Coconet trained on Bach Chorales) with a structured synthesis model of chamber ensembles (MIDI-DDSP trained on URMP), we demonstrate a system capable of producing unlimited amounts of realistic chorale music with rich annotations including mixes, stems, MIDI, note-level performance attributes (staccato, vibrato, etc.), and even fine-grained synthesis parameters (pitch, amplitude, etc.). We call this system the Chamber Ensemble Generator (CEG), and use it to generate a large dataset of chorales from four different chamber ensembles (CocoChorales). We demonstrate that data generated using our approach improves state-of-the-art models for music transcription and source separation, and we release both the system and the dataset as an open-source foundation for future work in the MIR community.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2206.13380 [pdf]

Properties of the Nili Fossae Olivine-clay-carbonate lithology: orbital and in situ at Séítah

Authors: Adrian J. Brown, Linda Kah, Lucia Mandon, Roger Wiens, Patrick Pinet, Elise Clavé, Stéphane Le Mouélic, Arya Udry, Patrick J. Gasda, Clément Royer, Keyron Hickman-Lewis, Agnes Cousin, Justin I. Simon, Jade Comellas, Edward Cloutis, Thierry Fouchet, Alberto G. Fairén, Stephanie Connell, David Flannery, Briony Horgan, Lisa Mayhew, Allan Treiman, Jorge I. Núñez, Brittan Wogsland, Karim Benzerara, et al. (9 additional authors not shown)

Abstract: We examine the observed properties of the Nili Fossae olivine-clay-carbonate lithology from orbital data and in situ by the Mars 2020 rover at the Séítah unit in Jezero crater, including: (1) composition (Liu, 2022); (2) grain size (Tice, 2022); and (3) inferred viscosity, calculated from geochemistry collected by SuperCam (Wiens, 2022). Based on the low viscosity and distribution of the unit, we postulate a flood lava origin for the olivine-clay-carbonate at Séítah. We include a new CRISM map of the clay 2.38 μm band and use in situ data from Mars 2020 SuperCam LIBS and VISIR and MastCam-Z observations to show that the clay in the olivine cumulate in the Séítah formation is consistent with talc or serpentine. We discuss two intertwining aspects of the history of the lithology: (1) the emplacement and properties of the cumulate layer within a lava lake, based on terrestrial analogs in the Pilbara, Western Australia, and using previously published models of flood lavas and lava lakes; and (2) the limited extent of post-emplacement alteration, including clay and carbonate alteration (Clavé, 2022; Mandon, 2022).

Submitted 27 June, 2022; originally announced June 2022.

Comments: 34 pages, 15 figures

arXiv:2206.05408 [pdf, other]

Multi-instrument Music Synthesis with Spectrogram Diffusion

Authors: Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

Abstract: An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in real time for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments and raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in real time. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fréchet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.

Submitted 12 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.
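The spectrogram-diffusion abstract above trains the decoder as a DDPM. A minimal sketch of the generic DDPM forward (noising) process on a toy spectrogram frame, assuming the linear beta schedule from 1e-4 to 0.02 over 1000 steps popularised by Ho et al. (2020); the paper's own schedule, step count, and network are not given in the abstract and are not reproduced here:

```python
import math
import random


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).

    Returns (x_t, eps); a denoiser is typically trained to predict eps from (x_t, t).
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
frame = [0.5, -1.0, 0.25]  # toy "spectrogram frame"
xt, eps = q_sample(frame, t=999, abar=abar, rng=rng)
```

Because alpha_bar_t shrinks toward zero, x_t at the last step is almost pure Gaussian noise; generation runs the learned reverse process from there, conditioned (in the paper's setting) on the encoded MIDI.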

arXiv:2202.07765 [pdf, other]

General-purpose, long-context autoregressive modeling with Perceiver AR

Authors: Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

Abstract: Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.

Submitted 14 June, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: ICML 2022
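The Perceiver AR abstract above maps a long input to a few latents with cross-attention while keeping causal masking. A simplified sketch of that masking, under our reading of the architecture: latent i sits at input position p = N - M + i and may attend only to positions up to p. For brevity, queries, keys, and values here are the raw input vectors, with no learned projections, multiple heads, or latent self-attention stack.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_cross_attention(inputs, num_latents):
    """Cross-attend from the last `num_latents` positions to the whole input.

    Latent i sits at position p = N - num_latents + i and attends only to input
    positions j <= p, which keeps the model autoregressive.
    """
    n, d = len(inputs), len(inputs[0])
    outputs = []
    for i in range(num_latents):
        p = n - num_latents + i
        query = inputs[p]
        scores = [dot(query, inputs[j]) / math.sqrt(d) for j in range(p + 1)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * inputs[j][k] for j, w in enumerate(weights)) / total for k in range(d)]
        )
    return outputs


seq = [[math.sin(t), math.cos(t)] for t in range(10)]
latents = causal_cross_attention(seq, num_latents=3)
```

The cost of this step grows with N times the number of latents rather than N squared, which is what lets the long context be read once before the (much shorter) latent stack does the deep processing.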

arXiv:2111.03017 [pdf, other]

MT3: Multi-Task Multitrack Music Transcription

Authors: Josh Gardner, Ian Simon, Ethan Manilow, Curtis Hawthorne, Jesse Engel

Abstract: Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a challenging task at the core of music understanding. Unlike Automatic Speech Recognition (ASR), which typically focuses on the words of a single speaker, AMT often requires transcribing multiple instruments simultaneously, all while preserving fine-scale pitch and timing information. Further, many AMT datasets are "low-resource", as even expert musicians find music transcription difficult and time-consuming. Thus, prior work has focused on task-specific architectures, tailored to the individual instruments of each task. In this work, motivated by the promising results of sequence-to-sequence transfer learning for low-resource Natural Language Processing (NLP), we demonstrate that a general-purpose Transformer model can perform multi-task AMT, jointly transcribing arbitrary combinations of musical instruments across several transcription datasets. We show this unified training framework achieves high-quality transcription results across a range of datasets, dramatically improving performance for low-resource instruments (such as guitar), while preserving strong performance for abundant instruments (such as piano). Finally, by expanding the scope of AMT, we expose the need for more consistent evaluation metrics and better dataset alignment, and provide a strong baseline for this new direction of multi-task AMT.

Submitted 15 March, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: ICLR 2022 camera-ready version

arXiv:2107.09142 [pdf, other]

Sequence-to-Sequence Piano Transcription with Transformers

Authors: Curtis Hawthorne, Ian Simon, Rigel Swavely, Ethan Manilow, Jesse Engel

Abstract: Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets. However, these models have required extensive domain-specific design of network architectures, input/output representations, and complex decoding schemes. In this work, we show that equivalent performance can be achieved using a generic encoder-decoder Transformer with standard decoding methods. We demonstrate that the model can learn to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks. This sequence-to-sequence approach simplifies transcription by jointly modeling audio features and language-like output dependencies, thus removing the need for task-specific architectures. These results point toward possibilities for creating new Music Information Retrieval models by focusing on dataset creation and labeling rather than custom model design.

Submitted 19 July, 2021; originally announced July 2021.
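The sequence-to-sequence transcription abstract above has the model emit "MIDI-like output events". A minimal sketch of one such event encoding and its inverse, assuming 10 ms time quantisation and absolute ("time", k) tokens emitted whenever time advances; this token scheme is illustrative only, and the paper's actual vocabulary (for example, velocity handling) is not reproduced here.

```python
def notes_to_events(notes, step=0.01):
    """Convert (onset_s, offset_s, pitch) notes into a flat list of event tokens.

    At equal times, note-offs (0) sort before note-ons (1).
    """
    boundaries = []
    for onset, offset, pitch in notes:
        boundaries.append((round(onset / step), 1, pitch))   # 1 = note-on
        boundaries.append((round(offset / step), 0, pitch))  # 0 = note-off
    boundaries.sort()
    events, current = [], 0
    for k, is_on, pitch in boundaries:
        if k > current:
            events.append(("time", k))
            current = k
        events.append(("on" if is_on else "off", pitch))
    return events


def events_to_notes(events, step=0.01):
    """Decode event tokens back into sorted (onset_s, offset_s, pitch) notes."""
    current, active, notes = 0, {}, []
    for kind, value in events:
        if kind == "time":
            current = value
        elif kind == "on":
            active[value] = current
        else:
            notes.append((active.pop(value) * step, current * step, value))
    return sorted(notes)


events = notes_to_events([(0.0, 0.5, 60), (0.25, 0.5, 64)])
```

Framing the output as a token sequence is what lets a generic Transformer decoder with standard decoding replace a custom transcription head; the decoder simply predicts the next event token.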

arXiv:2103.16091 [pdf, other]

Symbolic Music Generation with Diffusion Models

Authors: Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon

Abstract: Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. Our method is non-autoregressive: it learns to generate sequences of latent embeddings through the reverse process and offers parallel generation with a constant number of iterative refinement steps. We apply this technique to modeling symbolic music and show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.

Submitted 25 November, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: ISMIR 2021

arXiv:2008.01100 [pdf, ps, other]

A Search for Light Hydrides in the Envelopes of Evolved Stars

Authors: Mark A. Siebert, Ignacio Simon, Christopher N. Shingledecker, P. Brandon Carroll, Andrew M. Burkhardt, Shawn Thomas Booth, Anthony J. Remijan, Rebeca Aladro, Carlos A. Duran, Brett A. McGuire

Abstract: We report a search for the diatomic hydrides SiH, PH, and FeH along the line of sight toward the chemically rich circumstellar envelopes of IRC+10216 and VY Canis Majoris. These molecules are thought to form in high-temperature regions near the photospheres of these stars, and may then further react via gas-phase and dust-grain interactions leading to more complex species, but have yet to be constrained by observation. We used the GREAT spectrometer on SOFIA to search for rotational emission lines of these molecules in four spectral windows ranging from 600 GHz to 1500 GHz. Though none of the targeted species were detected in our search, we report their upper-limit abundances in each source and discuss how they influence the current understanding of hydride chemistry in dense circumstellar media. We attribute the non-detections of these hydrides to their compact source sizes, high barriers of formation, and proclivity to react with other molecules in the winds.

Submitted 18 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: Accepted for publication in ApJ. 14 pages, 4 figures, 3 tables

arXiv:1912.05537 [pdf, other]

Encoding Musical Style with Transformer Autoencoders

Authors: Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

Abstract: We consider the problem of learning high-level controls over the global structure of generated sequences, particularly in the context of symbolic music generation with complex language models. In this work, we present the Transformer autoencoder, which aggregates encodings of the input data across time to obtain a global representation of style from a given performance. We show it is possible to combine this global representation with other temporally distributed embeddings, enabling improved control over the separate aspects of performance style and melody. Empirically, we demonstrate the effectiveness of our method on various music generation tasks on the MAESTRO dataset and a YouTube dataset with 10,000+ hours of piano performances, where we achieve improvements in terms of log-likelihood and mean listening scores as compared to baselines.

Submitted 30 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.
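The Transformer autoencoder abstract above aggregates per-timestep encodings into one global style vector and combines it with temporally distributed embeddings. A minimal sketch, assuming mean pooling as the aggregation and showing summation and concatenation as two plausible ways to combine; the paper's exact aggregation and combination choices are not stated in the abstract, so treat both as assumptions.

```python
def mean_pool(encodings):
    """Average per-timestep encodings (T x D) into one global style vector (D)."""
    steps = len(encodings)
    return [sum(step[k] for step in encodings) / steps for k in range(len(encodings[0]))]


def condition(melody, style, mode="sum"):
    """Combine a global style vector with per-timestep melody embeddings.

    "sum" requires matching widths; "concat" widens every timestep instead.
    """
    if mode == "sum":
        return [[m + s for m, s in zip(step, style)] for step in melody]
    if mode == "concat":
        return [step + style for step in melody]
    raise ValueError(f"unknown mode: {mode!r}")


performance_encodings = [[1.0, 2.0], [3.0, 4.0]]
style = mean_pool(performance_encodings)
conditioned = condition([[0.0, 0.0], [1.0, 1.0]], style)
```

Pooling over time discards ordering, so the style vector can only carry global properties of the performance, while the melody embeddings keep the step-by-step content; that separation is what gives the two independent control handles.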

arXiv:1906.02667 [pdf, other]

Application of Machine Learning to accidents detection at directional drilling

Authors: Ekaterina Gurina, Nikita Klyuchnikov, Alexey Zaytsev, Evgenya Romanenkova, Ksenia Antipova, Igor Simon, Victor Makarov, Dmitry Koroteev

Abstract: We present a data-driven algorithm and mathematical model for anomaly alarming during directional drilling. The algorithm is based on machine learning: it compares real-time drilling telemetry with telemetry corresponding to past accidents and analyses the level of similarity. The model performs a time-series comparison using aggregated statistics and Gradient Boosting classification. It is trained on historical data containing the drilling telemetry of 80 wells drilled within 19 oilfields. The model can detect an anomaly and identify its type by comparing the real-time measurements while drilling with those from the database of past accidents. Validation tests show that our algorithm identifies half of the anomalies with about 0.53 false alarms per day on average. The model's performance offers time and cost savings, as it enables partial prevention of failures and accidents during well construction.

Submitted 12 December, 2019; v1 submitted 6 June, 2019; originally announced June 2019.
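The drilling-accident abstract above compares real-time telemetry with past-accident telemetry using aggregated statistics and a Gradient Boosting classifier. A minimal sketch of how such a pair of windows could be turned into one fixed-length feature vector; the statistics chosen, the absolute-difference pairing, and the channel names are hypothetical, and the classifier itself is not shown.

```python
import statistics


def window_stats(values):
    """Aggregate one telemetry channel over a window: mean, std, min, max."""
    return [statistics.fmean(values), statistics.pstdev(values), min(values), max(values)]


def pair_features(current, reference):
    """Absolute differences of aggregated statistics, channel by channel.

    `current` and `reference` map channel names to lists of readings over a
    window; the output is a fixed-length vector a classifier could score as
    "similar to this accident" or not.
    """
    features = []
    for channel in sorted(current):
        a = window_stats(current[channel])
        b = window_stats(reference[channel])
        features.extend(abs(x - y) for x, y in zip(a, b))
    return features


window = {"hookload": [1.0, 2.0, 3.0], "torque": [5.0, 5.0, 5.0]}  # hypothetical channels
shifted = {name: [v + 1.0 for v in vals] for name, vals in window.items()}
```

Aggregating first makes windows of different lengths comparable and keeps the feature count fixed at four per channel, which suits tree-based classifiers such as gradient boosting.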

arXiv:1903.11436 [pdf, other]

doi 10.1109/LGRS.2019.2959845

Real-time data-driven detection of the rock type alteration during a directional drilling

Authors: Evgenya Romanenkova, Alexey Zaytsev, Nikita Klyuchnikov, Arseniy Gruzdev, Ksenia Antipova, Leyla Ismailova, Evgeny Burnaev, Artyom Semenikhin, Vitaliy Koryabkin, Igor Simon, Dmitry Koroteev

Abstract: During directional drilling, a bit may sometimes enter a nonproductive rock layer because of the gap of about 20 m between the bit and the high-fidelity rock-type sensors. The only way to detect lithotype changes in time is to use Measurements While Drilling (MWD) data. However, there are no general mathematical modeling approaches that both reconstruct the rock type well from MWD data and account for the specifics of the oil and gas industry. In this article, we present a data-driven procedure that uses MWD data for quick detection of changes in rock type. We propose an approach that combines traditional machine learning, based on solving the rock-type classification problem, with change detection procedures rarely used before in the oil and gas industry. The data come from a newly developed oilfield in the north of western Siberia. The results suggest that we can detect a significant fraction of rock-type changes, reducing the change detection delay from 20 to 1.8 meters and the number of false-positive alarms from 43 to 6 per well.

Submitted 12 December, 2019; v1 submitted 27 March, 2019; originally announced March 2019.
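The rock-type abstract above pairs a classifier with change detection procedures but does not name them. As an illustration of the general idea only, here is a one-sided CUSUM detector, a classic change-detection method (not necessarily one used in the paper), applied to a hypothetical stream of per-sample classifier scores:

```python
def cusum_alarm(scores, baseline, drift, threshold):
    """One-sided CUSUM: return the first index where the statistic exceeds threshold.

    `scores` could be a classifier's per-sample probability of a new rock type;
    `baseline` is its expected level before a change, and `drift` is the slack
    that absorbs small fluctuations. Returns None if no alarm is raised.
    """
    stat = 0.0
    for i, x in enumerate(scores):
        stat = max(0.0, stat + (x - baseline - drift))
        if stat > threshold:
            return i
    return None


scores = [0.1] * 10 + [0.9] * 10  # hypothetical stream: the change happens at index 10
alarm = cusum_alarm(scores, baseline=0.1, drift=0.05, threshold=1.0)
```

The detection delay is the alarm index minus the true change index (here one sample). Raising `threshold` or `drift` trades longer delays for fewer false alarms, the same trade-off the abstract reports in meters and alarms per well.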

arXiv:1810.12247 [pdf, other]

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Authors: Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

Abstract: Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.

Submitted 17 January, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Examples available at https://goo.gl/magenta/maestro-examples

arXiv:1810.05246 [pdf, other]

doi 10.1145/3301275.3302288

Piano Genie

Authors: Chris Donahue, Ian Simon, Sander Dieleman

Abstract: We present Piano Genie, an intelligent controller which allows non-musicians to improvise on the piano. With Piano Genie, a user performs on a simple interface with eight buttons, and their performance is decoded into the space of plausible piano music in real time. To learn a suitable mapping procedure for this problem, we train recurrent neural network autoencoders with discrete bottlenecks: an encoder learns an appropriate sequence of buttons corresponding to a piano piece, and a decoder learns to map this sequence back to the original piece. During performance, we substitute a user's input for the encoder output, and play the decoder's prediction each time the user presses a button. To improve the intuitiveness of Piano Genie's performance behavior, we impose musically meaningful constraints over the encoder's outputs.

Submitted 22 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: Published as a conference paper at ACM IUI 2019
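The Piano Genie abstract above uses a discrete bottleneck with eight buttons, and at performance time substitutes the user's button press for the encoder output. The abstract does not say how the bottleneck is implemented; the sketch below assumes a simple uniform scalar quantiser, where `to_button` discretises a continuous encoder output and `from_button` produces the value the decoder would receive.

```python
def to_button(z, num_buttons=8):
    """Quantise a continuous encoder output in [-1, 1] to one of `num_buttons` levels."""
    z = max(-1.0, min(1.0, z))  # clip out-of-range encoder outputs
    return round((z + 1.0) / 2.0 * (num_buttons - 1))


def from_button(button, num_buttons=8):
    """Map a button index back to the level value fed to the decoder."""
    return 2.0 * button / (num_buttons - 1) - 1.0
```

During training the decoder sees `from_button(to_button(z))`; during performance the user's pressed button replaces `to_button(z)`, so the decoder cannot tell the two apart. An ordered quantiser like this also makes neighbouring buttons correspond to neighbouring codes, which is one way to make the mapping feel intuitive.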

arXiv:1809.04281 [pdf, other]

Music Transformer

Authors: Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

Abstract: Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to the reuse of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions, since the memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces the intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and in a seq2seq setup generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter.

Submitted 12 December, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

Comments: Improved skewing section and accompanying figures. Previous titles are "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" and "Music Transformer"

arXiv:1808.03715

This Time with Feeling: Learning Expressive Musical Performance

Authors: Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan

Abstract: Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct $\it performance$ generation: jointly predicting the notes $\it and$ $\it also$ their expressive timing and dynamics. We consider the significance and qualities of the data set needed for this. Having identified both a problem domain and characteristics of an appropriate data set, we show an LSTM-based recurrent network model that subjectively performs quite well on this task. Critically, we provide generated examples. We also include feedback from professional composers and musicians about some of these examples.

Submitted 10 August, 2018; originally announced August 2018.

Comments: Includes links to urls for audio samples

arXiv:1806.03218

Data-driven model for the identification of the rock type at a drilling bit

Authors: Nikita Klyuchnikov, Alexey Zaytsev, Arseniy Gruzdev, Georgiy Ovchinnikov, Ksenia Antipova, Leyla Ismailova, Ekaterina Muravleva, Evgeny Burnaev, Artyom Semenikhin, Alexey Cherepanov, Vitaliy Koryabkin, Igor Simon, Alexey Tsurgan, Fedor Krasnov, Dmitry Koroteev

Abstract: Directional oil well drilling requires high precision of the wellbore positioning inside the productive area. However, due to specifics of engineering design, sensors that explicitly determine the type of the drilled rock are located farther than 15 m from the drilling bit. As a result, departures from the target area can be detected only after this distance, which, in turn, leads to a loss in well productivity and the risk of an expensive re-boring operation. We present a novel approach for identifying rock type at the drilling bit based on machine learning classification methods and data mining of sensor readings. We compare various machine-learning algorithms, examine extra features coming from mathematical modeling of drilling mechanics, and show that the real-time rock type classification error can be reduced from 13.5% to 9%. The approach is applicable for precise directional drilling in relatively thin target intervals of complex shapes and generalizes to new wells that differ from those used to train the machine learning model.

Submitted 25 March, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

arXiv:1806.00195

Learning a Latent Space of Multitrack Measures

Authors: Ian Simon, Adam Roberts, Colin Raffel, Jesse Engel, Curtis Hawthorne, Douglas Eck

Abstract: Discovering and exploring the underlying structure of multi-instrumental music using learning-based approaches remains an open problem. We extend the recent MusicVAE model to represent multitrack polyphonic measures as vectors in a latent space. Our approach enables several useful operations such as generating plausible measures from scratch, interpolating between measures in a musically meaningful way, and manipulating specific musical attributes. We also introduce chord conditioning, which allows all of these operations to be performed while keeping harmony fixed, and allows chords to be changed while maintaining musical "style". By generating a sequence of measures over a predefined chord progression, our model can produce music with convincing long-term structure. We demonstrate that our latent space model makes it possible to intuitively control and generate musical sequences with rich instrumentation (see https://goo.gl/s2N7dV for generated audio).
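Interpolating between two measures amounts to walking between their latent vectors and decoding each intermediate point. The sketch below assumes spherical linear interpolation (slerp), a common choice for Gaussian latent spaces; the abstract does not say which scheme the model uses, and decoding each point back into a multitrack measure requires the trained model, which is not shown. `slerp` and `interpolate` are hypothetical helper names.

```python
import math


def slerp(z0, z1, t):
    """Spherical linear interpolation between latent vectors z0 and z1, t in [0, 1]."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    # Angle between the two vectors; clamp to guard against rounding outside [-1, 1].
    omega = math.acos(max(-1.0, min(1.0, dot / (n0 * n1))))
    s = math.sin(omega)
    if abs(s) < 1e-12:
        # (Anti)parallel vectors: the great circle is undefined, fall back to a straight line.
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate(z0, z1, steps):
    """Return `steps` evenly spaced latent points from z0 to z1 (endpoints included)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [slerp(z0, z1, i / (steps - 1)) for i in range(steps)]
```

Slerp is often preferred over straight-line interpolation in high-dimensional Gaussian latent spaces because the intermediate points keep a norm typical of samples from the prior, whereas the midpoint of a straight line tends to fall in a low-norm region the decoder rarely saw during training.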

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1712.10326

Strong convergence of two-dimensional Vilenkin-Fourier series

Authors: N. Memić, I. Simon, G. Tephnadze

Abstract: We prove that certain means of the quadratical partial sums of the two-dimensional Vilenkin-Fourier series are uniformly bounded operators from the Hardy space $H_{p}$ to the space $L_{p}$ for $0<p\leq 1.$ We also prove that the sequence in the denominator cannot be improved.

Submitted 15 December, 2017; originally announced December 2017.

MSC Class: 42C10

arXiv:1710.11153

Onsets and Frames: Dual-Objective Piano Transcription

Authors: Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

Abstract: We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note to start unless the onset detector also agrees that an onset for that pitch is present in the frame. We focus on improving onsets and offsets together instead of either in isolation as we believe this correlates better with human musical perception. Our approach results in over a 100% relative improvement in note F1 score (with offsets) on the MAPS dataset. Furthermore, we extend the model to predict relative velocities of normalized audio which results in more natural-sounding transcriptions.
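The inference rule described above (a note may only start where the onset detector agrees) can be sketched for a single pitch as follows. This is a minimal illustration, not the authors' decoder: it assumes per-frame probabilities from the frame and onset detectors, a hypothetical fixed threshold of 0.5, and it ignores velocity and re-articulated notes (a second onset while a note is still sounding). `decode_pitch` is a hypothetical name.

```python
def decode_pitch(frame_probs, onset_probs, threshold=0.5):
    """Decode (start, end) note spans for one pitch; end is exclusive.

    A note may only *start* on a frame where both the frame detector and the
    onset detector reach `threshold`; it then continues while the frame
    detector stays at or above `threshold`. `threshold=0.5` is an assumed value.
    """
    if len(frame_probs) != len(onset_probs):
        raise ValueError("frame and onset sequences must have the same length")
    notes = []
    start = None  # frame index where the currently sounding note began, if any
    for t, (f, o) in enumerate(zip(frame_probs, onset_probs)):
        frame_on = f >= threshold
        if start is None:
            # Gate: ignore frame activations unless the onset detector agrees.
            if frame_on and o >= threshold:
                start = t
        elif not frame_on:
            # Frame detector released the pitch: close the note.
            notes.append((start, t))
            start = None
    if start is not None:
        # Note still sounding at the end of the input.
        notes.append((start, len(frame_probs)))
    return notes
```

For example, `decode_pitch([0.9, 0.9, 0.9, 0.1, 0.8, 0.8, 0.2], [0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1])` returns `[(1, 3)]`: the note starts at frame 1, where the onset detector fires, rather than at frame 0, and the later frame activation at frames 4 and 5 is discarded because no onset accompanies it.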

Submitted 5 June, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: Examples available at https://goo.gl/magenta/onsets-frames-examples

arXiv:1606.09092

Density of the span of powers of a function à la Müntz-Szasz

Authors: Philippe Jaming, Ilona Simon

Abstract: The aim of this paper is to establish density properties in $L^p$ spaces of the span of powers of functions $\{\psi^{\lambda} : \lambda\in\Lambda\}$, $\Lambda\subset\mathbb{N}$, in the spirit of the Müntz-Szász Theorem. As density is almost never achieved, we further investigate the density of powers and a modulation of powers $\{\psi^{\lambda}, \psi^{\lambda}e^{i\alpha t} : \lambda\in\Lambda\}$. Finally, we establish a Müntz-Szász Theorem for density of translates of powers of cosines $\{\cos^{\lambda}(t-\theta_1), \cos^{\lambda}(t-\theta_2) : \lambda\in\Lambda\}$. Under some arithmetic restrictions on $\theta_1-\theta_2$, we show that density is equivalent to a Müntz-Szász condition on $\Lambda$ and we conjecture that those arithmetic restrictions are not needed. Some links are also established with the recently introduced concept of Heisenberg Uniqueness Pairs.
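For reference, the classical Müntz-Szász theorem that this abstract builds on can be stated for positive integer exponents as follows. This is the textbook statement in $C[0,1]$ with the sup norm, not the paper's $L^p$ or cosine-power variants, whose exact conditions are given in the paper itself:

```latex
\[
\overline{\operatorname{span}}\,\bigl\{1,\ t^{\lambda} : \lambda\in\Lambda\bigr\} = C[0,1]
\quad\Longleftrightarrow\quad
\sum_{\lambda\in\Lambda}\frac{1}{\lambda} = \infty,
\qquad \Lambda\subset\{1,2,3,\dots\}.
\]
```

For example, taking $\Lambda$ to be the primes gives density, since $\sum_p 1/p$ diverges, whereas $\Lambda = \{k^2 : k\ge 1\}$ does not, since $\sum_{k\ge 1} 1/k^2 = \pi^2/6$.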

Submitted 29 June, 2016; originally announced June 2016.

arXiv:1312.3919

doi 10.2478/s11534-014-0497-0

Self-regulating genes. Exact steady state solution by using Poisson Representation

Authors: Istvan P. Sugar, Istvan Simon

Abstract: Systems biology studies the structure and behavior of complex gene regulatory networks. One of its aims is to develop a quantitative understanding of the modular components that constitute such networks. The self-regulating gene is a type of autoregulatory genetic module that appears in over 40% of known transcription factors in E. coli. In this work, using the technique of Poisson Representation, we provide exact steady-state solutions for this feedback model. Using the methods of synthetic biology (P. E. M. Purnick and R. Weiss, Nature Reviews Molecular Cell Biology, 2009, 10: 410-422), such systems can be built from modules like this one.

Submitted 18 December, 2013; v1 submitted 13 December, 2013; originally announced December 2013.

Comments: 10 pages, 2 figures, 1 table, 1 supplemental material (9 pages); additional reference to the work of Grima et al.

arXiv:q-bio/0609026

doi 10.1529/biophysj.106.094995

Mechanisms of B cell Synapse Formation Predicted by Stochastic Simulation

Authors: Philippos K. Tsourkas, Nicole Baumgarth, Scott I. Simon, Subhadip Raychaudhuri

Abstract: The clustering of B cell receptor (BCR) molecules and the formation of the protein segregation structure known as the immunological synapse appear to precede antigen (Ag) uptake by B cells. The mature B cell synapse is characterized by a central cluster of BCR/Ag molecular complexes surrounded by a ring of LFA-1/ICAM-1 complexes. Recent experimental evidence shows that receptor clustering in B cells can occur via mechanical or signaling-driven processes. An alternative mechanism of diffusion and affinity-dependent binding has been proposed to explain synapse formation in the absence of signaling-driven processes. In this work, we investigated the biophysical mechanisms that drive immunological synapse formation in B cells across the physiological range of BCR affinity ($K_A \approx 10^{6}$ to $10^{10}\ \mathrm{M}^{-1}$) through computational modeling. Our computational approach is based on stochastic simulation of diffusion and reaction events with a clearly defined mapping between probabilistic parameters of our model and their physical equivalents. We show that a diffusion-and-binding mechanism is sufficient to drive synapse formation only at low BCR affinity and for a relatively stiff B cell membrane that undergoes little deformation. We thus predict the need for alternative mechanisms: a difference in the mechanical properties of BCR/Ag and LFA-1/ICAM-1 bonds and/or signaling-driven processes.

Submitted 19 October, 2006; v1 submitted 17 September, 2006; originally announced September 2006.

Comments: 35 pages, 11 figures; Supplemental Materials added

Showing 1–26 of 26 results for author: Simon, I