arXiv:2409.16078 [pdf, other]

Assessing strategies to manage distributed photovoltaics in Swiss low-voltage networks: An analysis of curtailment, export tariffs, and resource sharing

Authors: Alejandro Pena-Bello, Gerard Marias Gonzalez, Nicolas Wyrsch, Christophe Ballif

Abstract: The integration of photovoltaic systems poses several challenges for the distribution grid, mainly due to the infrastructure not being designed to handle the upstream flow and being dimensioned for consumption only, potentially leading to reliability and stability issues. This study investigates the use of capacity-based tariffs, export tariffs, and curtailment policies to reduce negative grid impacts without hampering PV deployment. We analyze the effect of such export tariffs on three typical Swiss low-voltage networks (rural, semi-urban, and urban), using power flow analysis to evaluate the power exchanges at the transformer station, as well as line overloading and voltage violations. Finally, a simple case of mutualization of resources is analyzed to assess its potential contribution to relieving network constraints and the economic costs of managing LV networks. We found that the tariff with capacity-based components on the export (CT export daily) severely penalizes PV penetration. This applies to other tariffs as well (e.g. IRR monthly, Curtailment 30, and DT variable) but to a lesser extent. However, the inclusion of curtailment at 50% and 70%, as well as mixed tariffs with capacity-based components at import and curtailment, allow for a high degree of PV installations in the three zones studied and help to mitigate the impact of PV on the distributed network.

Submitted 24 September, 2024; originally announced September 2024.

Comments: Preprint version. 25 pages, 6 figures

arXiv:2401.16247 [pdf, other]

Towards Red Teaming in Multimodal and Multilingual Translation

Authors: Christophe Ropers, David Dale, Prangthip Hansanti, Gabriel Mejia Gonzalez, Ivan Evtimov, Corinne Wong, Christophe Touret, Kristina Pereyra, Seohyun Sonia Kim, Cristian Canton Ferrer, Pierre Andrews, Marta R. Costa-jussà

Abstract: Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is gaining increasing interest as a means to assess the performance and reliability of models. One such method is the red teaming approach, which aims to generate edge cases where a model will produce critical errors. While this methodology is becoming standard practice for generative AI, its application to the realm of conditional AI remains largely unexplored. This paper presents the first study on human-based red teaming for Machine Translation (MT), marking a significant step towards understanding and improving the performance of translation models. We delve into both human-based red teaming and a study on automation, reporting lessons learned and providing recommendations for both translation models and red teaming drills. This pioneering work opens up new avenues for research and development in the field of MT.

Submitted 29 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2312.05187

ACM Class: I.2.7

arXiv:2312.05187 [pdf, other]

Seamless: Multilingual Expressive and Streaming Speech Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2308.11596 [pdf, other]

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

ACM Class: I.2.7

arXiv:2208.08517 [pdf, other]

doi 10.3847/1538-3881/acacfc

The DESI Survey Validation: Results from Visual Inspection of the Quasar Survey Spectra

Authors: David M. Alexander, Tamara M. Davis, E. Chaussidon, V. A. Fawcett, Alma X. Gonzalez-Morales, Ting-Wen Lan, Christophe Yeche, S. Ahlen, J. N. Aguilar, E. Armengaud, S. Bailey, D. Brooks, Z. Cai, R. Canning, A. Carr, S. Chabanier, Marie-Claude Cousinou, K. Dawson, A. de la Macorra, A. Dey, Biprateep Dey, G. Dhungana, A. C. Edge, S. Eftekharzadeh, K. Fanning, et al. (47 additional authors not shown)

Abstract: A key component of the Dark Energy Spectroscopic Instrument (DESI) survey validation (SV) is a detailed visual inspection (VI) of the optical spectroscopic data to quantify key survey metrics. In this paper we present results from VI of the quasar survey using deep coadded SV spectra. We show that the majority (~70%) of the main-survey targets are spectroscopically confirmed as quasars, with ~16% galaxies, ~6% stars, and ~8% low-quality spectra lacking reliable features. A non-negligible fraction of the quasars are misidentified by the standard spectroscopic pipeline but we show that the majority can be recovered using post-pipeline "afterburner" quasar-identification approaches. We combine these "afterburners" with our standard pipeline to create a modified pipeline to improve the overall quasar yield. At the depth of the main DESI survey both pipelines achieve a good-redshift purity (reliable redshifts measured within 3000 km/s) of ~99%; however, the modified pipeline recovers ~94% of the visually inspected quasars, as compared to ~86% from the standard pipeline. We demonstrate that both pipelines achieve a median redshift precision and accuracy of ~100 km/s and ~70 km/s, respectively. We constructed composite spectra to investigate why some quasars are missed by the standard spectroscopic pipeline and find that they are more host-galaxy dominated (i.e., distant analogs of "Seyfert galaxies") and/or dust reddened than the standard-pipeline quasars. We also show example spectra to demonstrate the overall diversity of the DESI quasar sample and provide strong-lensing candidates where two targets contribute to a single spectrum.

Submitted 28 November, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: Astronomical journal (in press). 26 pages, 15 figures, 9 tables, one of a suite of 8 papers detailing targeting for DESI. Figure data available from Zenodo (see paper for details)

arXiv:2207.04672 [pdf]

No Language Left Behind: Scaling Human-Centered Machine Translation

Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, et al. (14 additional authors not shown)

Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb.

Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 190 pages

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2101.08748 [pdf, other]

doi 10.3847/1538-4357/abfc47

Probing the Sea of Cosmic Rays by Measuring Gamma-Ray Emission from Passive Giant Molecular Clouds with HAWC

Authors: A. Albert, R. Alfaro, C. Alvarez, J. R. Angeles Camacho, J. C. Arteaga-Velázquez, K. P. Arunbabu, D. Avila Rojas, H. A. Ayala Solares, V. Baghmanyan, E. Belmont-Moreno, S. Y. BenZvi, C. Brisbois, K. S. Caballero-Mora, T. Capistrán, A. Carramiñana, S. Casanova, U. Cotti, J. Cotzomi, S. Coutiño de León, E. De la Fuente, R. Diaz Hernandez, B. L. Dingus, M. A. DuVernois, M. Durocher, J. C. Díaz-Vélez, et al. (65 additional authors not shown)

Abstract: The study of high-energy gamma rays from passive Giant Molecular Clouds (GMCs) in our Galaxy is an indirect way to characterize and probe the paradigm of the "sea" of cosmic rays in distant parts of the Galaxy. By using data from the High Altitude Water Cherenkov (HAWC) observatory, we measure the gamma-ray flux above 1 TeV of a set of these clouds to test the paradigm. We selected high-galactic latitude clouds that are in HAWC's field-of-view and which are within 1 kpc distance from the Sun. We find no significant excess emission in the cloud regions, nor when we perform a stacked log-likelihood analysis of GMCs. Using a Bayesian approach, we calculate 95% credible intervals upper limits of the gamma-ray flux and estimate limits on the cosmic-ray energy density of these regions. These are the first limits to constrain gamma-ray emission in the multi-TeV energy range (>1 TeV) using passive high-galactic latitude GMCs. Assuming that the main gamma-ray production mechanism is due to proton-proton interaction, the upper limits are consistent with a cosmic-ray flux and energy density similar to that measured at Earth.

Submitted 27 April, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

Comments: 6 figures, 6 tables

arXiv:1804.09538 [pdf, ps, other]

doi 10.1007/s00339-018-2198-9

Influence of post-deposition annealing on the chemical states of crystalline tantalum pentoxide films

Authors: Israel Perez, Victor Sosa, Fidel Gamboa, Jose Trinidad Elizalde Galindo, Jose L. Enriquez-Carrejo, Rurik Farias, Pierre Giovanni Mani Gonzalez

Abstract: We investigate the effect of post-deposition annealing (for temperatures from 848 K to 1273 K) on the chemical properties of crystalline Ta$_2$O$_5$ films grown on Si(100) substrates by radio frequency magnetron sputtering. The atomic arrangement, as determined by X-ray diffraction, is predominately hexagonal ($δ$-Ta$_2$O$_5$) for the films exposed to heat treatments at 948 K and 1048 K; orthorhombic ($β$-Ta$_2$O$_5$) for samples annealed at 1148 K and 1273 K; and amorphous for samples annealed at temperatures below 948 K. X-ray photoelectron spectroscopy measurements of the Ta $4f$ and O $1s$ core levels were performed to evaluate the chemical properties of all films as a function of annealing temperature. Upon analysis, we observe the Ta $4f$ spectrum characteristic of Ta in Ta$^{5+}$ and the formation of Ta-oxide phases with oxidation states Ta$^{1+}$, Ta$^{2+}$, Ta$^{3+}$, and Ta$^{4+}$. The study reveals that increasing the annealing temperature increases the percentage of the Ta$^{5+}$ state and reduces that of the others, indicating that higher temperatures are more desirable to produce Ta$_2$O$_5$; however, there seems to be an optimal annealing temperature that maximizes the O% to Ta% ratio. We found that at 1273 K the ratio slightly reduces, suggesting oxygen depletion.

Submitted 22 November, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

Comments: 7 pages, 1 table, 6 figures. arXiv admin note: text overlap with arXiv:1804.02067, arXiv:1704.05514

Journal ref: Appl. Phys. A, 124, 792, 2018

arXiv:1804.02067 [pdf, other]

doi 10.1016/j.vacuum.2019.04.037

Effect of Ion Bombardment on the Chemical Properties of Crystalline Tantalum Pentoxide Films

Authors: Israel Perez, Victor Sosa, Fidel Gamboa Perera, Jose Trinidad Elizalde Galindo, Jose Luis Enriquez-Carrejo, Pierre Giovanni Mani Gonzalez

Abstract: The effect of argon ion bombardment on the chemical properties of crystalline Ta$_2$O$_5$ films grown on Si(100) substrates by radio frequency magnetron sputtering was investigated by X-ray photoelectron spectroscopy. All samples were irradiated for several time intervals [(0.5, 3, 6, 9) min] and the Ta $4f$ and O $1s$ core levels were measured each time. Upon analysis at the surface of the films, we observe the Ta $4f$ spectrum characteristic of Ta$_2$O$_5$. Irradiated samples exhibit the formation of Ta suboxides with oxidation states Ta$^{1+}$, Ta$^{2+}$, Ta$^{3+}$, Ta$^{4+}$, and Ta$^{5+}$. Exposing the films, after ion bombardment, to ambient for some days stimulates the amorphous phase of Ta$_2$O$_5$ at the surface suggesting that the suboxides of Ta are unstable. Using a sputtering simulation we discuss that these suboxides are largely generated during ion bombardment that greatly reduces the oxygen to tantalum ratio as the irradiation time increases. The computer simulation indicates that this is due to the high sputtering yield of oxygen.

Submitted 3 May, 2019; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 12 pages, 8 figures

Journal ref: Vacuum, 165, 274-282, (2019)

Showing 1–9 of 9 results for author: Gonzalez, G M