-
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy
Authors:
Rui Pan,
Tuan Dung Nguyen,
Hardik Arora,
Alberto Accomazzi,
Tirthankar Ghosal,
Yuan-Sen Ting
Abstract:
Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLM models. Leveraging a recent initiative to curate high-quality astronomical MCQs, this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model. We demonstrate that this performance degradation can be partially mitigated by utilizing high-quality data for continual pretraining, such as summarized text from arXiv. Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised fine-tuning dataset still constrains the performance of instruct models. In conjunction with this study, we introduce a new set of models, AstroLLaMA-3-8B and AstroLLaMA-2-70B, building upon the previous AstroLLaMA series.
Submitted 29 September, 2024;
originally announced September 2024.
-
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
Authors:
Kartheik G. Iyer,
Mikaeel Yunus,
Charles O'Neill,
Christine Ye,
Alina Hyk,
Kiera McCormick,
Ioana Ciuca,
John F. Wu,
Alberto Accomazzi,
Simone Astarita,
Rishabh Chakrabarty,
Jesse Cranney,
Anjalie Field,
Tirthankar Ghosal,
Michele Ginolfi,
Marc Huertas-Company,
Maja Jablonska,
Sandor Kruk,
Huiling Liu,
Gabriel Marchidan,
Rohit Mistry,
J. P. Naiman,
J. E. G. Peek,
Mugdha Polimera,
Sergio J. Rodriguez, et al. (5 additional authors not shown)
Abstract:
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 350,000 peer-reviewed papers from the Astrophysics Data System (ADS), Pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool's versatility through case studies, showcasing its application in various research scenarios. The system's performance is evaluated using custom benchmarks, including single-paper and multi-paper tasks. Beyond literature review, Pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g. in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying AI to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
Submitted 2 August, 2024;
originally announced August 2024.
-
AstroMLab 1: Who Wins Astronomy Jeopardy!?
Authors:
Yuan-Sen Ting,
Tuan Dung Nguyen,
Tirthankar Ghosal,
Rui Pan,
Hardik Arora,
Zechang Sun,
Tijmen de Haan,
Nesar Ramachandra,
Azton Wells,
Sandeep Madireddy,
Alberto Accomazzi
Abstract:
We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performance across various astronomical subfields and assesses response calibration, crucial for potential deployment in research environments. Claude-3.5-Sonnet outperforms competitors by up to 4.6 percentage points, achieving 85.0% accuracy. For proprietary models, we observed a universal reduction in cost every 3-to-12 months to achieve a similar score in this particular astronomy benchmark. Open-source models have rapidly improved, with LLaMA-3-70b (80.6%) and Qwen-2-72b (77.7%) now competing with some of the best proprietary models. We identify performance variations across topics, with non-English-focused models generally struggling more in exoplanet-related fields, stellar astrophysics, and instrumentation-related questions. These challenges likely stem from less abundant training data, limited historical context, and rapid recent developments in these areas. This pattern is observed across both open-weights and proprietary models, with regional dependencies evident, highlighting the impact of training data diversity on model performance in specialized scientific domains. Top-performing models demonstrate well-calibrated confidence, with correlations above 0.9 between confidence and correctness, though they tend to be slightly underconfident. The development of fast, low-cost inference for open-weights models presents new opportunities for affordable deployment in astronomy. The rapid progress observed suggests that LLM-driven research in astronomy may become feasible in the near future.
Submitted 15 July, 2024;
originally announced July 2024.
-
INDUS: Effective and Efficient Language Models for Scientific Applications
Authors:
Bishwaranjan Bhattacharjee,
Aashka Trivedi,
Masayasu Muraoka,
Muthukumaran Ramasubramanian,
Takuma Udagawa,
Iksha Gurung,
Rong Zhang,
Bharath Dandala,
Rahul Ramachandran,
Manil Maskey,
Kaylin Bugbee,
Mike Little,
Elizabeth Fancher,
Lauren Sanders,
Sylvain Costes,
Sergi Blanco-Cuaresma,
Kelly Lockhart,
Thomas Allen,
Felix Grezes,
Megan Ansdell,
Alberto Accomazzi,
Yousef El-Kurdi,
Davis Wertheimer,
Birgit Pfitzmann,
Cesar Berrospi Ramis, et al. (9 additional authors not shown)
Abstract:
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
Submitted 20 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure
Authors:
Alberto Accomazzi
Abstract:
The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. This paper describes the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it.
We give an overview of the ADS's original architecture, constructed primarily around simple database models. This bespoke system allowed for the efficient indexing of metadata and citations, the digitization and archival of full-text articles, and the rapid development of discipline-specific capabilities running on commodity hardware. The move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities, a modern API, higher uptime, more reliable data retrieval, and integration of advanced visualizations and analytics.
Another crucial evolution came with the gradual and ongoing incorporation of Machine Learning and Natural Language Processing algorithms in our data pipelines. Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications, and recommendations. We describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped.
Finally, we conclude by describing the future prospects of ADS and its ongoing expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant, technology is transformative, but their trustworthiness can be elusive.
Submitted 17 January, 2024;
originally announced January 2024.
-
Improving the visibility and citability of exoplanet research software
Authors:
Alice Allen,
Alberto Accomazzi,
Joe P. Renaud
Abstract:
The Astrophysics Source Code Library (ASCL) is a free online registry for source codes of interest to astronomers, astrophysicists, and planetary scientists. It lists, and in some cases houses, software that has been used in research appearing in or submitted to peer-reviewed publications. As of December 2023, it has over 3300 software entries and is indexed by NASA's Astrophysics Data System (ADS) and Clarivate's Web of Science.
In 2020, NASA created the Exoplanet Modeling and Analysis Center (EMAC). Housed at the Goddard Space Flight Center, EMAC serves, in part, as a catalog and repository for exoplanet research resources. EMAC has 240 entries (as of December 2023), 78% of which are for downloadable software.
This oral presentation covered the collaborative work the ASCL, EMAC, and ADS are doing to increase the discoverability and citability of EMAC's software entries and to strengthen the ASCL's ability to serve the planetary science community. It also introduced two new projects, Virtual Astronomy Software Talks (VAST) and Exoplanet Virtual Astronomy Software Talks (exoVAST), that provide additional opportunities for discoverability of EMAC software resources.
Submitted 28 December, 2023;
originally announced December 2023.
-
Experimenting with Large Language Models and vector embeddings in NASA SciX
Authors:
Sergi Blanco-Cuaresma,
Ioana Ciucă,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Kelly E. Lockhart,
Felix Grezes,
Thomas Allen,
Golnaz Shapurian,
Carolyn S. Grant,
Donna M. Thompson,
Timothy W. Hostetler,
Matthew R. Templeton,
Shinyi Chen,
Jennifer Koch,
Taylor Jacovich,
Daniel Chivvis,
Fernanda de Macedo Alves,
Jean-Claude Paquin,
Jennifer Bartlett,
Mugdha Polimera,
Stephanie Jarmak
Abstract:
Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverage this technology while respecting the high level of trust and quality that the project holds.
Submitted 21 December, 2023;
originally announced December 2023.
-
Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach
Authors:
Golnaz Shapurian,
Michael J Kurtz,
Alberto Accomazzi
Abstract:
The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsas, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with places or people's names that they are named after, for example, Syria, Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some feature names have been used in many contexts, for instance, Apollo, which can refer to mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to the crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Some feature names in other contexts serve as directions, like craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies, requiring disambiguation, such as the Adams crater, which exists on both the Moon and Mars. We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.
Submitted 17 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
The Future of Astronomical Data Infrastructure: Meeting Report
Authors:
Michael R. Blanton,
Janet D. Evans,
Dara Norman,
William O'Mullane,
Adrian Price-Whelan,
Luca Rizzi,
Alberto Accomazzi,
Megan Ansdell,
Stephen Bailey,
Paul Barrett,
Steven Berukoff,
Adam Bolton,
Julian Borrill,
Kelle Cruz,
Julianne Dalcanton,
Vandana Desai,
Gregory P. Dubois-Felsmann,
Frossie Economou,
Henry Ferguson,
Bryan Field,
Dan Foreman-Mackey,
Jaime Forero-Romero,
Niall Gaffney,
Kim Gillies,
Matthew J. Graham, et al. (47 additional authors not shown)
Abstract:
The astronomical community is grappling with the increasing volume and complexity of data produced by modern telescopes, due to difficulties in reducing, accessing, analyzing, and combining archives of data. To address this challenge, we propose the establishment of a coordinating body, an "entity," with the specific mission of enhancing the interoperability, archiving, distribution, and production of both astronomical data and software. This report is the culmination of a workshop held in February 2023 on the Future of Astronomical Data Infrastructure. Attended by 70 scientists and software professionals from ground-based and space-based missions and archives spanning the entire spectrum of astronomical research, the group deliberated on the prevailing state of software and data infrastructure in astronomy, identified pressing issues, and explored potential solutions. In this report, we describe the ecosystem of astronomical data, its existing flaws, and the many gaps, duplication, inconsistencies, barriers to access, drags on productivity, missed opportunities, and risks to the long-term integrity of essential data sets. We also highlight the successes and failures in a set of deep dives into several different illustrative components of the ecosystem, included as an appendix.
Submitted 7 November, 2023;
originally announced November 2023.
-
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Authors:
Tuan Dung Nguyen,
Yuan-Sen Ting,
Ioana Ciucă,
Charlie O'Neill,
Ze-Chang Sun,
Maja Jabłońska,
Sandor Kruk,
Ernest Perkowski,
Jack Miller,
Jason Li,
Josh Peek,
Kartheik Iyer,
Tomasz Różański,
Pranav Khetarpal,
Sharaf Zaman,
David Brodrick,
Sergio J. Rodríguez Méndez,
Thang Bui,
Alyssa Goodman,
Alberto Accomazzi,
Jill Naiman,
Jesse Cranney,
Kevin Schawinski,
UniverseTBD
Abstract:
Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves a 30% lower perplexity than Llama-2, showing marked domain adaptation. Our model generates more insightful and scientifically relevant text completions and embedding extraction than state-of-the-art foundation models despite having significantly fewer parameters. AstroLLaMA serves as a robust, domain-specific model with broad fine-tuning potential. Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.
Submitted 12 September, 2023;
originally announced September 2023.
-
Improving astroBERT using Semantic Textual Similarity
Authors:
Felix Grezes,
Thomas Allen,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Shinyi Chen,
Jennifer Koch,
Taylor Jacovich,
Pavlos Protopapas
Abstract:
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we:
- announce the first public release of the astroBERT language model;
- show how astroBERT improves over existing public language models on astrophysics specific tasks;
- and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
Submitted 29 November, 2022;
originally announced December 2022.
-
Web accessibility trends and implementation in dynamic web applications
Authors:
Timothy W. Hostetler,
Shinyi Chen,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Carolyn S. Grant,
Edwin Henneken,
Donna M. Thompson,
Roman Chyla,
Golnaz Shapurian,
Matthew R. Templeton,
Kelly E. Lockhart,
Nemanja Martinovic,
Stephen McDonald,
Felix Grezes
Abstract:
The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including those with disabilities that would benefit from alternative ways to present the information provided by the website. NASA ADS follows the official Web Content Accessibility Guidelines (WCAG) standard for ensuring accessibility of all its applications, striving to exceed this standard where possible. Through the use of both internal audits and external expert review based on these guidelines, we have identified many areas for improving accessibility in our current web application, and have implemented a number of updates to the UI as a result of this. We present an overview of some current web accessibility trends, discuss our experience incorporating these trends in our web application, and discuss the lessons learned and recommendations for future projects.
Submitted 1 February, 2022;
originally announced February 2022.
-
Building astroBERT, a language model for Astronomy & Astrophysics
Authors:
Felix Grezes,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Nemanja Martinovic,
Shinyi Chen,
Chris Tanner,
Pavlos Protopapas
Abstract:
The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.
Submitted 1 December, 2021;
originally announced December 2021.
-
Best Practices for Data Publication in the Astronomical Literature
Authors:
Tracy X. Chen,
Marion Schmitz,
Joseph M. Mazzarella,
Xiuqin Wu,
Julian C. van Eyken,
Alberto Accomazzi,
Rachel L. Akeson,
Mark Allen,
Rachael Beaton,
G. Bruce Berriman,
Andrew W. Boyle,
Marianne Brouty,
Ben Chan,
Jessie L. Christiansen,
David R. Ciardi,
David Cook,
Raffaele D'Abrusco,
Rick Ebert,
Cren Frayer,
Benjamin J. Fulton,
Christopher Gelino,
George Helou,
Calen B. Henderson,
Justin Howell,
Joyce Kim, et al. (20 additional authors not shown)
Abstract:
We present an overview of best practices for publishing data in astronomy and astrophysics journals. These recommendations are intended as a reference for authors to help prepare and publish data in a way that will better represent and support science results, enable better data sharing, improve reproducibility, and enhance the reusability of data. Observance of these guidelines will also help to streamline the extraction, preservation, integration and cross-linking of valuable data from astrophysics literature into major astronomical databases, and consequently facilitate new modes of science discovery that will better exploit the vast quantities of panchromatic and multi-dimensional data associated with the literature. We encourage authors, journal editors, referees, and publishers to implement the best practices reviewed here, as well as related recommendations from international astronomical organizations such as the International Astronomical Union (IAU) for publication of nomenclature, data, and metadata. A convenient Checklist of Recommendations for Publishing Data in the Literature is included for authors to consult before the submission of the final version of their journal articles and associated data files. We recommend that publishers of journals in astronomy and astrophysics incorporate a link to this document in their Instructions to Authors.
Submitted 16 April, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Enabling Synergy: Improving the Information Infrastructure for Planetary Science
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin A. Henneken
Abstract:
In this whitepaper we advocate that the Planetary Science (PS) community build a discipline-specific digital library, in collaboration with the existing astronomy digital library, ADS. We suggest that the PS data archives increase their level of curation to allow for direct linking between the archival data and the derived journal articles. And we suggest that a new component of the PS information infrastructure be created to collate and curate information on features and objects in our solar system, beginning with the USGS/IAU Gazetteer of Planetary Nomenclature.
Submitted 29 September, 2020;
originally announced September 2020.
-
Agile methodologies in teams with highly creative and autonomous members
Authors:
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Kris Bukovi
Abstract:
The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally proclaimed that whenever a Scrum project does not succeed, the reason is that Scrum was not implemented correctly and not that Scrum may have its own flaws. This grants irrefutability to the methodology, discouraging deviations to fit the actual needs and peculiarities of the developers. In particular, the members of the NASA ADS team are highly creative and autonomous individuals whose motivation can be affected if their freedom is too strongly constrained. We present our experience following Agile principles, reusing certain Scrum elements and seeking the satisfaction of the team members, while rapidly reacting and keeping the project in line with our stakeholders' expectations.
Submitted 10 September, 2020;
originally announced September 2020.
-
Enabling Effective Exoplanet / Planetary Collaborative Science
Authors:
Mark S. Marley,
Chester Harman,
Heidi B. Hammel,
Paul Byrne,
Jonathan Fortney,
Alberto Accomazzi,
Sarah E. Moran,
Michael Way,
Jessie Christiansen,
Noam Izenberg,
Timothy Holt,
Sanaz Vahidinia,
Erika Kohler,
Karalee Brugman
Abstract:
The field of exoplanetary science has emerged over the past two decades, rising up alongside traditional solar system planetary science. Both fields focus on understanding the processes which form and sculpt planets through time, yet there has been less scientific exchange between the two communities than is ideal. This white paper explores some of the institutional and cultural barriers which impede cross-discipline collaborations and suggests solutions that would foster greater collaboration. Some solutions require structural or policy changes within NASA itself, while others are directed towards other institutions, including academic publishers, that can also facilitate greater interdisciplinarity.
Submitted 20 July, 2020;
originally announced July 2020.
-
Practice meets Principle: Tracking Software and Data Citations to Zenodo DOIs
Authors:
Stephanie van de Sandt,
Lars Holm Nielsen,
Alexandros Ioannidis,
August Muench,
Edwin Henneken,
Alberto Accomazzi,
Chiara Bigarella,
Jose Benito Gonzalez Lopez,
Sünje Dallmeier-Tiessen
Abstract:
Data and software citations are crucial for the transparency of research results and for the transmission of credit. But they are hard to track, because of the absence of a common citation standard. As a consequence, FORCE11 recently proposed data and software citation principles as guidance for authors. Zenodo is recognized for the implementation of DOIs for software on a large scale. The minting of complementary DOIs for the version and concept allows measuring the impact of dynamic software. This article investigates characteristics of 5,456 citations to Zenodo data and software that were captured by the Asclepias Broker in January 2019. We analyzed the current state of data and software citation practices and the quality of software citation recommendations with regard to the impact of recent standardization efforts. Our findings prove that current citation practices and recommendations do not match proposed citation standards. We consequently suggest practical first steps towards the implementation of the software citation principles.
Submitted 1 November, 2019;
originally announced November 2019.
-
Increasing the Discovery Space in Astrophysics - A Collation of Six Submitted White Papers
Authors:
G. Fabbiano,
M. Elvis,
A. Accomazzi,
G. B. Berriman,
N. Brickhouse,
S. Bose,
D. Carrera,
I. Chilingarian,
F. Civano,
B. Czerny,
R. D'Abrusco,
B. Diemer,
J. Drake,
R. Emami Meibody,
J. R. Farah,
G. G. Fazio,
E. Feigelson,
F. Fornasini,
Jay Gallagher,
J. Grindlay,
L. Hernquist,
D. J. James,
M. Karovska,
V. Kashyap,
D. -W. Kim
, et al. (24 additional authors not shown)
Abstract:
We write in response to the call from the 2020 Decadal Survey to submit white papers illustrating the most pressing scientific questions in astrophysics for the coming decade. We propose exploration as the central question for the Decadal Committee's discussions. The history of astronomy shows that paradigm-changing discoveries are not driven by well-formulated scientific questions, based on the knowledge of the time. They were instead the result of the increase in discovery space fostered by new telescopes and instruments. An additional tool for increasing the discovery space is provided by the analysis and mining of the increasingly large amount of archival data available to astronomers. Revolutionary observing facilities, and the state-of-the-art astronomy archives needed to support these facilities, will open up the universe to new discovery. Here we focus on exploration for compact objects and multi-messenger science. This white paper includes science examples of the power of the discovery approach, encompassing all the areas of astrophysics covered by the 2020 Decadal Survey.
Submitted 18 March, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
From Dark Energy to Exolife: Improving the Digital Information Infrastructure for Astrophysics
Authors:
Michael J. Kurtz,
Alberto Accomazzi
Abstract:
Some of the most exciting and promising areas of Astronomy research today are found at the boundaries of the discipline: the search for Exoplanets and Multi-Messenger Astronomy. In order to achieve breakthroughs in these research fields over the next decade, innovation and expansion of the digital information infrastructure which supports this research is required. Astronomy has been well-served by the existence of an open, distributed network of data centers and archives. However, institutional barriers and differing research cultures have prevented cross-disciplinary collaborations, creating fragmented knowledge and stove-piped research activities. This must change in order for the broader community of scientists to work together and solve our most ambitious decadal challenges. Interdisciplinary inquiry is best supported by bringing researchers together at the information discovery level. In order to cross the traditional disciplinary silos we must allow scientists both to explore new ideas and to gain access to new data and knowledge. This is best enabled by providing discovery platforms which allow them to explore and connect different research threads in the literature, identify communities of experts, access and analyze the related published datasets, measurements and catalogs.
Submitted 1 March, 2019;
originally announced March 2019.
-
Fundamentals of effective cloud management for the new NASA Astrophysics Data System
Authors:
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Kris Bukovi,
Nathan Rapport
Abstract:
The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient system if some fundamental questions are addressed. After having experimented with different AWS environments and deployment methods, we decided in December 2017 to go with Kubernetes for our container orchestration. Defining the best strategy to properly set up Kubernetes has proven to be challenging: automatic scaling services and load balancing traffic can lead to errors whose origin is difficult to identify, monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed, and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.
Submitted 16 January, 2019;
originally announced January 2019.
-
Merging the Astrophysics and Planetary Science Information Systems
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin A. Henneken
Abstract:
Conceptually, exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well-integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the Astrophysics Data System (ADS). The information system for Planetary Science does not have a central component linking the literature with the observational and theoretical data. Here we propose that the Committee on an Exoplanet Science Strategy recommend that this linkage be built, with the ADS playing the role in Planetary Science which it already plays in Astrophysics. This will require additional resources for the ADS, and the Planetary Data System (PDS), as well as other international collaborators.
Submitted 9 March, 2018;
originally announced March 2018.
-
The Unified Astronomy Thesaurus: Semantic Metadata for Astronomy and Astrophysics
Authors:
Katie Frey,
Alberto Accomazzi
Abstract:
Several different controlled vocabularies have been developed and used by the astronomical community, each designed to serve a specific need and a specific group. The Unified Astronomy Thesaurus (UAT) attempts to provide a highly structured controlled vocabulary that will be relevant and useful across the entire discipline, regardless of content or platform. As two major use cases for the UAT include classifying articles and data, we examine the UAT in comparison with the Astronomical Subject Keywords used by major publications and the JWST Science Keywords used by STScI's Astronomer's Proposal Tool.
Submitted 3 January, 2018;
originally announced January 2018.
-
Multilingual Topic Models
Authors:
Kriste Krstovski,
Michael J. Kurtz,
David A. Smith,
Alberto Accomazzi
Abstract:
Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides means to discover where different concept vocabularies require improvement.
Submitted 18 December, 2017;
originally announced December 2017.
-
New ADS Functionality for the Curator
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Steven McDonald,
Taylor J. Shaulis,
Sergi Blanco-Cuaresma,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton
Abstract:
In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community to support data and software citations, we discuss what steps the ADS is taking to provide the needed infrastructure in collaboration with publishers and data providers. A new API provides access to the ADS search interface, metrics, and libraries allowing users to programmatically automate discovery and curation tasks. The new ADS interface supports a greater integration of content and services with a variety of partners, including ORCID claiming, indexing of SIMBAD objects, and article graphics from a variety of publishers. Finally, we highlight how librarians can facilitate the ingest of gray literature that they curate into our system.
Submitted 23 October, 2017;
originally announced October 2017.
-
NASA's Long-Term Astrophysics Data Archives
Authors:
L. M. Rebull,
V. Desai,
H. Teplitz,
S. Groom,
R. Akeson,
G. B. Berriman,
G. Helou,
D. Imel,
J. M. Mazzarella,
A. Accomazzi,
T. McGlynn,
A. Smale,
R. White
Abstract:
NASA regards data handling and archiving as an integral part of space missions, and has a strong track record of serving astrophysics data to the public, beginning with the IRAS satellite in 1983. Archives enable a major science return on the significant investment required to develop a space mission. In fact, the presence and accessibility of an archive can more than double the number of papers resulting from the data. In order for the community to be able to use the data, they have to be able to find the data (ease of access) and interpret the data (ease of use). Funding of archival research (e.g., the ADAP program) is also important not only for making scientific progress, but also for encouraging authors to deliver data products back to the archives to be used in future studies. NASA has also enabled a robust system that can be maintained over the long term, through technical innovation and careful attention to resource allocation. This article provides a brief overview of some of NASA's major astrophysics archive systems, including IRSA, MAST, HEASARC, KOA, NED, the Exoplanet Archive, and ADS.
Submitted 27 September, 2017;
originally announced September 2017.
-
Aggregation and Linking of Observational Metadata in the ADS
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Alexandra Holachek,
Jonathan Elliott
Abstract:
We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them.
Submitted 28 January, 2016;
originally announced January 2016.
-
ADS 2.0: new architecture, API and services
Authors:
Roman Chyla,
Alberto Accomazzi,
Alexandra Holachek,
Carolyn S. Grant,
Jonathan Elliott,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray,
Vladimir Sudilovsky
Abstract:
The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.
Submitted 19 March, 2015;
originally announced March 2015.
-
ADS: The Next Generation Search Platform
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Roman Chyla,
James Luker,
Carolyn S. Grant,
Donna M. Thompson,
Alexandra Holachek,
Rahul Dave,
Stephen S. Murray
Abstract:
Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. In 2011, the ADS began to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, and funding sources; to the extent that these metadata are available to us, they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/
Submitted 13 March, 2015;
originally announced March 2015.
-
Computing and Using Metrics in the ADS
Authors:
Edwin A. Henneken,
Alberto Accomazzi,
Michael J. Kurtz,
Carolyn S. Grant,
Donna Thompson,
Jay Luker,
Roman Chyla,
Alexandra Holachek,
Stephen S. Murray
Abstract:
Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.
Submitted 17 June, 2014;
originally announced June 2014.
-
The Unified Astronomy Thesaurus
Authors:
Alberto Accomazzi,
Norman Gray,
Chris Erdmann,
Chris Biemesderfer,
Katie Frey,
Justin Soles
Abstract:
The Unified Astronomy Thesaurus (UAT) is an open, interoperable and community-supported thesaurus which unifies the existing divergent and isolated Astronomy & Astrophysics vocabularies into a single high-quality, freely-available open thesaurus formalizing astronomical concepts and their inter-relationships. The UAT builds upon the existing IAU Thesaurus with major contributions from the astronomy portions of the thesauri developed by the Institute of Physics Publishing, the American Institute of Physics, and SPIE. We describe the effort behind the creation of the UAT and the process through which we plan to keep the document up to date through broad community participation.
Submitted 26 March, 2014;
originally announced March 2014.
-
Astronomy and Computing: a New Journal for the Astronomical Computing Community
Authors:
Alberto Accomazzi,
Tamás Budavári,
Christopher Fluke,
Norman Gray,
Robert G Mann,
William O'Mullane,
Andreas Wicenec,
Michael Wise
Abstract:
We introduce Astronomy and Computing, a new journal for the growing population of people working in the domain where astronomy overlaps with computer science and information technology. The journal aims to provide a new communication channel within that community, which is not well served by current journals, and to help secure recognition of its true importance within modern astronomy. In this inaugural editorial, we describe the rationale for creating the journal, outline its scope and ambitions, and seek input from the community in defining in detail how the journal should work towards its high-level goals.
Submitted 30 October, 2012;
originally announced October 2012.
-
Telescope Bibliographies: an Essential Component of Archival Data Management and Operations
Authors:
Alberto Accomazzi,
Edwin Henneken,
Christopher Erdmann,
Arnold Rots
Abstract:
Assessing the impact of astronomical facilities rests upon an evaluation of the scientific discoveries which their data have enabled. Telescope bibliographies, which link data products with the literature, provide a way to use bibliometrics as an impact measure for the underlying data. In this paper we argue that the creation and maintenance of telescope bibliographies should be considered an integral part of an observatory's operations. We review the existing tools, services, and workflows which support these curation activities, giving an estimate of the effort and expertise required to maintain an archive-based telescope bibliography.
Submitted 30 July, 2012; v1 submitted 27 June, 2012;
originally announced June 2012.
-
Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products?
Authors:
Alberto Accomazzi,
Sebastien Derriere,
Chris Biemesderfer,
Norman Gray
Abstract:
Astronomy has long had a working network of archives supporting the curation of publications and data. The discipline has already created many of the features which perplex other areas of science: (1) data repositories: (supra)national institutes, dedicated to large projects; a culture of user-contributed data; practical experience of long-term data preservation; (2) dataset identifiers: the community has already piloted experiments, knows what can undermine these efforts, and is participating in the development of next-generation standards; (3) citation of datasets in papers: the community has an innovative and expanding infrastructure for the curation of data and bibliographic resources, and through them a community of authors and editors familiar with such electronic publication efforts; as well, it has experimented with next-generation web standards (e.g. the Semantic Web); (4) publisher buy-in: publishers in this area have been willing to innovate within the constraints of their commercial imperatives. What can possibly be missing? Why don't we have an integrated framework for the publication and preservation of all data products already? Are there technical barriers? We don't believe so. Are there cultural or commercial forces inhibiting this? We aren't aware of any. This Birds of a Feather session (BoF) attempted to identify existing barriers to the creation of such a framework, and attempted to identify the parties or groups which can contribute to the creation of a VO-powered data-publishing framework.
Submitted 7 December, 2011;
originally announced December 2011.
-
Linking to Data - Effect on Citation Rates in Astronomy
Authors:
Edwin A. Henneken,
Alberto Accomazzi
Abstract:
Is there a difference in citation rates between articles that were published with links to data and articles that were not? Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of furthering science. Data sharing not only helps the process of verification of claims, but also the discovery of new findings in archival data. However, linking to data is still a far cry from being a "practice", especially when it comes to authors providing these links during the writing and submission process. You need to have both a willingness and a publication mechanism in order to create such a practice. Showing that articles with links to data get higher citation rates might increase the willingness of scientists to take the extra steps of linking data sources to their publications. In this presentation we will show this is indeed the case: articles with links to data result in higher citation rates than articles without such links. The ADS is funded by NASA Grant NNX09AB39G.
Submitted 15 November, 2011;
originally announced November 2011.
-
The ADS in the Information Age - Impact on Discovery
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi
Abstract:
The SAO/NASA Astrophysics Data System (ADS) grew up with and has been riding the waves of the Information Age, closely monitoring and anticipating the needs of its end-users. By now, all professional astronomers are using the ADS on a daily basis, and a substantial fraction have been using it for their entire professional career. In addition to being an indispensable tool for professional scientists, the ADS also moved into the public domain, as a tool for science education. In this paper we will highlight and discuss some aspects indicative of the impact the ADS has had on research and the access to scholarly publications.
The ADS is funded by NASA Grant NNX09AB39G.
Submitted 28 June, 2011;
originally announced June 2011.
-
Semantic Interlinking of Resources in the Virtual Observatory Era
Authors:
Alberto Accomazzi,
Rahul Dave
Abstract:
In the coming era of data-intensive science, it will be increasingly important to be able to seamlessly move between scientific results, the data analyzed in them, and the processes used to produce them. As observations, derived data products, publications, and object metadata are curated by different projects and archived in different locations, establishing the proper linkages between these resources and describing their relationships becomes an essential activity in their curation and preservation. In this paper we describe initial efforts to create a semantic knowledge base allowing easier integration and linking of the body of heterogeneous astronomical resources which we call the Virtual Observatory (VO). The ultimate goal of this effort is the creation of a semantic layer over existing resources, allowing applications to cross boundaries between archives. The proposed approach follows the current best practices in Semantic Computing and the architecture of the web, allowing the use of off-the-shelf technologies and providing a path for VO resources to become part of the global web of linked data.
Submitted 30 March, 2011;
originally announced March 2011.
-
Linking Literature and Data: Status Report and Future Efforts
Authors:
Alberto Accomazzi
Abstract:
In the current era of data-intensive science, it is increasingly important for researchers to be able to have access to published results, the supporting data, and the processes used to produce them. Six years ago, recognizing this need, the American Astronomical Society and the Astrophysics Data Centers Executive Committee (ADEC) sponsored an effort to facilitate the annotation and linking of datasets during the publishing process, with limited success. I will review the status of this effort and describe a new, more general one now being considered in the context of the Virtual Astronomical Observatory.
Submitted 22 March, 2011;
originally announced March 2011.
-
Astronomy 3.0 Style
Authors:
Alberto Accomazzi
Abstract:
Over the next decade we will witness the development of a new infrastructure in support of data-intensive scientific research, which includes Astronomy. This new networked environment will offer both challenges and opportunities to our community and has the potential to transform the way data are described, curated and preserved. Based on the lessons learned during the development and management of the ADS, a case is made for adopting the emerging technologies and practices of the Semantic Web to support the way Astronomy research will be conducted. Examples of how small, incremental steps can, in the aggregate, make a significant difference in the provision and repurposing of astronomical data are provided.
Submitted 3 June, 2010;
originally announced June 2010.
-
Finding Your Literature Match -- A Recommender System
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi,
Carolyn Grant,
Donna Thompson,
Elizabeth Bohlen,
Giovanni Di Milia,
Jay Luker,
Stephen S. Murray
Abstract:
The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still result in overwhelming results. An overload of information has the intrinsic danger of scaring visitors away, and any organization, for-profit or not-for-profit, in the business of providing scholarly information wants to capture and keep the attention of its target audience. Publishers and search engine engineers alike will benefit from a service that is able to provide visitors with recommendations that closely meet their interests. Providing visitors with special deals, new options and highlights may be interesting to a certain degree, but what makes more sense (especially from a commercial point of view) than to let visitors do most of the work by the mere action of making choices? Hiring psychics is not an option, so a technological solution is needed to recommend items that a visitor is likely to be looking for. In this presentation we will introduce such a solution and argue that it is practically feasible to incorporate this approach into a useful addition to any information retrieval system with enough usage.
Submitted 13 May, 2010;
originally announced May 2010.
-
Towards a Resource-Centric Data Network for Astronomy
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
Over the past decade, astronomers have been using an increasingly large number of web-based applications and archives to conduct their research. However, despite the early success in creating links across projects and data centers, the promise of a single integrated digital library environment supporting e-science in astronomy has proven elusive. While some of the issues hampering progress in this area are of a technical nature, others are rooted in existing policies which should be re-analyzed if further rapid progress is to be made in this area. This paper describes a proposal that the NASA Astrophysics Data System project has put forth in order to improve its role as one of the primary discovery portals for astronomers, focusing on those aspects which could benefit from an increased level of involvement from the community, namely the effort to expose astronomy resources as linked data, and the harvesting of observational metadata.
Submitted 11 May, 2010;
originally announced May 2010.
-
Using Multipartite Graphs for Recommendation and Discovery
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin Henneken,
Giovanni Di Milia,
Carolyn S. Grant
Abstract:
The Smithsonian/NASA Astrophysics Data System exists at the nexus of a dense system of interacting and interlinked information networks. The syntactic and the semantic content of this multipartite graph structure can be combined to provide very specific research recommendations to the scientist/user.
Submitted 30 December, 2009;
originally announced December 2009.
-
The Bibliometric Properties of Article Readership Information
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray,
Nathalie Martimbeau,
Barbara Elwell
Abstract:
The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses ("reads") of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads, and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences between the citation rate, the readership rate and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites, and develop two new bibliometric measures which have substantially less age bias than citations.
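As a reading aid, the functional form described in the abstract above can be sketched as follows. The symbols are assumptions introduced here, not the paper's notation: $R(t)$ is the readership obsolescence function at article age $t$, $A_k$ and $\tau_k$ are the amplitude and time constant of the $k$-th component, and $a$, $b$ label whichever two components the citation function $C(t)$ tracks (the abstract does not say which two).

```latex
% Readership obsolescence as a sum of four exponentials
% (symbols R, A_k, \tau_k, t are illustrative, not taken from the paper)
R(t) = \sum_{k=1}^{4} A_k \, e^{-t/\tau_k}

% Citation function proportional to the sum of two of those components;
% the indices a and b are left unspecified, as in the abstract
C(t) \propto A_a \, e^{-t/\tau_a} + A_b \, e^{-t/\tau_b},
\qquad a, b \in \{1, 2, 3, 4\}, \; a \neq b
```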
Submitted 25 September, 2009;
originally announced September 2009.
-
Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries.
Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP.
We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France.
Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval
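The proportionality model stated in the abstract above can be written out in a short sketch. The symbols are assumptions introduced here for illustration: $N_c$ is the number of scientists in country $c$, $G_c$ its GDP, $P_c$ its population, and $B_c$ the amount of basic research it performs.

```latex
% Number of scientists proportional to GDP
N_c \propto G_c

% Basic research proportional to scientists times per capita GDP
B_c \propto N_c \cdot \frac{G_c}{P_c}

% Combining the two relations stated in the abstract
B_c \propto \frac{G_c^{2}}{P_c}
```

The last line is only the algebraic consequence of combining the two stated proportionalities; the abstract itself does not give it in this form.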
Submitted 25 September, 2009;
originally announced September 2009.
-
The Smithsonian/NASA Astrophysics Data System (ADS) Decennial Report
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Stephen S. Murray
Abstract:
Eight years after the ADS first appeared the last decadal survey wrote: "NASA's initiative for the Astrophysics Data System has vastly increased the accessibility of the scientific literature for astronomers. NASA deserves credit for this valuable initiative and is urged to continue it." Here we summarize some of the changes concerning the ADS which have occurred in the past ten years, and we describe the current status of the ADS. We then point out two areas where the ADS is building an improved capability which could benefit from a policy statement of support in the ASTRO2010 report. These are: The Semantic Interlinking of Astronomy Observations and Datasets and The Indexing of the Full Text of Astronomy Research Publications.
Submitted 18 March, 2009;
originally announced March 2009.
-
Use of Astronomical Literature - A Report on Usage Patterns
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 3 October, 2008; v1 submitted 1 August, 2008;
originally announced August 2008.
-
Finding Astronomical Communities Through Co-readership Analysis
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Whenever a large group of people are engaged in an activity, communities will form. The nature of these communities depends on the relationship considered. In the group of people who regularly use scholarly literature, a relationship like "person i and person j have cited the same paper" might reveal communities of people working in a particular field. On this poster, we will investigate the relationship "person i and person j have read the same paper". Using the data logs of the NASA/Smithsonian Astrophysics Data System (ADS), we first determine the population that will participate by requiring that a user queries the ADS at a certain rate. Next, we apply the relationship to this population. The result of this will be an abstract "relationship space", which we will describe in terms of various "representations". Examples of such "representations" are the projection of co-read vectors onto Principal Components and the spectral density of the co-read network. We will show that the co-read relationship results in structure, we will describe this structure and we will provide a first attempt in the classification of this structure in terms of astronomical communities.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 5 January, 2007;
originally announced January 2007.
-
Closing the loop: Linking Datasets to Publications and Back
Authors:
Alberto Accomazzi,
Guenther Eichhorn,
Arnold Rots
Abstract:
With the mainstream adoption of references to datasets in astronomical manuscripts, researchers today are able to provide direct links from their papers to the original data that were used in their study. Following a process similar to the verification of references in manuscripts, publishers have been working with the NASA Astrophysics Data System (ADS) to validate and maintain links to these datasets.
Similarly, many astronomical data centers have been tracking publications based on the observations that they archive, and have been working with the ADS to maintain links between their datasets and the bibliographic records in question. In addition to providing a valuable service to ADS users, maintaining these correlations allows the data centers to evaluate the scientific impact of their missions.
Until recently, these two activities have evolved in parallel on independent tracks, with ADS playing a central role in bridging the connection between publishers and data centers. However, the ADS is now implementing the capability for all parties involved to find out which data links have been published with which manuscripts, and vice versa. This will allow data centers to periodically harvest the ADS to find out if there are new papers which reference datasets available in their archives. In this paper we summarize the state of the dataset linking project and describe the new harvesting interface.
Submitted 17 November, 2006;
originally announced November 2006.
-
Paper to Screen: Processing Historical Scans in the ADS
Authors:
Donna M. Thompson,
Alberto Accomazzi,
Guenther Eichhorn,
Carolyn Grant,
Edwin Henneken,
Michael J. Kurtz,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
The NASA Astrophysics Data System in conjunction with the Wolbach Library at the Harvard-Smithsonian Center for Astrophysics is working on a project to microfilm historical observatory publications. The microfilm is then scanned for inclusion in the ADS. The ADS currently contains over 700,000 scanned pages of volumes of historical literature. Many of these volumes lack clear pagination or other bibliographic data that are necessary to take advantage of the searching capabilities of the ADS. This paper will address some of the interesting challenges that needed to be resolved during the processing of the Observatory Reports included in the ADS.
Submitted 5 October, 2006;
originally announced October 2006.
-
Data in the ADS -- Understanding How to Use it Better
Authors:
Carolyn S. Grant,
Alberto Accomazzi,
Donna Thompson,
Edwin Henneken,
Guenther Eichhorn,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Smithsonian/NASA ADS Abstract Service contains a wealth of data for astronomers and librarians alike, yet the vast majority of usage consists of rudimentary searches. Hints on how to obtain more focused search results by using more of the various capabilities of the ADS are presented, including searching by affiliation. We also discuss the classification of articles by content and by referee status.
The ADS is funded by NASA Grant NNG06GG68G-16613687.
Submitted 5 October, 2006;
originally announced October 2006.