Project 35: FAIRX: Quantitative bias assessment in ELIXIR biomedical data resources

Abstract

AI systems for health are among the major scientific and technological achievements of our time. Such systems learn to perform specific tasks by processing large amounts of data produced and stored in biomedical repositories, and the quality and content of these data strongly shape what and how AI learns. If the data contain biases, such as skewed representation of certain categories or missing information, applying AI can lead to discriminatory outcomes and propagate them into society, as we recently pointed out (Cirillo et al. NPJ Digit Med. 2020 doi:10.1038/s41746-020-0288-5). The aim of our project is to determine the extent of biases in the available demographic categories (sex, age, race) in ELIXIR biomedical data repositories, which are widely used in the community to train AI systems. We aim to quantify bias and to provide recommendations, including solutions and best practices, on how to use the data properly to develop fair and trustworthy AI. We have recently collected endorsement and support for this project from representatives of several ELIXIR platforms, communities and focus groups, namely the Data platform, Human Data Communities, the Diversity, Equity, & Inclusion group, the Impact group, the Industry group and Communication.

Topics

Cancer
Data Platform
Federated Human Data
Human Copy Number Variation
Machine learning
Rare Disease

Project Number: 35

EasyChair Number: 61

Team

Lead(s)

Davide Cirillo davide.cirillo@bsc.es
Nataly Buslón nataly.buslon@bsc.es

Expected outcomes

Task 1. Quantification of bias in selected resources
Task 2. Evaluation of social and ethical impact
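Task 1 does not name a specific metric in this README. As a minimal sketch of what "quantifying bias" in one demographic field could look like, the Python snippet below reports category shares, the fraction of missing values, and a normalized Shannon entropy as a balance score. The function name `summarize_category`, the `MISSING` label set and the choice of entropy are assumptions made for illustration, not the project's actual method.

```python
from collections import Counter
from math import log

# Labels treated as "missing"; this set is an illustrative assumption.
MISSING = {"", "na", "n/a", "unknown", "not reported"}


def summarize_category(values):
    """Summarize one demographic field (e.g. sex) across samples.

    Returns counts and shares of the reported categories, the fraction of
    missing values, and a balance score (normalized Shannon entropy):
    1.0 means all reported categories are equally represented,
    0.0 means a single category or no usable data.
    """
    cleaned = [(v or "").strip().lower() for v in values]
    known = [v for v in cleaned if v not in MISSING]
    counts = Counter(known)
    total = len(known)

    shares = {k: c / total for k, c in counts.items()} if total else {}
    if len(shares) > 1:
        balance = -sum(p * log(p) for p in shares.values()) / log(len(shares))
    else:
        balance = 0.0

    missing_rate = 1 - total / len(cleaned) if cleaned else 0.0
    return {
        "counts": dict(counts),
        "shares": shares,
        "missing_rate": missing_rate,
        "balance": balance,
    }


if __name__ == "__main__":
    # Toy values, not real repository data.
    sex = ["female", "male", "male", "male", "", "unknown", None, "Female"]
    print(summarize_category(sex))
```

Reporting the missing rate separately from the balance score keeps the two kinds of bias named in the abstract (skewed representation and missing information) distinguishable.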

Expected audience

ELIXIR data resources representatives especially designers, developers and data miners Computer scientists with database skills including development and data management Researchers in computational biology with strong programming background Researchers in social sciences with interests in biomedicine and technology Data scientists with strong analytical and statistical knowledge Bioinformaticians with knowledge on biological data resources Biostatisticians with interests in bias and data mining Researchers and practitioners in academic or industrial fields devoted to social equity

Qualitative analysis (policies & recommendations)

Nataly Buslón, subgroup spokesperson
Gemma Holliday
Atia Cortés

Quantitative analysis (dbGaP)

Useful links

FTP access to the dataset: http://ftp.ncbi.nlm.nih.gov/dbgap/studies
Study Submission Guide: https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/ and https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf

Non-NIH funded "expectations": https://osp.od.nih.gov/wp-content/uploads/Expectations_for_Non-NIH_Funded_Submission_Requests.pdf
Basic Requirements: https://osp.od.nih.gov/wp-content/uploads/Non-NIH-Funded_Basic_Study_Information.pdf
Template files: https://ftp.ncbi.nlm.nih.gov/dbgap/dbGaP_Submission_Guide_Templates/Individual_Submission_Templates/
Data Access: https://www.ncbi.nlm.nih.gov/books/NBK5294/ and https://osp.od.nih.gov/wp-content/uploads/NIH_Best_Practices_for_Controlled-Access_Data_Subject_to_the_NIH_GDS_Policy.pdf
Quality Control Errors: https://www.ncbi.nlm.nih.gov/gap/public_utils/messages/ and for the QC process: https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/#aqcchecks
FAQ: https://www.ncbi.nlm.nih.gov/books/NBK5295/
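As a starting point for working with the FTP link above, the sketch below collects study accessions from the directory listing. It assumes the listing is served as text or HTML over HTTP and that study directories are named with accessions of the form `phs` followed by six digits; the helper names `extract_study_ids` and `fetch_study_ids` are hypothetical.

```python
import re
from urllib.request import urlopen

# URL taken from the "FTP access to the dataset" link above.
STUDIES_URL = "http://ftp.ncbi.nlm.nih.gov/dbgap/studies"

# Assumed accession format: "phs" followed by six digits.
PHS_PATTERN = re.compile(r"\bphs\d{6}\b")


def extract_study_ids(listing_text):
    """Return the sorted, de-duplicated study accessions found in a listing."""
    return sorted(set(PHS_PATTERN.findall(listing_text)))


def fetch_study_ids(url=STUDIES_URL, timeout=30):
    """Download the directory listing and extract study accessions from it."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    return extract_study_ids(text)


if __name__ == "__main__":
    # Requires network access; prints how many study folders were found.
    print(len(fetch_study_ids()))
```

Keeping the parsing step separate from the download makes it possible to check the accession pattern against a saved copy of the listing without network access.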

People

Davide Cirillo, subgroup spokesperson
María Morales
Alejandro Muñoz
Camila Pontes
Olivier Philippe

Quantitative analysis (EGA)

Useful links

API Metadata documentation: https://ega-archive.org/metadata/how-to-use-the-api
Policy documentation: https://ega-archive.org/submission/dac/documentation
Submitter Portal: https://ega-archive.org/submission/tools/submitter-portal
Quality Control Reports: https://ega-archive.org/about/quality-control-reports
Implementation of the EU General Data Protection Regulation (GDPR): https://ega-archive.org/privacy-notice
Data Access: https://ega-archive.org/access/data-access
Download Client V3: https://ega-archive.org/download/downloader-quickguide-APIv3
Metadata REST Endpoints: https://ega-archive.org/metadata/how-to-use-the-api

People

Aina Jené, subgroup spokesperson
Babita Singh
Mauricio Moldes
Victoria Ruiz
Diego Saby


