BGI Group

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 204.9.220.41 (talk) at 20:49, 6 October 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

BGI Group
Industry: Genome sequencing
Founded: September 9, 1999 at 9:09:09
Locations: Hong Kong, Hangzhou, Beijing, Boston (USA), Copenhagen (Denmark)
Production output: >15,000 human genomes a year by 2011
Website: www.genomics.org.cn

BGI (Chinese: 华大基因; pinyin: Huádà Jīyīn), known as the Beijing Genomics Institute prior to 2008, is one of the world’s premier genome sequencing centers. Its sequencing output is expected to soon surpass the equivalent of 10,000 human genomes per year.

BGI will receive US$1.5 billion in “collaborative funds” over the next 10 years from China Development Bank.[1]

Key Achievements

BGI produces de novo and resequenced genomes, as well as RNA-Seq, epigenomic, metagenomic, and proteomic data, at such a high volume that not all genomes and research of significant impact are listed here. For a fuller account of BGI's contributions to science, see the list of genomes sequenced near the end of this article.

  • First to de novo sequence and assemble mammalian [2] and human genomes with short-read sequencing (so-called "next generation sequencing") [3]
  • Sequenced the first ancient human’s genome [4]
  • Sequenced the first diploid genome of an Asian individual [5], as part of the Yan Huang project
  • Sequenced the first Giant Panda genome [6], equal in size to the human genome, in less than 8 months [7]
  • Initiated building a sequence map of the human pan-genome, estimated to contain 19-40 million bases not in the human reference genome [8][9]
  • Contributed 10% of sequence information for the Human HapMap Project
  • Contributed 1% of the Human Genome Project’s reference genome and was the only institute in the developing world to contribute to the project
  • Produced proof-of-principle study for sequencing the microbiome of the human digestive system, an estimated 150 times larger than the human genome [10][11]
  • Key player in the Sino-British Chicken Genome Project
  • Key sequencing center in the 1000 Genomes Project
  • First Chinese institution to sequence the SARS virus, just hours after the first sequencing of the virus by Canadians [12]

Current projects

Yan Huang Project

The project is named after two emperors believed to have founded China’s dominant ethnic group.[13] BGI plans to sequence at least 100 Chinese individuals to produce a high-resolution map of Chinese genetic polymorphisms.[14] The first genome sequenced is that of an anonymous Chinese billionaire who donated 10 million RMB to the project.[15] This first genome is named "YanHuang 1", and its data are published at http://yh.genomics.org.cn .

Symbiont Genome Project

In a jointly funded project announced March 19, 2010, BGI will collaborate with Sidney K. Pierce of the University of South Florida and Charles Delwiche of the University of Maryland at College Park to sequence the genomes of the sea slug Elysia chlorotica and its algal food, Vaucheria litorea. The sea slug uses genes from the alga to synthesize chlorophyll, the first interspecies gene transfer discovered. Sequencing their genomes could elucidate the mechanism of that transfer.[16]

1,000 Plant and Animal Reference Project

BGI is leading an international collaboration to sequence 1,000 plants and animals of economic and scientific import within two years. It has pledged an initial US$100 million to start the program.[17]

BGI has already sequenced the genomes of 20 animal species and 9 plant species, sometimes for multiple individuals, such as 40 silkworms (PMID 19713493), and had an equal number underway as of March 2010. Visit the project’s website to monitor progress, see the species planned for sequencing, and apply to join the effort [1].

International Big Cats Genome Project

BGI, Beijing University, Heilongjiang Manchurian tiger forestry zoo, Kunming Institute of Zoology, San Diego Zoo’s Institute for Conservation Research in California, and others will sequence the Amur tiger, South China tiger, Bengal tiger, Asiatic lion, African lion, clouded leopard, snow leopard, and other felines. BGI will also sequence a liger and a tigon, which may modify the genetic definitions of “hybrid” and “species”.[18]

The project will significantly advance conservation research and was auspiciously announced for the Chinese year of the Tiger.[19]

Three Extreme-Environment Animal Genomes Project

http://www.genomics.cn/en/research.php?type=show&id=330

Diabetes-associated Genes and Variations Study (LUCAMP) Cancer Genome Project

Nine Danish universities and institutes will collaborate with BGI in this targeted resequencing project.

BGI explores associated genome and gene variation in complex diseases in large-scale studies, primarily using two methods: PCR-based resequencing of candidate genes and exon-capture-based whole-exome resequencing.

10,000 Microbial Genomes Project

http://english.cas.cn/Ne/CASE/200908/t20090805_44705.shtml

1,000 Genomes Project

http://www.1000genomes.org

Bioinformatics Technology

De novo sequencing requires aligning billions of short strings of DNA sequence into a full genome, itself three billion base pairs long for humans.

BGI’s computational biologists developed the first successful algorithm, based on graph theory, for aligning billions of 25- to 75-base-pair strings produced by next-generation sequencers, specifically Illumina’s Genome Analyzer, during de novo sequencing. The algorithm, called SOAPdenovo, can assemble a genome in two days[20] and has been used to assemble an array of plant and animal genomes.
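To illustrate how a graph-based approach can turn short reads into a longer sequence, the following is a minimal sketch of de Bruijn graph assembly, the family of methods SOAPdenovo belongs to. It is a toy, not SOAPdenovo's implementation: the function names, the example reads, and the choice of k are all assumptions made for illustration, and it assumes error-free reads with no repeated (k-1)-mers, so the graph is a single unbranched path.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # use each distinct k-mer as one edge
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def assemble(reads, k):
    """Spell out the contig along the single path through the graph.

    Toy assumption: the graph is one simple path. Real assemblers must
    also resolve branches, sequencing errors, and repeats.
    """
    graph = build_de_bruijn(reads, k)
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("toy assembler expects exactly one start node")

    node = starts[0]
    contig = node
    visited = {node}
    while node in graph:
        successors = graph[node]
        if len(successors) != 1:
            raise ValueError("toy assembler cannot resolve branches")
        node = successors[0]
        if node in visited:
            raise ValueError("toy assembler cannot resolve cycles")
        visited.add(node)
        contig += node[-1]  # each step adds one new base
    return contig


# Hypothetical overlapping reads drawn from a made-up 14-base sequence.
example_reads = ["ATGGCGTG", "GCGTGCAA", "GTGCAATC", "GCAATCC"]
example_contig = assemble(example_reads, k=5)
```

The reads are never compared with one another directly; overlaps emerge from shared k-mers, which is what makes this style of approach practical for billions of short reads.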

BGI’s 500-node supercomputer processes 10 terabytes of raw sequencing data every 24 hours from its current 30 or so Genome Analyzers from Illumina. The annual budget for the computer center is US$9 million.[21]
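As a rough sense of scale, the figures above can be converted into a sustained data rate. This is a back-of-the-envelope sketch only: it assumes decimal terabytes (10^12 bytes) and exactly 30 sequencers, neither of which the source specifies, and the names are illustrative.

```python
TERABYTE = 10**12          # assumption: decimal terabytes, not tebibytes
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 10 * TERABYTE  # 10 TB of raw data per 24 hours (from the text)
sequencers = 30              # assumption: "30 or so" taken as exactly 30

# Sustained throughput the computer center must absorb, in megabytes/second.
rate_mb_per_s = daily_bytes / SECONDS_PER_DAY / 10**6

# Average raw output per sequencer per day, in gigabytes.
per_sequencer_gb = daily_bytes / sequencers / 10**9

print(f"{rate_mb_per_s:.1f} MB/s sustained")
print(f"{per_sequencer_gb:.1f} GB per sequencer per day")
```

Under these assumptions the center ingests roughly 116 MB every second, around the clock, or about a third of a terabyte per instrument per day.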

SOAPdenovo is part of the Short Oligonucleotide Analysis Package (SOAP), a suite of tools developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis. Built for the Illumina sequencers' short reads, SOAPdenovo has been used to assemble multiple human genomes[22][23][24] (identifying an eight-kilobase insertion not detected by mapping to the human reference genome[25]) and animal genomes, such as that of the giant panda.[26] To download SOAP, visit http://soap.genomics.org.cn/. A discussion group is hosted on Google: http://groups.google.com/group/bgi-soap.

History

Founded September 9, 1999 at 9:09 AM, an auspicious time in Chinese superstition, the institution has outgrown building after building and now has offices around the globe.

In 2007, in keeping with BGI’s goal of developing projects and platforms at the cutting edge of research and technology, the organization relocated its headquarters to Shenzhen and founded the first citizen-managed, non-profit research institution in China. In October of that same year, BGI successfully completed the First Asian Diploid Genome Project, which was followed, in 2008, by the launch of the 1000 Genomes Project and the Giant Panda Genome Project. On June 19, 2008, with the support and approval of the Shenzhen municipal party committee and government, BGI-Shenzhen was officially recognized as a state agency.

The Institute has both a private and a public character. It receives funds both from private investors and the Chinese government. The laboratory is also the Bioinformatics Center of the Chinese Academy of Sciences. Beijing Huada Genomics Research Center was the precursor of BGI.

In October 2003, the Hangzhou (Zhejiang) branch of the Beijing Genomics Institute and Zhejiang University founded a new research institute, the James D. Watson Institute of Genome Sciences. The Watson Institute is intended to become a major center for research and education in East Asia, modeled after the Cold Spring Harbor Laboratory.

BGI Shenzhen received certification for meeting ISO 9001:2008 requirements for design and provision of high-throughput sequencing services.[27]

Wang Jun, one of the leaders of BGI, was only 33 years old in 2010, when BGI became the largest genome sequencing center in the world.

Purchase of 128 Illumina HiSeq 2000 Sequencers

For context, see a world map of high-throughput sequencers (it does not yet include BGI's most recent sequencer purchases): http://pathogenomics.bham.ac.uk/hts/

Genomes Sequenced

Plants

  • Rice
  • Cucumber [28]
  • Papaya
  • Grape
  • Soybean
  • Sorghum
  • Maize
  • Palm
  • Thale cress

Animals

  • 40 silkworm genomes [29]
  • Honey bee
  • African malaria mosquito
  • Water flea
  • Cow (200,000 base pairs longer than a human genome)
  • Giant panda
  • Dog
  • Opossum
  • Chicken
  • Platypus
  • Horse
  • Pacific cod
  • Zebrafish
  • Humans
  • Chimpanzees
  • Mouse
  • Rat
  • Roundworm
  • Fruit fly
  • Anole lizard

Silkworm Genome Project

BGI sequenced 40 domesticated and wild silkworms, identifying 354 genes likely important in domestication.[30]

Giant Panda Genome Project

Sequencing revealed that the giant panda, Ailuropoda melanoleuca, has a frameshift mutation in T1R1, a gene involved in sensing savory flavors. The mutation might be the genetic reason why the panda prefers bamboo over meat. However, the panda also lacks genes expected for bamboo digestion, so its microbiome might play a key role in metabolizing its main source of food.[31]

Putting Throughput in Perspective

At full capacity in 2011, BGI will be able to sequence about 1.2 petabases in a year, or about 10,000 human genomes at high coverage.

  • Only about 100 genomes (of any animal, most being much smaller than a human genome) have been sequenced to date
  • BGI's annual output would be over 400,000 times the amount of text expected to be on Twitter by 2011 (text estimate based on http://popacular.com/gigatweet/)
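The relationship between the petabase figure and "high coverage" can be checked with simple arithmetic. The sketch below uses the 1.2 petabases and 10,000 genomes quoted above and the three-billion-base-pair human genome size given earlier in this article; the function and variable names are illustrative assumptions.

```python
PETABASE = 10**15

annual_bases = 12 * PETABASE // 10   # 1.2 petabases per year (from the text)
genomes_per_year = 10_000            # human genomes per year (from the text)
human_genome_size = 3_000_000_000    # about three billion base pairs


def bases_per_genome(total_bases, genomes):
    """Raw sequenced bases available for each genome."""
    return total_bases / genomes


def fold_coverage(total_bases, genomes, genome_size):
    """Average number of times each base of the genome is read."""
    return bases_per_genome(total_bases, genomes) / genome_size


per_genome = bases_per_genome(annual_bases, genomes_per_year)
coverage = fold_coverage(annual_bases, genomes_per_year, human_genome_size)

print(f"{per_genome / 10**9:.0f} gigabases per genome")
print(f"about {coverage:.0f}x coverage")
```

In other words, the two figures are consistent if each genome is sequenced to roughly 40-fold depth, meaning every base is read about 40 times on average.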

See also

References

  1. "BGI to Receive $1.5B in 'Collaborative Funds' Over 10 Years from China Development Bank". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  2. Li, R.; Fan, W.; Tian, G.; Zhu, H.; He, L.; Cai, J.; Huang, Q.; Cai, Q.; Li, B. (2010). "The sequence and de novo assembly of the giant panda genome". Nature. 463 (7279): 311–7. doi:10.1038/nature08696. PMID 20010809.
  3. Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G. (2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Res. 20 (2): 265–72. doi:10.1101/gr.097261.109. PMID 20019144.
  4. Rasmussen, M.; Li, Y.; Lindgreen, S.; Pedersen, JS.; Albrechtsen, A.; Moltke, I.; Metspalu, M.; Metspalu, E.; Kivisild, T. (2010). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature. 463 (7282): 757–62. doi:10.1038/nature08835. PMID 20148029.
  5. Wang, J.; Wang, W.; Li, R.; Li, Y.; Tian, G.; Goodman, L.; Fan, W.; Zhang, J.; Li, J. (2008). "The diploid genome sequence of an Asian individual". Nature. 456 (7218): 60–5. doi:10.1038/nature07484. PMID 18987735.
  6. Li, R.; Fan, W.; Tian, G.; Zhu, H.; He, L.; Cai, J.; Huang, Q.; Cai, Q.; Li, B. (2010). "The sequence and de novo assembly of the giant panda genome". Nature. 463 (7279): 311–7. doi:10.1038/nature08696. PMID 20010809.
  7. Cyranoski, D. (2010). "Chinese bioscience: The sequence factory". Nature. 464 (7285): 22–4. doi:10.1038/464022a. PMID 20203579.
  8. Li, R.; Li, Y.; Zheng, H.; Luo, R.; Zhu, H.; Li, Q.; Qian, W.; Ren, Y.; Tian, G. (2010). "Building the sequence map of the human pan-genome". Nat Biotechnol. 28 (1): 57–63. doi:10.1038/nbt.1596. PMID 19997067.
  9. "To Start Building 'Human Pan-Genome,' BGI De Novo Assembles Two Genomes from Illumina Data". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  10. Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, KS.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F. (2010). "A human gut microbial gene catalogue established by metagenomic sequencing". Nature. 464 (7285): 59–65. doi:10.1038/nature08821. PMID 20203603.
  11. "International Team Catalogs Microbial Genes in the Human Gut". GenomeWeb Daily News. Retrieved 29 March 2010.
  12. Enserink, M. (2003). "SARS in China. China's missed chance". Science. 301 (5631): 294–6. doi:10.1126/science.301.5631.294. PMID 12869735.
  13. "Chinese scientists sequence 1st volunteer's genome". Retrieved 29 March 2010.
  14. "BGI Offers Next-Gen Sequencing Service, Kicks Off 100-Genome Sequencing Project". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  15. "BGI Offers Next-Gen Sequencing Service, Kicks Off 100-Genome Sequencing Project". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  16. "BGI". Retrieved 29 March 2010.
  17. Fox, J.; Kling, J. (2010). "Chinese institute makes bold sequencing play". Nat Biotechnol. 28 (3): 189–91. doi:10.1038/nbt0310-189c. PMID 20212469.
  18. "BGI to Sequence Tiger, Lion, and Leopard Species This Year". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  19. "BGI". Retrieved 29 March 2010.
  20. "To Start Building 'Human Pan-Genome,' BGI De Novo Assembles Two Genomes from Illumina Data". GenomeWeb, In Sequence. Retrieved 29 March 2010.
  21. Petsko, GA. (2010). "Rising in the East". Genome Biol. 11 (1): 102. doi:10.1186/gb-2010-11-1-102. PMID 20156314.
  22. Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G. (2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Res. 20 (2): 265–72. doi:10.1101/gr.097261.109. PMID 20019144.
  23. Rasmussen, M.; Li, Y.; Lindgreen, S.; Pedersen, JS.; Albrechtsen, A.; Moltke, I.; Metspalu, M.; Metspalu, E.; Kivisild, T. (2010). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature. 463 (7282): 757–62. doi:10.1038/nature08835. PMID 20148029.
  24. Wang, J.; Wang, W.; Li, R.; Li, Y.; Tian, G.; Goodman, L.; Fan, W.; Zhang, J.; Li, J. (2008). "The diploid genome sequence of an Asian individual". Nature. 456 (7218): 60–5. doi:10.1038/nature07484. PMID 18987735.
  25. "BGI Uses New Short-Read Algorithm to Assemble Panda Genome as Proof of Concept for Human Genome". GenomeWeb, BioInform. Retrieved 28 March 2010.
  26. Li, R.; Fan, W.; Tian, G.; Zhu, H.; He, L.; Cai, J.; Huang, Q.; Cai, Q.; Li, B. (2010). "The sequence and de novo assembly of the giant panda genome". Nature. 463 (7279): 311–7. doi:10.1038/nature08696. PMID 20010809.
  27. "BGI". Retrieved 29 March 2010.
  28. Huang, S.; Li, R.; Zhang, Z.; Li, L.; Gu, X.; Fan, W.; Lucas, WJ.; Wang, X.; Xie, B. (2009). "The genome of the cucumber, Cucumis sativus L.". Nat Genet. 41 (12): 1275–81. doi:10.1038/ng.475. PMID 19881527.
  29. Xia, Q.; Guo, Y.; Zhang, Z.; Li, D.; Xuan, Z.; Li, Z.; Dai, F.; Li, Y.; Cheng, D. (2009). "Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx)". Science. 326 (5951): 433–6. doi:10.1126/science.1176620. PMID 19713493.
  30. Xia, Q.; Guo, Y.; Zhang, Z.; Li, D.; Xuan, Z.; Li, Z.; Dai, F.; Li, Y.; Cheng, D. (2009). "Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx)". Science. 326 (5951): 433–6. doi:10.1126/science.1176620. PMID 19713493.
  31. Li, R.; Fan, W.; Tian, G.; Zhu, H.; He, L.; Cai, J.; Huang, Q.; Cai, Q.; Li, B. (2010). "The sequence and de novo assembly of the giant panda genome". Nature. 463 (7279): 311–7. doi:10.1038/nature08696. PMID 20010809.