The training courses on April 22 and 23 are free of charge for both registered and non-registered participants.
April 22, 2015 (Wednesday)
Conference Hall, Beijing Institute of Genomics, CAS
|09:00 - 10:00||
WormBase literature curation workflow
Xiaodong Wang, WormBase & California Institute of Technology, USA
Synopsis: WormBase (http://www.wormbase.org) is a model organism database containing data about C. elegans and other nematodes. The WormBase literature curation workflow typically begins with downloading bibliographic information from PubMed, followed by PDF acquisition, data type flagging, entity recognition, and fact extraction. Using a combination of manual, semi-automated, and fully automated approaches, including community curation, Perl scripts, Support Vector Machines (SVMs), and the Textpresso (http://textpresso.org) information retrieval system, WormBase curators annotate over 30 different data types to support C. elegans-based biomedical research.
|10:00 - 11:00||
PubChem: a case study for managing big data
Yanli Wang, National Center for Biotechnology Information, USA
Synopsis: Chemical genomics (Chemogenomics) systematically screens small molecule libraries to identify drug candidates and chemical probes for characterizing protein and gene functions. Advances in RNAi screening technology have enabled genome-wide functional screens for discovering new cellular pathways and therapeutic targets. The PubChem BioAssay database (http://www.ncbi.nlm.nih.gov/pcassay/), hosted by the National Center for Biotechnology Information (NCBI) at NIH, serves as a public repository for information generated by Chemogenomics and RNAi research. The integration of PubChem with the other resources at NCBI provides a unique annotation service for NCBI's genomic information. This presentation will describe how this effort enables the retrieval of drug and chemical modulators, as well as biological and therapeutic relevance, for many GenBank records.
|11:00 - 12:00||
Overview of annotation tools and curation workflow for the GENCODE gene sets
Mark Thomas, Wellcome Trust Sanger Institute, UK
Synopsis: By combining computational analysis and manual annotation with experimental validation, GENCODE provides the most comprehensive gene set for human and mouse. With the aim of identifying all gene features, we annotate coding genes, non-coding genes and pseudogenes, with an emphasis on alternative splicing. We utilize a wide range of next-generation sequencing sources including RNAseq data, CAGE and polyAseq analyses, together with proteomic data and comparative analysis. In this workshop, I will provide insight into the annotation tools we use, highlighting the curation processes required to maintain the integrity of our data. We will discuss the merge process, detailed QC pipelines and the integration of controlled metadata, such as Sequence Ontology terms, extending to the different biotypes annotated.
April 23, 2015 (Thursday)
Function Room, Building 1, Beijing Friendship Hotel
|09:00 - 12:00||
Biocuration of GenBank & RefSeq
Ilene Mizrachi and Kim Pruitt, National Center for Biotechnology Information, USA
Synopsis: Curating sequence and literature data for RefSeq and Gene
Synopsis: NCBI Sequence Repositories: SRA and GenBank
|13:30 - 16:30||
Biocuration of UniProt & Proteomics Databases
Claire O'Donovan and Sandra Orchard, EMBL-EBI
Synopsis: An introduction to UniProt Knowledgebase curation
Synopsis: The importance of Controlled Vocabularies and Data Standards in Biocuration