New genomic variations and GWAS results, improved annotations of missense variations, and integrated chromatin accessibility data for non-coding variations.
RiceVarMap v2.0 is a comprehensive database of rice genomic variations and their functional annotations. It provides curated information on 17,397,026 genomic variations (14,541,446 SNPs and 2,855,580 small INDELs) from sequencing data of 4,726 rice accessions. These variations were identified using GATK against the Os-Nipponbare-Reference-IRGSP-1.0 assembly. (Note: RiceVarMap v1.0 remains available for querying variations based on the older Nipponbare MSU v6.1 assembly.)
High-quality and complete genotype data. The genotypes of all accessions were imputed and evaluated, resulting in an overall missing data rate of less than 3% and an estimated accuracy greater than 99%. The SNP/INDEL genotypes of all accessions are available for online query and download. To facilitate population genetic analysis, RiceVarMap also offers ancestral allele information and allele distribution data for subpopulations.
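As an illustration of what the two quality figures measure, the following minimal Python sketch computes a missing data rate and a genotype concordance on a toy genotype table. It is not the procedure used to build the database: the function names, the use of `None` for a missing call, and the idea of comparing imputed calls against a set of known calls are all assumptions made for this example.

```python
def missing_rate(genotypes):
    """Fraction of missing calls in a sites-by-accessions genotype table.

    `genotypes` is a list of rows (one per site); None marks a missing call.
    """
    total = sum(len(row) for row in genotypes)
    if total == 0:
        return 0.0
    missing = sum(call is None for row in genotypes for call in row)
    return missing / total


def concordance(imputed, truth):
    """Fraction of matching calls over positions that are known in both tables."""
    compared = 0
    matched = 0
    for imputed_row, truth_row in zip(imputed, truth):
        for imputed_call, true_call in zip(imputed_row, truth_row):
            if imputed_call is None or true_call is None:
                continue  # skip positions that cannot be compared
            compared += 1
            matched += imputed_call == true_call
    return matched / compared if compared else 0.0


# Toy data (hypothetical): 2 sites x 4 accessions.
toy_truth = [["A", "A", "G", "G"], ["C", "T", "T", "T"]]
toy_imputed = [["A", "A", "G", None], ["C", "T", "C", "T"]]

toy_missing = missing_rate(toy_imputed)            # 1 missing call out of 8
toy_accuracy = concordance(toy_imputed, toy_truth)  # 6 matches out of 7 compared
```

On real data the same two ratios would be computed over all sites and accessions, with the accuracy typically estimated on calls that were deliberately masked before imputation.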
Comprehensive annotations of genomic variations. RiceVarMap now provides more precise variation calls and annotations. Three software packages (snpEff, CooVar and PolyPhen-2) were used to evaluate the impact of missense variations based on haplotype and conservation information. GWAS results were also integrated to help infer the possible functions of variants. This information can be queried at this page.
Phenotype data and GWAS results. The database provides geographical details, phenotype images, and agronomic and metabolic traits for some rice accessions. Plant scientists and breeders can also search for significant SNPs associated with various traits to develop useful molecular markers or identify candidate genes.
Currently, we have collected sequencing data from three sets of rice germplasm totaling 4,726 accessions of cultivated rice (Oryza sativa L.):
The first set consisted of 533 accessions selected to represent both usefulness in rice improvement and the genetic diversity of the cultivated species. We sequenced the 533 accessions on the Illumina HiSeq 2000 with 90-bp paired-end reads, generating more than one gigabase of high-quality sequence per accession (>2.5x per genome; 6.7 billion reads in total). The raw data are available at NCBI under BioProject accession number PRJNA171289. We provide phenotype images and agronomic and metabolic traits for these accessions.
The second set comprised 950 rice accessions sequenced by Huang et al. (2012, Nat Genet, 44:32-39). The data were downloaded from the EBI European Nucleotide Archive (accession numbers ERP000106 and ERP000729) and consist of 4.6 billion 73-bp paired-end reads (~1x per genome).
The third set comprised 3,243 rice accessions from the 3,000 Rice Genomes Project (2014, GigaScience, 3:7). The data were downloaded from the EBI European Nucleotide Archive (accession number PRJEB6180) and have an average sequencing depth of 14x per genome.
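The depth figures quoted for the first two sets can be sanity-checked from the read counts, as in the short Python sketch below. It assumes that the quoted read totals count individual reads rather than read pairs, and that the reference genome is roughly 373 Mb; both values are assumptions of this example, not figures stated above.

```python
# Assumed approximate size of the reference assembly, in base pairs.
ASSUMED_GENOME_SIZE_BP = 373_000_000


def mean_depth(total_reads, read_length_bp, n_accessions,
               genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Average per-accession sequencing depth implied by a pooled read count."""
    bases_per_accession = total_reads * read_length_bp / n_accessions
    return bases_per_accession / genome_size_bp


# First set: 6.7 billion 90-bp reads across 533 accessions.
depth_set1 = mean_depth(6.7e9, 90, 533)

# Second set: 4.6 billion 73-bp reads across 950 accessions.
depth_set2 = mean_depth(4.6e9, 73, 950)

# The three sets together account for the 4,726 accessions in the database.
total_accessions = 533 + 950 + 3243
```

Under these assumptions the first set works out to about 3x and the second to just under 1x, consistent with the ">2.5x" and "~1x" figures given for them.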
Phenotype (GWAS) data:
At the moment, RiceVarMap provides phenotype data and GWAS results for 13 agronomic traits (including heading date, plant height, and grain weight) and 840 metabolite traits, which were produced by our institute (Xie et al., 2015, Proc Natl Acad Sci USA, 112: E5411-E5419; Chen et al., 2014, Nat Genet, 46:714-721). Phenotype information can be queried at this page.
Data used in PolyPhen-2:
We extracted common missense SNPs (MAF > 0.05) for PolyPhen-2 analysis. Searches for homologous proteins were performed against UniProt UniRef100 using BLAST (e-value < 1e-3, identity from 0.3 to 0.95).
The recommended browsers are Chrome, Firefox, Safari, and Edge (IE8 and earlier have poorer support and may give a degraded experience).
Researchers who wish to use RiceVarMap are encouraged to refer to our publication for more details:
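The two filters described above can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline used for the database: the function names are hypothetical, each accession is treated as contributing a single allele (an assumption that suits inbred lines), and the identity bounds are assumed to be inclusive.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele at one site.

    `alleles` holds one allele per accession; None marks a missing call.
    """
    counts = Counter(a for a in alleles if a is not None)
    if len(counts) < 2:
        return 0.0  # monomorphic or entirely missing site
    ranked = sorted(counts.values(), reverse=True)
    return ranked[1] / sum(ranked)


def is_common_snp(alleles, maf_threshold=0.05):
    """Keep SNPs whose minor allele frequency exceeds the threshold."""
    return minor_allele_frequency(alleles) > maf_threshold


def keep_homolog_hit(evalue, identity, max_evalue=1e-3,
                     min_identity=0.3, max_identity=0.95):
    """Keep a BLAST hit that is significant and within the identity window.

    `identity` is a fraction between 0 and 1; bounds are assumed inclusive.
    """
    return evalue < max_evalue and min_identity <= identity <= max_identity


# Toy example (hypothetical data): 9 reference alleles and 1 alternative.
toy_site = ["A"] * 9 + ["G"]
toy_maf = minor_allele_frequency(toy_site)
```

The upper identity bound drops near-identical sequences, which add little conservation signal, while the lower bound drops hits too distant to align reliably.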
Zhao H, Yao W, Ouyang Y, Yang W, Wang G, Lian X, Xing Y, Chen L, Xie W. RiceVarMap: a comprehensive database of rice genomic variations. Nucleic Acids Res, 2015, 43: D1018-1022