To date, we have collected sequencing data for two sets of rice germplasm comprising a total of 1,479 accessions of cultivated rice
(Oryza sativa L.):
The first set of germplasm consisted of 529 accessions selected to represent both usefulness in
rice improvement and the genetic diversity of the cultivated species. We sequenced these accessions on the Illumina HiSeq 2000 as
90-bp paired-end reads, generating high-quality sequence of more than one gigabase per accession (>2.5x per genome; 6.7 billion reads in
total). The raw data are available from NCBI under BioProject accession number PRJNA171289. We originally sequenced 533
accessions in this project. After initial analysis, three accessions (C126, W196 and W232) showed excessive heterozygosity and one (W190) had a low
mapping rate; these four accessions were excluded from further analysis.
The second set of germplasm consisted of 950 rice accessions sequenced by Huang et al. (2012, Nat. Genet. 44:32-39). These data were downloaded from the EBI European Nucleotide Archive (accession numbers ERP000106 and ERP000729) and comprised 4.6 billion 73-bp paired-end reads (~1x per genome).
Together, these two sets of germplasm included both landraces and improved varieties from 73 countries.
Combined, the two sets of sequences provided approximately 2,400-fold coverage of the rice genome.
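
The coverage figures quoted above can be roughly checked from the read counts and read lengths. The following is a minimal sketch, not the authors' calculation. It assumes a rice genome size of about 390 Mb (a value not stated in the text) and that each counted read is a single read of the stated length. The names GENOME_SIZE_BP, DATASETS, total_bases and fold_coverage are hypothetical and used only for illustration.

```python
# Rough check of the sequencing depth figures quoted in the text.
# ASSUMPTION: genome size of ~390 Mb is not given in the text; it is an
# approximate value used here only for illustration.
GENOME_SIZE_BP = 390e6

# Read counts, read lengths and accession counts are taken from the text.
# ASSUMPTION: each counted read is one read of the stated length.
DATASETS = {
    "set1": {"reads": 6.7e9, "read_len": 90, "accessions": 529},
    "set2": {"reads": 4.6e9, "read_len": 73, "accessions": 950},
}


def total_bases(dataset):
    """Total sequenced bases in a dataset (reads x read length)."""
    return dataset["reads"] * dataset["read_len"]


def fold_coverage(bases, genome_size=GENOME_SIZE_BP):
    """Fold coverage = sequenced bases / genome size."""
    return bases / genome_size


# Combined depth across both sets, summed over all accessions.
combined_coverage = fold_coverage(sum(total_bases(d) for d in DATASETS.values()))

# Mean depth per accession within each set.
per_accession_coverage = {
    name: fold_coverage(total_bases(d)) / d["accessions"]
    for name, d in DATASETS.items()
}

total_accessions = sum(d["accessions"] for d in DATASETS.values())

if __name__ == "__main__":
    print(f"Total accessions: {total_accessions}")
    print(f"Combined coverage: ~{combined_coverage:.0f}-fold")
    for name, cov in per_accession_coverage.items():
        print(f"{name}: ~{cov:.1f}x per accession")
```

Under these assumptions the combined depth comes out close to the 2,400-fold figure, and the per-accession means are consistent with the ">2.5x" and "~1x" values given for the two sets. A different assumed genome size would shift all three numbers proportionally.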