Note: Some variants have missing genotypes in a large number of varieties sequenced at high coverage (depth > 12×). For these variants, we recorded the genotypes of those varieties as "DEL", because this pattern suggests that the varieties probably carry large deletions at these positions relative to the reference genome, and such deletions are difficult to identify using GATK. You can download the files with or without "DEL", depending on your needs.
With 'DEL':
rice4k_geno_add_del.bed rice4k_geno_add_del.bim rice4k_geno_add_del.fam
Without 'DEL':
rice4k_geno_no_del.bed rice4k_geno_no_del.bim rice4k_geno_no_del.fam
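The .bed/.bim/.fam extensions suggest that these are PLINK 1 binary filesets, although this page does not say so explicitly. Under that assumption, the following minimal sketch loads the variant table (.bim) and the sample table (.fam) with pandas, using the standard PLINK column layouts. The column names and the helper functions read_bim and read_fam are illustrative choices, not part of the dataset. How "DEL" is encoded in the genotype file is not described here, so the sketch does not try to decode it.

```python
import pandas as pd

# Standard PLINK 1 column layouts for the text companions of a .bed file.
# The names below are illustrative; the files themselves have no header row.
BIM_COLUMNS = ["chrom", "variant_id", "cm", "pos", "allele1", "allele2"]
FAM_COLUMNS = ["fid", "iid", "father", "mother", "sex", "phenotype"]


def read_bim(path):
    """Read a PLINK .bim file (one variant per line) into a DataFrame."""
    return pd.read_csv(
        path,
        sep=r"\s+",
        header=None,
        names=BIM_COLUMNS,
        # Keep identifiers and alleles as text so nothing is coerced to a number.
        dtype={"chrom": str, "variant_id": str, "allele1": str, "allele2": str},
        keep_default_na=False,
    )


def read_fam(path):
    """Read a PLINK .fam file (one sample per line) into a DataFrame."""
    return pd.read_csv(
        path,
        sep=r"\s+",
        header=None,
        names=FAM_COLUMNS,
        dtype={"fid": str, "iid": str, "father": str, "mother": str},
        keep_default_na=False,
    )
```

For example, read_bim("rice4k_geno_add_del.bim") and read_bim("rice4k_geno_no_del.bim") can be compared to check whether the two filesets contain the same variants. The binary .bed genotypes themselves are best read with a dedicated PLINK-aware tool.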
The raw sequence data can be downloaded from the NCBI or EBI European Nucleotide Archive under accession numbers PRJNA171289, ERP000106, ERP000729 and PRJEB6180.
In this section, we present the variant effect annotations generated using SnpEff, CooVar, PolyPhen-2, and SIFT. The data are stored in HDF5 format, one file per chromosome, with the chromosome name (e.g., “chr01”) used as the key, and can be read with the pandas.read_hdf function in Python. For convenience, the same results are also provided as gzip-compressed CSV files for use with other tools and in downstream analyses.
chr01_snpeff_coovar_polyphen_sift_merge_anno.h5 chr01_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr02_snpeff_coovar_polyphen_sift_merge_anno.h5 chr02_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr03_snpeff_coovar_polyphen_sift_merge_anno.h5 chr03_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr04_snpeff_coovar_polyphen_sift_merge_anno.h5 chr04_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr05_snpeff_coovar_polyphen_sift_merge_anno.h5 chr05_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr06_snpeff_coovar_polyphen_sift_merge_anno.h5 chr06_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr07_snpeff_coovar_polyphen_sift_merge_anno.h5 chr07_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr08_snpeff_coovar_polyphen_sift_merge_anno.h5 chr08_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr09_snpeff_coovar_polyphen_sift_merge_anno.h5 chr09_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr10_snpeff_coovar_polyphen_sift_merge_anno.h5 chr10_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr11_snpeff_coovar_polyphen_sift_merge_anno.h5 chr11_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
chr12_snpeff_coovar_polyphen_sift_merge_anno.h5 chr12_snpeff_coovar_polyphen_sift_merge_anno.csv.gz
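As a minimal sketch of the access pattern described above, the annotation files could be loaded as follows. The helper names (annotation_paths, load_annotation_h5, load_annotation_csv) are hypothetical. The CSV reader assumes that the first row is a header, which is not documented on this page, so the result should be inspected before use. pandas.read_hdf also requires the optional PyTables package to be installed.

```python
import pandas as pd


def annotation_paths(chrom):
    """Build the file names listed above for one chromosome, e.g. 'chr01'."""
    stem = f"{chrom}_snpeff_coovar_polyphen_sift_merge_anno"
    return f"{stem}.h5", f"{stem}.csv.gz"


def load_annotation_h5(path, chrom):
    """Read one chromosome's annotations from HDF5.

    The chromosome name (e.g. 'chr01') is used as the HDF5 key,
    as stated in the description above.
    """
    return pd.read_hdf(path, key=chrom)


def load_annotation_csv(path, **kwargs):
    """Read the gzip-compressed CSV version.

    pandas infers the compression from the '.gz' suffix. Assumption:
    the first row holds column names (the pandas default).
    """
    return pd.read_csv(path, **kwargs)
```

With the files for chromosome 1 in the working directory, h5_name, csv_name = annotation_paths("chr01") followed by load_annotation_h5(h5_name, "chr01") returns the table as a DataFrame. Passing nrows=5 to load_annotation_csv is a quick way to preview the columns without reading the whole file.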
Short tandem repeats (STRs), which consist of repeated 1–6 bp DNA sequence motifs, account for a significant fraction of the polymorphic variation in eukaryotic genomes. STRs are typically extremely unstable and hypervariable, with average mutation rates approximately 10- to 10,000-fold higher than those estimated for other parts of the genome. The vast majority of STR mutations are length polymorphisms, which are thought to arise primarily through replication-associated strand slippage. Because of these characteristics, STRs have been used extensively as molecular markers for population genetic analysis and genetic mapping.
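To make the definition concrete, the sketch below scans a DNA string for perfect tandem repeats of a 1–6 bp motif. It is an illustration only and is not the method used to call the STRs in this resource. The function name find_strs and the min_repeats threshold are assumptions chosen for the example.

```python
import re


def find_strs(seq, min_repeats=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    A hit is a motif of 1-6 bases repeated at least min_repeats times
    in a row. Positions are 0-based. Interrupted or imperfect repeats
    are not detected by this simple pattern.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # The lazy {1,6}? tries the shortest motif first, so 'AAAAAA' is
    # reported as motif 'A' rather than 'AA' or 'AAA'.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits
```

For the sequence "TTACACACACGG", this reports a single hit: the dinucleotide motif "AC" starting at position 2 with four copies. A length polymorphism of the kind described above would appear as a different copy number at the same locus in another variety.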