Complete Human Genome Sequencing of 40 Samples across 9 Different Populations
Rick Tearle, Complete Genomics, Mountain View, California, USA, rtearle@completegenomics.com
- Abstract
Complete Genomics recently released complete genome sequence data from
a diversity panel comprising 60 normal samples representing 9
different populations. Samples in this publicly released dataset are
from the NHGRI repositories. The purpose of the diversity panel is to
provide unbiased, high-coverage (approximately 47x) complete genome
data across a spectrum of populations. In particular, the panel
includes individuals of European, Asian, African, and American
descent. The sample set is also evenly balanced by gender.
Sequencing was performed using the Complete Genomics service, which
uses a non-sequential, unchained read technology to generate 70mer
paired-end reads. Assembly was performed using the Complete Genomics
Assembler, which takes a hybrid approach: initial fast alignment to a
reference, followed by de novo assembly in those regions of the genome
that appear to contain variations. Both alleles were called in
95.03-97.43% of each genome, with only one allele called in
0.25-1% of each genome. For homozygous reference calls, concordance
with HapMap3 is 99.94%. For SNPs, taking zygosity into account, the
concordance is 99.4%. The variations have been classified into SNPs,
insertions, deletions, and substitutions. Given the relatively small
sample size for each population, only novelty trends have been
reported. These trends, however, show a high degree of overlap with the
previously published 1000 Genomes Project data. Detailed
characterization of the population panel, including concordance with
other published data, will be discussed.
Date: Feb 2011