The Human Genome Project, started in 1990 to identify genes, has finally come to a close with the publication of the last chromosome sequence of the human genome, that of chromosome 1.
It took 150 scientists 10 years to complete the sequence of chromosome 1, which contains nearly twice as many genes as the average chromosome and makes up eight percent of the human genetic code. It has 3,141 genes and is linked to almost 350 illnesses.
The finished sequence comprises 223.6 million base-pairs (Mbp), determined to an accuracy of >99.99%, and includes the centromere and a large non-coding region (heterochromatin) in the centre of the chromosome. The sequence of chromosome 1 published today covers 99.4% of the gene-coding (euchromatin) regions of the chromosome that are amenable to sequencing with current technologies. Gaps within the sequence, most of them due to repetitive sequence, comprise about 1.3 Mbp.
The entire human genome has 20,000 to 25,000 genes, and the sequencing of chromosome 1 identified more than 1,000 new genes.
Almost 4,500 new single nucleotide polymorphisms (SNPs), single-letter changes in the genetic code, were identified that could lead to changes in protein activity. In addition, 90 SNPs were found that would result in a shortened protein. Although some 15 SNPs are associated with already known protection from malaria and predisposition to porphyria, the function of the newly located SNPs has yet to be discovered.
Sequencing was carried out at the Wellcome Trust Sanger Institute; the University of Washington Genome Center contributed 13% of the sequence finishing. Analysis of the chromosome content was carried out by the Wellcome Trust Sanger Institute.
The details of the sequence are published in Nature (Gregory SG et al. (2006) The DNA sequence and analysis of chromosome 1. Nature 441: 315-21).