What would happen if there were a mistake in the DNA sequence?

An algorithm is described that can detect certain errors within coding regions of DNA sequences. The algorithm is based on the idea that an insertion or deletion error within a coding sequence would interrupt the reading frame, so that the correct translation of the DNA sequence would require one or more frameshifts. If the coding sequence shows similarity to a known protein sequence, then such errors can be detected by comparing the conceptual translations of the DNA sequence in all six reading frames with every sequence in a protein sequence database. We have incorporated these ideas into a computer program, called DETECT, that can serve as an aid to the experimentalist who is determining new DNA sequences, so that obvious errors may be located and corrected. The program has been tested using raw experimental data and against sequences from the European Molecular Biology Laboratory database that are annotated as containing frameshifts. We have also tested it using unidentified open reading frames that flank known, annotated genes in the GenBank database. Many potential errors are apparent, and in some cases functions can be suggested for the "corrected" versions of these reading frames, leading to the identification of new genes. As more sequences are determined, the power of this method will increase substantially.
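The core idea, that a single inserted base pushes the downstream protein into a different reading frame, can be illustrated with a short sketch. This is not the DETECT program itself: the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the toy sequence are assumptions made for illustration, the standard genetic code is assumed, and the comparison against a protein database is left out entirely.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Conceptual translations in three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


# Hypothetical toy coding sequence and a copy with one extra base inserted.
original = "ATGGCTAAAGGTGAATTCCTGTAA"
with_insertion = original[:9] + "C" + original[9:]

print(six_frame_translations(original)["+1"])        # MAKGEFL*
print(six_frame_translations(with_insertion)["+1"])  # MAKR*IPV
print(six_frame_translations(with_insertion)["+2"])  # WLNGEFL*
```

In the toy example, frame +1 of the altered sequence matches the original protein only up to the insertion, while the remainder of the protein (`GEFL*`) reappears in frame +2. A similarity search that finds a known protein matching partly in one frame and partly in another would flag exactly this kind of switch as a candidate sequencing error.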

