Patent application title: SYSTEMS AND METHODS FOR DIAGNOSING A PREDISPOSITION TO DEVELOP COLON CANCER
Inventors:
Greg H. Enders (Villanova, PA, US)
Institute For Cancer Research D/b/a The Research Institute Of Fox Chase Cancer Center
Mark Andrake (Huntingdon Valley, PA, US)
Michael J. Hall (Glenside, PA, US)
Biao Luo (Princeton, NJ, US)
Timothy J. Yen (Haverford, PA, US)
Assignees:
Institute for Cancer Research d/b/a The Research Institute of Fox Chase Cancer Center
IPC8 Class: AC12Q168FI
USPC Class:
435 611
Class name: Measuring or testing process involving enzymes or micro-organisms; composition or test strip therefore; processes of forming such composition or test strip involving nucleic acid nucleic acid based assay involving a hybridization step with a nucleic acid probe, involving a single nucleotide polymorphism (snp), involving pharmacogenetics, involving genotyping, involving haplotyping, or involving detection of dna methylation gene expression
Publication date: 2014-03-20
Patent application number: 20140080124
Abstract:
Systems and methods for diagnosing or characterizing a predisposition to
colon cancer are provided. Cell nuclei may be evaluated for the presence
or quantity of gamma-H2AX foci. Nucleic acids may be evaluated for the
presence, type, or quantity of genomic instability or surrogates of dsDNA
breaks such as ataxia telangiectasia mutated (ATM), Rad3-related protein
(ATR), and Tumor suppressor p53-binding protein 1 (53BP1) in gamma-H2AX
foci. Nucleic acids comprising a germline nucleic acid sequence of the
ERCC6, WRN, TERT, and FAAP100 genes may be sequenced or probed to
determine if the nucleic acid sequence includes one or more alterations
that cause genomic instability, dsDNA breaks, or gamma-H2AX foci or
otherwise predispose a subject to develop colon cancer.
Claims:
1. A method for diagnosing a predisposition to develop colon cancer,
comprising comparing a quantity of nuclear gamma-H2AX foci determined
from a cell or cell nucleus isolated from a subject with reference values
for a quantity of gamma-H2AX foci indicative of a predisposition to
develop colon cancer, and optionally with reference values for a quantity
of gamma-H2AX indicative of a lack of a predisposition to develop colon
cancer, with a processor programmed to compare determined quantities of
gamma-H2AX foci with reference values of a quantity of gamma-H2AX foci,
and diagnosing whether the subject has a predisposition to develop colon
cancer based on the comparison.
2. The method of claim 1, further comprising isolating a cell or cell nucleus from a subject and determining the quantity of nuclear gamma-H2AX foci in the cell or cell nucleus.
3. The method of claim 1, wherein the reference values for a quantity of gamma-H2AX foci indicative of a predisposition to develop colon cancer comprise one or more of reference values for a quantity of gamma-H2AX foci indicative of high probability to develop colon cancer, reference values for a quantity of gamma-H2AX foci indicative of moderate probability to develop colon cancer, or reference values for a quantity of gamma-H2AX foci indicative of low probability to develop colon cancer, and the reference values for a quantity of gamma-H2AX indicative of a lack of a predisposition to develop colon cancer comprise one or more reference values of a healthy subject.
4. The method of claim 2, further comprising determining the quantity of one or more of phosphorylated ataxia telangiectasia mutated (ATM), Rad3-related protein (ATR), and Tumor suppressor p53-binding protein 1 (53BP1) in the gamma-H2AX foci, and comparing the determined quantity of one or more of ATM, ATR, and 53BP1 with reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a predisposition to develop colon cancer, and optionally with reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a lack of a predisposition to develop colon cancer, with a processor programmed to compare determined quantities of ATM, ATR, and 53BP1 with reference values of a quantity of ATM, ATR, and 53BP1, and diagnosing whether the subject has a predisposition to develop colon cancer based on both the comparison of the determined quantity of gamma-H2AX foci with reference values for a quantity of gamma-H2AX and the comparison of the determined quantity of ATM, ATR, and 53BP1 with reference values for a quantity of ATM, ATR, and 53BP1.
5. The method of claim 1, further comprising determining whether a nucleic acid obtained from the subject comprises a polymorphism selected from the group consisting of a polymorphism in the ERCC6 gene encoding a tyrosine at position 180 of the Cockayne Syndrome B protein, a polymorphism in the WRN gene encoding an isoleucine at position 705 of the Werner protein, a polymorphism in the WRN gene encoding a tyrosine at position 1292 of the Werner protein, a polymorphism in the TERT gene encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein, a polymorphism in the FAAP100 gene encoding a leucine at position 466 of the Fanconi anemia-associated protein of 100 kD, and combinations thereof, with a processor programmed to compare sequences of nucleic acids obtained from a subject with reference nucleic acid sequences comprising said polymorphisms, and diagnosing whether the subject has a predisposition to develop colon cancer based on both the comparison of the determined quantity of gamma-H2AX foci with reference values for a quantity of gamma-H2AX and the determination of whether the nucleic acid comprises a polymorphism.
6. The method of claim 5, wherein the nucleic acid obtained from the subject comprises genomic DNA, mRNA, or a cDNA obtained from mRNA.
7. The method of claim 1, further comprising determining whether the cell or cell nucleus isolated from the subject comprises chromosomal aneuploidy with a processor programmed to compare a determined type of chromosomal aneuploidy with reference types of chromosomal aneuploidy, and diagnosing whether the subject has a predisposition to develop colon cancer based on both the comparison of the quantity of determined gamma-H2AX foci with reference values for a quantity of gamma-H2AX and the determination of whether the subject has chromosomal aneuploidy.
8. The method of claim 7, wherein the chromosomal aneuploidy comprises a gain of chromosome 9 or a gain of chromosome 11.
9. The method of claim 1, wherein the subject is determined to have a predisposition to develop colon cancer, and further comprising treating the subject with a treatment regimen capable of inhibiting the onset of colon cancer.
10. The method of claim 1, wherein the subject is a human being.
11. A system for diagnosing a predisposition to develop cancer, comprising a data structure comprising one or more reference values for a quantity of gamma-H2AX foci indicative of a predisposition to develop colon cancer, one or more reference values for a quantity of gamma-H2AX indicative of a lack of a predisposition to develop colon cancer, a processor operably connected to the data structure, wherein the processor is programmed to compare a quantity of gamma-H2AX foci determined from a cell or cell nucleus isolated from a subject with the one or more reference values for a quantity of gamma-H2AX foci indicative of a predisposition to develop colon cancer and with the one or more reference values for a quantity of gamma-H2AX indicative of a lack of a predisposition to develop colon cancer, and optionally, an output for providing results of a comparison of a determined quantity with reference values to a user.
12. The system of claim 11, wherein the data structure further comprises one or more reference values for a quantity of ataxia telangiectasia mutated (ATM), Rad3-related protein (ATR), and Tumor suppressor p53-binding protein 1 (53BP1) in the gamma-H2AX foci indicative of a predisposition to develop colon cancer and one or more reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a lack of a predisposition to develop colon cancer, and the processor is programmed to compare a quantity of ATM, ATR, and 53BP1 determined from gamma-H2AX foci isolated from a cell or cell nucleus isolated from a subject with the one or more reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a predisposition to develop colon cancer and with the one or more reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a lack of a predisposition to develop colon cancer.
13. The system of claim 11, wherein the data structure further comprises one or more reference nucleic acid sequences encoding a tyrosine at position 180 of the Cockayne Syndrome B protein, one or more reference nucleic acid sequences encoding an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences encoding a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and one or more reference nucleic acid sequences encoding a leucine at position 466 of the Fanconi anemia-associated protein, and optionally further comprises one or more reference nucleic acid sequences that do not encode a tyrosine at position 180 of the Cockayne Syndrome B protein, one or more reference nucleic acid sequences that do not encode an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences that do not encode a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences that do not encode an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and one or more reference nucleic acid sequences that do not encode a leucine at position 466 of the Fanconi anemia-associated protein, and the processor is programmed to compare the sequence of a nucleic acid encoding the Cockayne Syndrome B protein, a nucleic acid encoding the Werner protein, a nucleic acid encoding the Telomerase Reverse Transcriptase protein, and a nucleic acid encoding the Fanconi anemia-associated protein determined from a nucleic acid isolated from a subject with the one or more reference nucleic acid sequences encoding an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences encoding a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein, one or more reference nucleic acid sequences encoding a leucine at position 466 of the Fanconi anemia-associated protein, one or more reference nucleic acid sequences that do not encode a tyrosine at position 180 of the Cockayne Syndrome B protein, one or more reference nucleic acid sequences that do not encode an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences that do not encode a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences that do not encode an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and one or more reference nucleic acid sequences that do not encode a leucine at position 466 of the Fanconi anemia-associated protein.
14. The system of claim 11, wherein the data structure further comprises one or more reference values for one or more types of genomic instability indicative of a predisposition to develop colon cancer and one or more reference values for the absence of genomic instability, and the processor is programmed to compare a type of genomic instability determined from a cell nucleus isolated from a subject with the one or more reference values for one or more types of genomic instability indicative of a predisposition to develop colon cancer and with the one or more reference values for the absence of genomic instability.
15. The system of claim 11, further comprising a computer network connection.
16. A computer readable medium, comprising executable code for causing a programmable processor to compare a quantity of gamma-H2AX foci determined from a cell or cell nucleus isolated from a subject with one or more reference values for a quantity of gamma-H2AX foci indicative of a predisposition to develop colon cancer and with one or more reference values for a quantity of gamma-H2AX indicative of a lack of a predisposition to develop colon cancer.
17. The computer readable medium of claim 16, further comprising executable code for causing a programmable processor to compare a quantity of ataxia telangiectasia mutated (ATM), Rad3-related protein (ATR), and Tumor suppressor p53-binding protein 1 (53BP1) in gamma-H2AX foci isolated from a cell or cell nucleus isolated from a subject with one or more reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a predisposition to develop colon cancer and with one or more reference values for a quantity of ATM, ATR, and 53BP1 in gamma-H2AX foci indicative of a lack of a predisposition to develop colon cancer.
18. The computer readable medium of claim 16, further comprising executable code for causing a programmable processor to compare the nucleic acid sequence of one or more of a polynucleotide comprising the ERCC6 gene, a polynucleotide comprising the WRN gene, a polynucleotide comprising the TERT gene, or a polynucleotide comprising the FAAP100 gene obtained from a subject with one or more reference nucleic acid sequences encoding a tyrosine at position 180 of the Cockayne Syndrome B protein, one or more reference nucleic acid sequences encoding an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences encoding a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein, one or more reference nucleic acid sequences encoding a leucine at position 466 of the Fanconi anemia-associated protein, one or more reference nucleic acid sequences that do not encode a tyrosine at position 180 of the Cockayne Syndrome B protein, one or more reference nucleic acid sequences that do not encode an isoleucine at position 705 of the Werner protein, one or more reference nucleic acid sequences that do not encode a tyrosine at position 1292 of the Werner protein, one or more reference nucleic acid sequences that do not encode an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and one or more reference nucleic acid sequences that do not encode a leucine at position 466 of the Fanconi anemia-associated protein.
19. The computer readable medium of claim 16, further comprising executable code for causing a programmable processor to compare a type of genomic instability determined from a cell nucleus isolated from a subject with one or more reference values for one or more types of genomic instability indicative of a predisposition to develop colon cancer and with one or more reference values for the absence of genomic instability.
20. The computer readable medium of claim 16, further comprising a computer processor.
Description:
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 61/702,423, filed on Sep. 18, 2012, and U.S. Provisional Application No. 61/731,506, filed on Nov. 30, 2012, the contents of each of which are incorporated by reference herein, in their entirety and for all purposes.
REFERENCE TO A SEQUENCE LISTING
[0002] This application includes a Sequence Listing submitted electronically as a text file named CC Genomic Instability_ST25.txt, created on Mar. 10, 2013 with a size of 180,000 bytes. The Sequence Listing is incorporated by reference herein.
FIELD OF THE INVENTION
[0003] The invention relates generally to the field of cancer diagnostics. More particularly, the invention relates to methods for diagnosing a predisposition to develop colon cancer. The invention also relates to arrays, systems, polynucleotides, and polypeptides, which may be used for practicing diagnostic methods.
BACKGROUND OF THE INVENTION
[0004] Various publications, including patents, published applications, accession numbers, technical articles and scholarly articles are cited throughout the specification. Each of these cited publications is incorporated by reference, in its entirety and for all purposes, in this document.
[0005] Colon cancer is the second most common fatal cancer in the United States. About one quarter of colon cancer appears to have an inherited predisposition in that families show a greater frequency of the disease than the general population (e.g., the cancer is familial), and/or the cancer manifests an early age of onset (less than age 50). In most such cases, the molecular cause of the predisposition to cancer is unknown.
[0006] Currently, in the absence of such insight, many patients who are suspected of a predisposition to develop colon cancer but do not carry an increased risk needlessly receive frequent invasive and expensive colon examinations, while others who harbor an unrecognized predisposition fail to receive potentially life-saving colon examinations. There is a need for better diagnostics for predicting patient risk factors for developing colon cancer that may aid in early detection, facilitate screening of patients at risk, and reduce the need for invasive tests on patients with reduced risk factors.
SUMMARY OF THE INVENTION
[0007] The invention features methods for diagnosing a predisposition to develop colon cancer. The methods may, for example, comprise determining the quantity of gamma-H2AX foci in a cell or cell nucleus sample obtained from a subject, comparing the determined quantity with reference values for a quantity of gamma-H2AX foci indicative of a predisposition to develop colon cancer, and optionally with reference values for a quantity of gamma-H2AX indicative of a lack of a predisposition to develop colon cancer, and diagnosing whether the subject has a predisposition to develop colon cancer based on the comparison. The methods may comprise determining genomic instability in a nucleic acid sample obtained from a subject, comparing the type of genomic instability with reference values for a type of genomic instability indicative of a predisposition to develop colon cancer, and optionally with reference values for a type of genomic instability indicative of a lack of a predisposition to develop colon cancer, and diagnosing whether the subject has a predisposition to develop colon cancer based on the comparison. The methods may comprise determining double stranded DNA breaks in a nucleic acid sample obtained from a subject, comparing the determined quantity of breaks with reference values for a quantity of breaks indicative of a predisposition to develop colon cancer, and optionally with reference values for a quantity or location of breaks indicative of a lack of a predisposition to develop colon cancer, and diagnosing whether the subject has a predisposition to develop colon cancer based on the comparison. 
The methods may comprise determining a surrogate of double stranded DNA breaks such as a quantity of one or more of phosphorylated ataxia telangiectasia mutated (ATM), Rad3-related protein (ATR), and Tumor suppressor p53-binding protein 1 (53BP1) in gamma-H2AX foci, comparing the determined quantity of ATM, ATR, and/or 53BP1 in the gamma-H2AX foci with reference values for a quantity of ATM, ATR, and/or 53BP1 in gamma-H2AX foci indicative of a predisposition to develop colon cancer, and optionally with reference values for a quantity of ATM, ATR, and/or 53BP1 in gamma-H2AX foci indicative of a lack of a predisposition to develop colon cancer, and diagnosing whether the subject has a predisposition to develop colon cancer based on the comparison.
[0008] The comparing step may be carried out using a processor programmed to compare determined quantities of gamma-H2AX foci with reference values of a quantity of gamma-H2AX foci, programmed to compare determined types of genomic instability with reference values of types of genomic instability, programmed to compare a determined quantity or location of double stranded DNA breaks with reference values of quantities or locations of double stranded DNA breaks, or programmed to compare a determined quantity of ATM, ATR, and/or 53BP1 in gamma-H2AX foci with reference values for ATM, ATR, and/or 53BP1 in gamma-H2AX foci. The reference values may indicate a high, moderate, low, or no significant probability of a subject having a predisposition to develop colon cancer. The methods may further comprise determining variations in one or more of the ERCC6 gene, the WRN gene, the TERT gene, or the FAAP100 gene associated with causing genomic instability, a DNA damage response, or a predisposition to develop colon cancer. In some aspects, the variations may be any variation described or exemplified herein. Determining such gene variations may be carried out according to any method described or exemplified herein.
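The comparison of a determined quantity of gamma-H2AX foci with tiered reference values might be sketched as follows. This is a minimal illustration only: the names `ReferenceRange`, `REFERENCE_RANGES`, `mean_foci_per_nucleus`, and `compare_with_reference` are hypothetical, and the numeric ranges are arbitrary placeholders, not values taken from this application or from any clinical data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ReferenceRange:
    """One reference tier: lower bound inclusive, upper bound exclusive."""
    label: str
    lower: float
    upper: float


# ASSUMPTION: placeholder cutoffs for illustration only. Real reference
# values would have to come from validated data for healthy and
# predisposed subjects.
REFERENCE_RANGES = (
    ReferenceRange("no significant probability", 0.0, 1.0),
    ReferenceRange("low probability", 1.0, 3.0),
    ReferenceRange("moderate probability", 3.0, 6.0),
    ReferenceRange("high probability", 6.0, float("inf")),
)


def mean_foci_per_nucleus(foci_counts):
    """Average the foci counted in each scored nucleus."""
    if not foci_counts:
        raise ValueError("at least one nucleus must be scored")
    return mean(foci_counts)


def compare_with_reference(quantity, ranges=REFERENCE_RANGES):
    """Return the label of the reference tier that contains the quantity."""
    if quantity < 0:
        raise ValueError("a quantity of foci cannot be negative")
    for ref in ranges:
        if ref.lower <= quantity < ref.upper:
            return ref.label
    return "no matching reference value"


if __name__ == "__main__":
    counts = [2, 4, 6]  # hypothetical foci counts from three nuclei
    print(compare_with_reference(mean_foci_per_nucleus(counts)))
```

Keeping the reference tiers in a data structure separate from the comparison function mirrors the claimed arrangement of a data structure holding reference values and a processor programmed to compare against them.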
[0009] In some aspects, the methods comprise determining whether a nucleic acid comprising the ERCC6 gene obtained from a subject encodes a tyrosine at position 180 of the Cockayne Syndrome B protein, and diagnosing whether the subject has a predisposition to develop colon cancer based on the presence or absence of a nucleic acid sequence encoding tyrosine at position 180. In some aspects, the methods comprise determining whether a nucleic acid comprising the WRN gene obtained from a subject encodes an isoleucine at position 705 of the Werner protein, or encodes a tyrosine at position 1292 of the Werner protein, and diagnosing whether the subject has a predisposition to develop colon cancer based on the presence or absence of a nucleic acid sequence encoding isoleucine at position 705 or a nucleic acid sequence encoding tyrosine at position 1292. In some aspects, the methods comprise determining whether a nucleic acid comprising the TERT gene obtained from a subject encodes an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and diagnosing whether the subject has a predisposition to develop colon cancer based on the presence or absence of a nucleic acid sequence encoding arginine at position 198. In some aspects, the methods comprise determining whether a nucleic acid comprising the FAAP100 gene obtained from a subject encodes a leucine at position 466 of the Fanconi anemia associated protein of 100 kD, and diagnosing whether the subject has a predisposition to develop colon cancer based on the presence or absence of a nucleic acid sequence encoding leucine at position 466.
[0010] The determining step may comprise determining the sequence of the nucleic acid comprising the ERCC6 gene, comparing the determined sequence with one or more reference nucleic acid sequences encoding a tyrosine at position 180 of the Cockayne Syndrome B protein and optionally one or more reference nucleic acid sequences that do not encode a tyrosine at position 180 of the Cockayne Syndrome B protein, and determining whether the determined sequence encodes a tyrosine at position 180 based on the comparison. The determining step may comprise determining the sequence of the nucleic acid comprising the WRN gene, comparing the determined sequence with one or more reference nucleic acid sequences encoding an isoleucine at position 705 of the Werner protein or one or more reference nucleic acid sequences encoding a tyrosine at position 1292 of the Werner protein, and optionally one or more reference nucleic acid sequences that do not encode an isoleucine at position 705 or a tyrosine at position 1292 of the Werner protein, and determining whether the determined sequence encodes an isoleucine at position 705 or a tyrosine at position 1292 based on the comparison. The determining step may comprise determining the sequence of the nucleic acid comprising the TERT gene, comparing the determined sequence with one or more reference nucleic acid sequences encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein and optionally one or more reference nucleic acid sequences that do not encode an arginine at position 198 of the Telomerase Reverse Transcriptase protein, and determining whether the determined sequence has the alteration based on the comparison. 
The determining step may comprise determining the sequence of the nucleic acid comprising the FAAP100 gene, comparing the determined sequence with one or more reference nucleic acid sequences encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD and optionally one or more reference nucleic acid sequences that do not encode a leucine at position 466 of the Fanconi anemia associated protein of 100 kD, and determining whether the determined sequence encodes a leucine at position 466 based on the comparison. The comparing step may be carried out using a processor programmed to compare determined nucleic acid sequences and reference nucleic acid sequences.
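The position-specific determinations described above might be sketched as a lookup of the residue found at each position of interest in a translated protein sequence. The positions and variant residues in `VARIANTS` are the ones recited in the text; the function names, the dictionary layout, and the synthetic test sequence are hypothetical, and the sketch assumes the determined nucleic acid sequence has already been translated and numbered consistently with the reference protein.

```python
from typing import Dict, Optional

# (protein name, 1-based position) -> variant residue recited in the text,
# in one-letter amino acid code.
VARIANTS = {
    ("Cockayne Syndrome B", 180): "Y",               # tyrosine
    ("Werner", 705): "I",                            # isoleucine
    ("Werner", 1292): "Y",                           # tyrosine
    ("Telomerase Reverse Transcriptase", 198): "R",  # arginine
    ("FAAP100", 466): "L",                           # leucine
}


def encodes_variant(protein_sequence: str, position: int,
                    variant_residue: str) -> Optional[bool]:
    """Return True or False for the residue at a 1-based position.

    Returns None when the sequence does not cover the position, so that
    "not determined" is kept distinct from "variant absent".
    """
    if position < 1 or position > len(protein_sequence):
        return None
    return protein_sequence[position - 1] == variant_residue


def screen(proteins: Dict[str, str]) -> Dict[tuple, Optional[bool]]:
    """Check every recited variant for the proteins that were supplied."""
    return {
        (name, pos): encodes_variant(proteins[name], pos, residue)
        for (name, pos), residue in VARIANTS.items()
        if name in proteins
    }


if __name__ == "__main__":
    # Synthetic sequence for illustration only: alanine everywhere except
    # a tyrosine placed at position 180.
    toy_csb = "A" * 179 + "Y" + "A" * 20
    print(screen({"Cockayne Syndrome B": toy_csb}))
```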
[0011] The determining step may comprise contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence encoding a tyrosine at position 180 of the Cockayne Syndrome B protein under stringent conditions, and optionally contacting the nucleic acid obtained from a subject with one or more reference polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence that does not encode a tyrosine at position 180 of the Cockayne Syndrome B protein under stringent conditions, determining whether the one or more probes, and optionally, whether the one or more reference polynucleotide probes, have hybridized with the nucleic acid obtained from the subject, and determining whether the subject has a predisposition to develop colon cancer based on the determination of whether the probes or reference probes have hybridized with the nucleic acid. The determining step may comprise contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein under stringent conditions, or one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein under stringent conditions, and optionally contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence that does not encode an isoleucine at position 705 or a tyrosine at position 1292 of the Werner protein, determining whether the one or more probes, and optionally, whether the one or more reference probes, have hybridized with the nucleic acid obtained from the subject, and determining whether the subject has a predisposition to develop colon cancer based on the determination of whether the probes have hybridized with the nucleic acid.
The determining step may comprise contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein under stringent conditions, and optionally contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence that does not encode an arginine at position 198 of the Telomerase Reverse Transcriptase protein under stringent conditions, determining whether the one or more probes, and optionally, whether the one or more reference probes, have hybridized with the nucleic acid obtained from the subject, and determining whether the subject has a predisposition to develop colon cancer based on the determination of whether the probes have hybridized with the nucleic acid.
The determining step may comprise contacting the nucleic acid obtained from a subject with one or more polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD under stringent conditions, and optionally contacting the nucleic acid obtained from a subject with one or more reference polynucleotide probes having a nucleic acid sequence complementary to a nucleic acid sequence that does not encode a leucine at position 466 of the Fanconi anemia associated protein of 100 kD under stringent conditions, determining whether the one or more probes, and optionally, whether the one or more reference polynucleotide probes, have hybridized with the nucleic acid obtained from the subject, and determining whether the subject has a predisposition to develop colon cancer based on the determination of whether the probes or reference probes have hybridized with the nucleic acid. The nucleic acid may be comprised within a cell, and the method may comprise contacting the nucleic acid in the cell with the one or more polynucleotide probes, and optionally with the one or more reference polynucleotide probes. If more than one probe was contacted with the nucleic acid, the method may comprise the step of identifying which of the probes hybridized with the nucleic acid.
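The step of identifying which probes hybridized, and of reading a variant probe together with its optional reference probe, might be sketched as follows. The function names and probe labels are hypothetical, the hybridization calls are assumed to have been made upstream as simple booleans, and the result strings are illustrative wording rather than diagnostic categories defined in this application.

```python
from typing import Dict, List, Optional


def hybridized_probes(signals: Dict[str, bool]) -> List[str]:
    """List, in alphabetical order, the probes that were called hybridized."""
    return sorted(name for name, hit in signals.items() if hit)


def interpret_probe_pair(variant_hit: bool,
                         reference_hit: Optional[bool] = None) -> str:
    """Combine a variant-probe call with an optional reference-probe call.

    ASSUMPTION: a missing reference probe (None) means only the variant
    probe was used, as the reference probe is optional in the text.
    """
    if reference_hit is None:
        if variant_hit:
            return "variant sequence detected"
        return "variant sequence not detected"
    if variant_hit and reference_hit:
        return "variant and reference sequences detected"
    if variant_hit:
        return "variant sequence detected"
    if reference_hit:
        return "reference sequence only"
    return "no hybridization; result inconclusive"


if __name__ == "__main__":
    # Hypothetical probe labels and calls, for illustration only.
    calls = {"WRN_705_Ile": True, "WRN_705_ref": True, "TERT_198_Arg": False}
    print(hybridized_probes(calls))
    print(interpret_probe_pair(calls["WRN_705_Ile"], calls["WRN_705_ref"]))
```

Treating the case where neither probe hybridizes as inconclusive, rather than as absence of the variant, keeps a failed assay from being read as a negative result.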
[0012] The methods may further comprise determining the presence or absence of genomic instability in subjects determined to have one or more of the ERCC6, WRN, TERT, or FAAP100 gene alterations described or exemplified herein. Genomic instability may comprise aneuploidy or polyploidy among the subject's chromosomes. Genomic instability may comprise one or more of chromosomal translocations, chromosomal inversions, chromosome deletions, broken DNA chains, or abnormal DNA structure. Genomic instability may comprise double stranded DNA breaks. Determining the presence or absence of genomic instability may be carried out using any methodology suitable in the art, including those described or exemplified herein. Such methods include, without limitation, karyotyping, metaphase spreads, flow cytometry of propidium iodide-stained cells, immunofluorescence, immunohistochemistry, and determination of the activation of a DNA damage response.
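One of the forms of genomic instability named above, aneuploidy, might be flagged from per-chromosome copy numbers as in the sketch below. The function name and input layout are hypothetical, the copy numbers are assumed to have been determined already (for example from a karyotype), and only autosomes are considered, for which two copies are expected in a diploid cell.

```python
from typing import Dict

EXPECTED_AUTOSOME_COPIES = 2  # diploid expectation for autosomes 1-22


def classify_copy_numbers(copy_numbers: Dict[int, int]) -> dict:
    """Report autosomes whose copy number departs from the diploid count.

    Note: this sketch does not distinguish polyploidy, where every
    chromosome is gained together, from gains of individual chromosomes.
    """
    for chromosome in copy_numbers:
        if not 1 <= chromosome <= 22:
            raise ValueError(f"not an autosome: {chromosome}")
    gains = sorted(c for c, n in copy_numbers.items()
                   if n > EXPECTED_AUTOSOME_COPIES)
    losses = sorted(c for c, n in copy_numbers.items()
                    if n < EXPECTED_AUTOSOME_COPIES)
    return {"gains": gains, "losses": losses,
            "aneuploid": bool(gains or losses)}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classify_copy_numbers({1: 2, 9: 3, 11: 2, 18: 1}))
```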
[0013] A nucleic acid sequence encoding tyrosine at position 180 may comprise an A to T substitution in the codon encoding asparagine at position 180 of the Cockayne Syndrome B protein. The A to T substitution may occur at a position corresponding to position number 50,408,777 in the ERCC6 gene locus of human chromosome number 10. The Cockayne Syndrome B protein may comprise the amino acid sequence of SEQ ID NO:5.
[0014] A nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein may comprise a C to T substitution in the codon encoding threonine at position 705 of the Werner protein. The C to T substitution may occur at a position corresponding to position number 31,088,698 in the WRN gene locus of human chromosome number 8. A nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein may comprise a C to A substitution in the codon encoding serine at position 1292 of the Werner protein. The C to A substitution may occur at a position corresponding to position number 31,134,481 in the WRN gene locus of human chromosome number 8. The Werner protein may comprise the amino acid sequence of SEQ ID NO:10.
[0015] A nucleic acid sequence encoding arginine at position 198 may comprise a G to C substitution in the codon encoding glycine at position 198 of the Telomerase Reverse Transcriptase protein. The G to C substitution may occur at a position corresponding to position number 1,347,409 in the TERT gene locus of human chromosome number 5. The Telomerase Reverse Transcriptase protein may comprise the amino acid sequence of SEQ ID NO:19.
[0016] A nucleic acid sequence encoding leucine at position 466 may comprise a C to T substitution in the codon encoding serine at position 466 of the Fanconi anemia associated protein of 100 kD. The C to T substitution may occur at a position corresponding to position number 77,124,711 in the FAAP100 gene locus of human chromosome number 17. The Fanconi anemia associated protein of 100 kD may comprise the amino acid sequence of SEQ ID NO:24.
[0017] The invention also features isolated polynucleotides. The polynucleotides may be affixed to a support, including an array. In some aspects, an isolated polynucleotide comprises the ERCC6 gene comprising a nucleic acid sequence encoding a tyrosine at position 180 of the Cockayne Syndrome B protein. The ERCC6 gene may comprise an A to T substitution at a position corresponding to position number 50,408,777 in the ERCC6 gene locus of human chromosome number 10. The nucleic acid sequence may comprise SEQ ID NO:1. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:4.
[0018] In some aspects, an isolated polynucleotide comprises the WRN gene comprising a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein. The WRN gene may comprise a C to T substitution at a position corresponding to position number 31,088,698 in the WRN gene locus of human chromosome number 8. The nucleic acid sequence may comprise SEQ ID NO:6. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:9.
[0019] In some aspects, an isolated polynucleotide comprises the WRN gene comprising a nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein. The WRN gene may comprise a C to A substitution at a position corresponding to position number 31,134,481 in the WRN gene locus of human chromosome number 8. The nucleic acid sequence may comprise SEQ ID NO:13. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:14. The nucleic acid sequence may comprise SEQ ID NO:11. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:12.
[0020] In some aspects, an isolated polynucleotide comprises the TERT gene comprising a nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein. The TERT gene may comprise a G to C substitution at a position corresponding to position number 1,347,409 in the TERT gene locus of human chromosome number 5. The nucleic acid sequence may comprise SEQ ID NO:15. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:18.
[0021] In some aspects, an isolated polynucleotide comprises the FAAP100 gene comprising a nucleic acid sequence encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD. The FAAP100 gene may comprise a C to T substitution at a position corresponding to position number 77,124,711 in the FAAP100 gene locus of human chromosome number 17. The nucleic acid sequence may comprise SEQ ID NO:20. The nucleic acid sequence may encode the amino acid sequence of SEQ ID NO:23.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1A shows a metaphase spread (right panel) from patient 120713, showing a gain of chromosome 9 as identified on the ordered array (left panel). Phytohemagglutinin (PHA)-stimulated peripheral blood lymphocytes were used as a source of the chromosomes. FIG. 1B shows a second metaphase spread (right panel) from patient 120713, showing a gain of chromosome 11 as identified on the ordered array (left panel). Two other gains were identified among 50 well-separated metaphase spreads in patient 120713 (not shown).
[0023] FIG. 2 shows an increased S phase fraction in patient 118294 compared to its matched control (102615) and all other samples, and shows an increased G2/M fraction in patient 120713 compared to its matched control (237313) and all other samples. The plots were obtained from flow cytometry analysis of propidium iodide (PI)-stained, PHA-stimulated lymphocytes. S phase fractions are marked by hatching. G2/M phase fractions are shown in dark grey on the right-hand side of the plot.
[0024] FIG. 3 shows multiple reads of the ERCC6 variant from patient 120713, affirming validity of the N180Y change in the CSB protein sequence.
[0025] FIG. 4 shows the location of the CSB protein variant N180Y (arrow) in patient 120713 within a highly conserved region predicted to be a surface-exposed region of the protein, and therefore functionally significant. Amino acids predicted to be functionally significant are designated by bold typeface.
[0026] FIG. 5 shows multiple reads of the WRN gene sequence variants from patients 120713 and 118294, affirming validity of the T705I (top panel) and S1292Y (bottom panel) changes in the Werner protein sequence.
[0027] FIG. 6 shows greater DDR foci in patient 120713 than its control. Lymphocytes were treated with 4 J/m² UV or 3 μM aphidicolin for 2 h and fixed 5 h (UV) or 1 h later (aph). γH2AX IF foci were scored in blinded fashion. Results shown are the percent of cells (y-axis) with ≥10 nuclear foci (total of 1,117 cells scored). Insert: 120713 cell foci after UV. Comparison with (no) Rx: P=0.08.
[0028] FIG. 7 shows greater gamma-H2AX foci in lymphocytes from patient 120713 than the control in another experiment. The foci are higher at the baseline (no Rx), as well as in response to aphidicolin, camptothecin, and etoposide treatments.
DETAILED DESCRIPTION OF THE INVENTION
[0029] Various terms relating to aspects of the present invention are used throughout the specification and claims. Such terms are to be given their ordinary meaning in the art, unless otherwise indicated. Other specifically defined terms are to be construed in a manner consistent with the definition provided in this document.
[0030] As used throughout, the singular forms "a," "an," and "the" include plural referents unless expressly stated otherwise.
[0031] A molecule such as a polynucleotide has been "isolated" if it has been removed from its natural environment and/or altered by the hand of a human being.
[0032] A nucleotide in a nucleic acid sequence such as but not limited to a cDNA, mRNA, or derivative thereof may correspond to a nucleotide in the genomic nucleic acid sequence. In this respect, "corresponding to" comprises a positional relationship of nucleotides in the genomic DNA gene sequence relative to nucleotides in a polynucleotide sequence (e.g., cDNA, mRNA) obtainable from the genomic DNA sequence.
[0033] The terms subject and patient are used interchangeably. A subject may be any animal, and preferably is a mammal. A mammalian subject may be a farm animal (e.g., sheep, horse, cow, pig), a companion animal (e.g., cat, dog), a rodent or laboratory animal (e.g., mouse, rat, rabbit), or a non-human primate (e.g., old world monkey, new world monkey). Human beings are highly preferred.
[0034] It has been observed in accordance with the invention that certain variations, which include deletions, substitutions, rearrangements, and combinations thereof, in the germline nucleic acid sequence of one or more of the Excision Repair Cross-Complementing Rodent Repair Deficiency Complementation Group 6 (ERCC6) gene, the Werner Syndrome RecQ Helicase-like (WRN) gene, the Telomerase Reverse Transcriptase (TERT) gene, and the Fanconi anemia associated protein of 100 kD (FAAP100) gene predispose subjects having such variations to genomic instability, double stranded DNA breaks, and/or extensive phosphorylation of the histone H2AX, forming gamma-H2AX foci proximal to the DNA breaks. It has also been observed that certain DNA damage response proteins such as phosphorylated ataxia telangiectasia mutated (ATM), Rad3-related protein (ATR), and Tumor suppressor p53-binding protein 1 (53BP1) are recruited into such foci. Without intending to be limited to any particular theory or mechanism of action, it is believed that such genomic instability, double stranded DNA breaks, and/or enhanced gamma-H2AX foci are markers of a predisposition to develop colon cancer. Accordingly, the invention features methods for diagnosing a predisposition to develop colon cancer. Any of the methods may be carried out in vivo, in vitro, or in situ.
[0035] In general, the methods comprise determining genomic instability and/or double stranded DNA breaks in a nucleic acid sample obtained from a subject, and/or determining gamma-H2AX foci in a cell or cell nucleus sample obtained from a subject. Determining genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci may be carried out according to any suitable method, including the methods described or exemplified herein. The determined genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci may be compared with quantitative or qualitative reference values for genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci associated with a predisposition to develop colon cancer, and optionally with quantitative or qualitative reference values for genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci not associated with a predisposition to develop colon cancer, for example, reference values of a healthy subject or a subject not at risk to develop colon cancer based on these markers. The reference values may, for example, comprise values indicative of a high risk for developing colon cancer, values indicative of a moderate risk for developing colon cancer, and/or values indicative of a low risk for developing colon cancer. The comparing step may be carried out using a processor programmed to compare determined quantitative or qualitative values for genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci with quantitative or qualitative reference values for such markers.
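By way of non-limiting illustration only, the comparing step described above may be sketched in Python as follows. The sketch scores the fraction of nuclei having at least a threshold number of gamma-H2AX foci and compares that fraction with reference cutoffs. The function names, the default threshold of 10 foci per nucleus (borrowed from the scoring described for FIG. 6), and the numerical cutoffs are hypothetical placeholders chosen for illustration; they are not reference values disclosed herein.

```python
from typing import Sequence


def fraction_with_foci(foci_per_nucleus: Sequence[int], min_foci: int = 10) -> float:
    """Return the fraction of scored nuclei having at least `min_foci` foci."""
    if not foci_per_nucleus:
        raise ValueError("no nuclei were scored")
    positive = sum(1 for n in foci_per_nucleus if n >= min_foci)
    return positive / len(foci_per_nucleus)


def classify_risk(fraction: float, moderate_cutoff: float, high_cutoff: float) -> str:
    """Compare a determined value with reference cutoffs (hypothetical values)."""
    if not 0.0 <= moderate_cutoff <= high_cutoff <= 1.0:
        raise ValueError("cutoffs must satisfy 0 <= moderate <= high <= 1")
    if fraction >= high_cutoff:
        return "high"
    if fraction >= moderate_cutoff:
        return "moderate"
    return "low"


# Illustrative foci counts for eight nuclei; not data from any subject.
counts = [0, 2, 12, 15, 1, 0, 11, 3]
frac = fraction_with_foci(counts)  # 3 of the 8 nuclei have >= 10 foci
print(frac, classify_risk(frac, moderate_cutoff=0.10, high_cutoff=0.30))
```

In practice, the cutoffs would be derived from reference populations associated, and not associated, with a predisposition to develop colon cancer, rather than fixed in code.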
[0036] The methods for diagnosing a predisposition to develop colon cancer may further comprise (e.g., in addition to determining genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci), or comprise in the alternative (e.g., without determining genomic instability, double stranded DNA breaks, and/or gamma-H2AX foci), identifying germline nucleic acid sequence alterations in the ERCC6, WRN, TERT, and/or FAAP100 genes that predispose a subject to develop colon cancer. In some aspects, the methods comprise determining whether a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from a subject comprises an alteration in the nucleic acid sequence that has been associated with predisposing a subject to develop colon cancer. In some detailed aspects, the methods comprise comparing nucleic acid sequences. For example, such methods may comprise the steps of comparing the sequence of a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from a tissue sample obtained from a subject with one or more reference nucleic acid sequences comprising one or more alterations in the ERCC6, WRN, TERT, and/or FAAP100 germline sequence that predispose a subject to genomic instability, and determining whether the ERCC6, WRN, TERT, and/or FAAP100 gene sequence obtained from the subject has the alteration based on the comparison. The comparing step may be carried out using a processor programmed to compare nucleic acid sequences, for example, to compare the nucleic acid sequences obtained from the subject and the reference nucleic acid sequences. The methods may optionally include the step of determining the sequence of the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from the subject. 
The methods may comprise the step of diagnosing whether the subject has a predisposition to genomic instability and/or has a predisposition to develop colon cancer based on the presence or absence of an alteration associated with a predisposition to genomic instability and/or to develop colon cancer in the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from the subject.
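By way of non-limiting illustration only, a processor-implemented sequence comparison of the kind described above may be sketched in Python as follows. The function name and the toy sequences are hypothetical and are not taken from any sequence identifier herein. The sketch assumes that the subject and reference sequences have already been aligned to the same length, and it does not address heterozygous base calls.

```python
def has_alteration(subject_seq: str, unaltered_ref: str, position: int, alt_base: str) -> bool:
    """Return True if the subject carries `alt_base` at the 1-based `position`
    where the unaltered reference carries a different base.

    Assumes both sequences are pre-aligned and of equal length, which is a
    simplification; real data would first need an alignment step.
    """
    subject_seq = subject_seq.upper()
    unaltered_ref = unaltered_ref.upper()
    alt_base = alt_base.upper()
    if len(subject_seq) != len(unaltered_ref):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not 1 <= position <= len(unaltered_ref):
        raise ValueError("position outside the reference sequence")
    idx = position - 1
    return unaltered_ref[idx] != alt_base and subject_seq[idx] == alt_base


# Toy sequences for illustration only; not taken from any SEQ ID NO.
reference = "GCTAATGGC"
subject = "GCTTATGGC"  # differs from the reference by an A to T change at position 4
print(has_alteration(subject, reference, position=4, alt_base="T"))
```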
[0037] The sample obtained from the subject may be from any tissue or cell from which genomic DNA or a genomic DNA sequence may be obtained. Non-limiting examples include blood, hair, and buccal tissue or cells. The methods may include the step of obtaining the tissue sample, and may include the step of obtaining the nucleic acid, and may include the step of obtaining a cell nucleus. The nucleic acid may be any nucleic acid that has, or from which may be determined, the presence and/or quantity of genomic instability or double stranded DNA breaks, and the cell or nucleus may be any cell or nucleus that has, or from which may be determined, the presence and/or quantity of gamma-H2AX foci. The nucleic acid may be any nucleic acid that has, or from which may be obtained, the germline nucleic acid sequence of the ERCC6, WRN, TERT, and/or FAAP100 genes, or the complement thereof, or any portion thereof. For example, the nucleic acid may be chromosomal or genomic DNA, may be mRNA, or may be a cDNA obtained from the mRNA. The sequence of the nucleic acid may be determined using any sequencing method suitable in the art.
[0038] In some detailed aspects, the methods comprise hybridizing nucleic acids. For example, such methods may comprise the steps of contacting (preferably under stringent conditions) a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from the subject with one or more polynucleotide probes that have a nucleic acid sequence complementary to an ERCC6, WRN, TERT, and/or FAAP100 nucleic acid sequence having one or more alterations that predispose a subject to develop colon cancer, and determining whether the one or more probes hybridized with the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from the subject. The methods may comprise the step of diagnosing whether the subject has a predisposition to develop colon cancer based on whether the probes have hybridized with the nucleic acid.
[0039] The probes may comprise a detectable label. The nucleic acid obtained from a subject may be labeled with a detectable label. Detectable labels may be any suitable chemical label, metal label, enzyme label, fluorescent label, radiolabel, or combination thereof. The methods may comprise detecting the detectable label on probes hybridized with the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene. The probes may be affixed to a support, such as an array. For example, a labeled nucleic acid obtained from a subject may be contacted with an array of probes affixed to a support. The probes may include any probes described or exemplified herein.
[0040] In some detailed aspects, the hybridization may be carried out in situ, for example, in a cell obtained from the subject. For example, the methods may comprise contacting (preferably under stringent conditions) a cell comprising a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene obtained from the subject, or contacting (preferably under stringent conditions) a nucleic acid in the cell, with one or more polynucleotide probes comprising a nucleic acid sequence complementary to an ERCC6, WRN, TERT, and/or FAAP100 germline nucleic acid sequence having one or more alterations that predispose a subject to develop colon cancer, and determining whether the one or more probes hybridized with the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene in the cell. The methods may comprise the step of diagnosing whether the subject has a predisposition to develop colon cancer based on whether the probes have hybridized with the nucleic acid. The probes may comprise a detectable label, and the method may comprise detecting the detectable label on probes hybridized with the nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene. Detectable labels may be any suitable chemical label, metal label, enzyme label, fluorescent label, radiolabel, or combination thereof.
[0041] In any of the hybridization assays, the probes may be DNA or RNA, are preferably single stranded, and may have any length suitable for avoiding cross-hybridization of the probe with a second target having a sequence similar to that of the desired target. Lengths of from about 20 to about 60 nucleotides are recognized in the art as optimal for many hybridization assays (for example, see the Resequencing Array Design Guide available from Affymetrix: http://www.affymetrix.com/support/technical/byproduct.affx?product=cseq), though any suitable length may be used, including shorter than 20 or longer than 60 nucleotides. It is preferred that the probes hybridize under stringent conditions to the ERCC6, WRN, TERT, and/or FAAP100 nucleic acid sequence of interest. It is preferred that the probes have 100% complementary identity with the target sequence.
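By way of non-limiting illustration only, the derivation of a probe having 100% complementary identity with a window of a target sequence may be sketched in Python as follows. The function names, the default probe length, and the toy target sequence are hypothetical and are not taken from any sequence identifier herein. The sketch does not evaluate stringency, melting temperature, or cross-hybridization.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def design_probe(target: str, variant_pos: int, length: int = 25) -> str:
    """Return a probe fully complementary to a window of `target` that is
    centred, as far as the sequence ends allow, on the 1-based `variant_pos`."""
    if not 1 <= variant_pos <= len(target):
        raise ValueError("variant position outside the target sequence")
    if not 1 <= length <= len(target):
        raise ValueError("probe length must be between 1 and the target length")
    start = variant_pos - 1 - length // 2
    start = max(0, min(start, len(target) - length))  # clamp window to the target
    return reverse_complement(target[start:start + length])


# Toy target for illustration only; not taken from any SEQ ID NO.
target = "ACGTTGCAAGCTTATGGCCATTGACCGTAGGCTAAC"
probe = design_probe(target, variant_pos=13, length=21)
print(probe, len(probe))
```

Centring the window on the alteration places the distinguishing nucleotide away from the probe ends, which is a common design choice and is assumed here rather than required by the methods described.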
[0042] The methods described herein, including the hybridization assays, whether carried out in vitro, on an array, or in situ, may be used to determine any alteration in the ERCC6, WRN, TERT, and/or FAAP100 germline nucleic acid sequence that has a known or suspected association with predisposing a subject to genomic instability and/or to develop colon cancer, including any of those described or exemplified herein. In any of the methods described herein, the alterations may be, for example, a mutation or variation in the germline nucleic acid sequence relative to a germline nucleic acid sequence that has no known or suspected association with predisposing a subject to develop colon cancer. The alteration may comprise one or more nucleotide substitutions, an addition of one or more nucleotides in one or more locations, a deletion of one or more nucleotides in one or more locations, an inversion or other DNA rearrangement, or any combination thereof. A substitution may, but need not, change the amino acid sequence of the protein encoded by the ERCC6, WRN, TERT, and/or FAAP100 gene. Any number of substitutions, additions, or deletions of nucleotides are possible. The alteration may occur in an intron, an exon, or both.
[0043] The one or more alterations in the ERCC6 gene may be located in human chromosome 10, for example, at segment 10q11.2. One non-limiting example of a particular alteration that may predispose a subject to develop colon cancer includes an A to T substitution in exon 3. The substitution may occur at position 50,408,777 of human chromosome 10, and may comprise an A to T substitution at this position. The substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:1. The polynucleotide having the substitution may comprise SEQ ID NO:1, or a portion thereof. The substitution may occur in the polynucleotide at the position corresponding to position 537 of SEQ ID NOs:1 or 2, and may comprise an A to T substitution at this position. The substitution may occur in the polynucleotide at the position corresponding to position 692 of the mRNA nucleic acid sequence of Accession No. NM_000124 (SEQ ID NO:3), and may comprise an A to T substitution at this position.
[0044] The ERCC6 gene encodes the Cockayne Syndrome B protein (CSB protein). Thus, in some aspects, one or more alterations in the ERCC6 gene may change the amino acid sequence of the CSB protein. One non-limiting example of a particular amino acid alteration that may predispose a subject to develop colon cancer includes an asparagine to tyrosine substitution at position 180 in the CSB protein. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:4. The amino acid alteration may comprise a substitution of asparagine with tyrosine in the position corresponding to position 180 in the CSB protein sequence of SEQ ID NO:5. In some aspects, nucleic acid alterations in the ERCC6 gene encode a tyrosine at position 180 in the CSB protein. Thus, the methods may comprise determining whether a nucleic acid comprising the ERCC6 gene obtained from the subject encodes a tyrosine at position 180 of the CSB protein.
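By way of non-limiting illustration only, the relationship between a single-nucleotide substitution and the resulting amino acid change may be checked against the standard genetic code, as sketched in Python below. The function name is hypothetical. The sketch assumes that the substitution is read on the coding strand of the codon; the strand orientation of each gene relative to the recited chromosome coordinates is not addressed.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def consistent_codon_changes(ref_aa: str, alt_aa: str, ref_base: str, alt_base: str):
    """List (reference codon, altered codon) pairs in which replacing one
    `ref_base` with `alt_base` changes `ref_aa` into `alt_aa`.

    Amino acids are given as one-letter codes; bases are read on the
    coding strand (an assumption of this sketch)."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for i, base in enumerate(codon):
            if base != ref_base:
                continue
            mutated = codon[:i] + alt_base + codon[i + 1:]
            if CODON_TABLE[mutated] == alt_aa:
                hits.append((codon, mutated))
    return hits


# Asparagine (N) to tyrosine (Y) through an A to T substitution.
print(consistent_codon_changes("N", "Y", "A", "T"))
```

Under these assumptions, the asparagine to tyrosine change is obtained only when the A to T substitution falls at the first position of the codon (AAT to TAT, or AAC to TAC).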
[0045] The one or more alterations in the WRN gene may be located in human chromosome 8, for example, at segment 8p12. One non-limiting example of a particular alteration that may predispose a subject to develop colon cancer includes a C to T substitution in exon 19. The substitution may occur at position 31,088,698 of human chromosome 8, and may comprise a C to T substitution at this position. The substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:6. The polynucleotide having the substitution may comprise SEQ ID NO:6, or a portion thereof. The substitution may occur in the polynucleotide at the position corresponding to position 2113 of SEQ ID NOs:6 or 7, and may comprise a C to T substitution at this position. The substitution may occur in the polynucleotide at the position corresponding to position 2902 of the mRNA nucleic acid sequence of Accession No. NM_000553 (SEQ ID NO:8), and may comprise a C to T substitution at this position.
[0046] Another non-limiting example of a particular WRN gene alteration that may predispose a subject to develop colon cancer includes a C to A substitution in exon 19. The substitution may occur at position 31,134,481 of human chromosome 8, and may comprise a C to A substitution at this position. The substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:11. The polynucleotide having the substitution may comprise SEQ ID NO:11, or a portion thereof. The substitution may occur in the polynucleotide at the position corresponding to position 3875 of SEQ ID NOs:11 or 7, and may comprise a C to A substitution at this position. The substitution may occur in the polynucleotide at the position corresponding to position 4663 of the mRNA nucleic acid sequence of Accession No. NM_000553 (SEQ ID NO:8), and may comprise a C to A substitution at this position.
[0047] In some aspects, the WRN gene may include both the C to T alteration at position 31,088,698 of human chromosome 8 and the C to A alteration at position 31,134,481 of human chromosome 8. The dual substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:13. The polynucleotide having the substitution may comprise SEQ ID NO:13, or a portion thereof. The dual substitution may occur in the polynucleotide at the position corresponding to position 2113 and position 3875 of SEQ ID NO:6, 7, or 11, and may comprise a C to T substitution at position 2113 and a C to A substitution at position 3875. The dual substitution may occur in the polynucleotide at the position corresponding to position 2902 and the position corresponding to position 4663 of the mRNA nucleic acid sequence of Accession No. NM_000553 (SEQ ID NO:8), and may comprise a C to T substitution at position 2902 and a C to A substitution at position 4663.
[0048] The WRN gene encodes the Werner protein. Thus, in some aspects, one or more alterations in the WRN gene may change the amino acid sequence of the Werner protein. One non-limiting example of a particular amino acid alteration that may predispose a subject to develop colon cancer includes a threonine to isoleucine substitution at position 705 in the Werner protein. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:9. The amino acid alteration may comprise a substitution of threonine with isoleucine in the position corresponding to position 705 in the Werner protein sequence of SEQ ID NO:10. In some aspects, nucleic acid alterations in the WRN gene encode an isoleucine at position 705 in the Werner protein. Thus, the methods may comprise determining whether a nucleic acid comprising the WRN gene obtained from the subject encodes an isoleucine at position 705 of the Werner protein.
[0049] Another non-limiting example of a particular amino acid alteration that may predispose a subject to develop colon cancer includes a serine to tyrosine substitution at position 1292 in the Werner protein. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:12. The amino acid alteration may comprise a substitution of serine with tyrosine in the position corresponding to position 1292 in the Werner protein sequence of SEQ ID NO:10. In some aspects, nucleic acid alterations in the WRN gene encode a tyrosine at position 1292 in the Werner protein. Thus, the methods may comprise determining whether a nucleic acid comprising the WRN gene obtained from the subject encodes a tyrosine at position 1292 of the Werner protein.
[0050] In some aspects, two or more alterations in the Werner protein amino acid sequence may predispose a subject to develop colon cancer. For example, the altered Werner protein amino acid sequence may comprise a threonine to isoleucine substitution at position 705 and a serine to tyrosine substitution at position 1292 of the Werner protein. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:14. The amino acid alteration may comprise a substitution of threonine with isoleucine at position 705 and a substitution of serine with tyrosine at position 1292 in the Werner protein sequence of SEQ ID NO:10. In some aspects, nucleic acid alterations in the WRN gene encode both an isoleucine at position 705 and a tyrosine at position 1292 in the Werner protein. Thus, the methods may comprise determining whether a nucleic acid comprising the WRN gene obtained from the subject encodes an isoleucine at position 705 of the Werner protein and determining whether a nucleic acid comprising the WRN gene obtained from the subject encodes a tyrosine at position 1292 of the Werner protein.
[0051] The one or more alterations in the TERT gene may be located in human chromosome 5, for example, at segment 5p15.3. One non-limiting example of a particular alteration that may predispose a subject to develop colon cancer includes a G to C substitution in exon 2. The substitution may occur at position 1,347,409 of human chromosome 5, and may comprise a G to C substitution at this position. The substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:15. The polynucleotide having the substitution may comprise SEQ ID NO:15, or a portion thereof. The substitution may occur in the polynucleotide at the position corresponding to position 591 of SEQ ID NOs:15 or 16, and may comprise a G to C substitution at this position. The substitution may occur in the polynucleotide at the position corresponding to position 650 of the mRNA nucleic acid sequence of Accession No. NM_198253 (SEQ ID NO:17), and may comprise a G to C substitution at this position.
[0052] The TERT gene encodes the Telomerase Reverse Transcriptase protein. Thus, in some aspects, one or more alterations in the TERT gene may change the amino acid sequence of the Telomerase Reverse Transcriptase protein. One non-limiting example of a particular amino acid alteration that may predispose a subject to develop colon cancer includes a glycine to arginine substitution at position 198 in the Telomerase Reverse Transcriptase protein. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:18. The amino acid alteration may comprise a substitution of glycine with arginine in the position corresponding to position 198 in the amino acid sequence of SEQ ID NO:19. In some aspects, nucleic acid alterations in the TERT gene encode an arginine at position 198 in the Telomerase Reverse Transcriptase protein. Thus, the methods may comprise determining whether a nucleic acid comprising the TERT gene obtained from the subject encodes an arginine at position 198 of the Telomerase Reverse Transcriptase protein.
[0053] The reference nucleic acid sequences used in nucleic acid sequence comparison aspects of the methods may comprise one or more of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:13, and SEQ ID NO:15, or portion thereof having one or more alterations associated with a predisposition/risk of developing colon cancer. The reference nucleic acid sequences may also include nucleic acid sequences that do not have any nucleotide alterations that are associated with a predisposition/risk of developing colon cancer to serve as controls in the comparison, or for determinations that the subject does not have a germline nucleic acid sequence alteration that predisposes to develop colon cancer. Non-limiting examples of nucleic acid sequences without such alterations include SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:16, and SEQ ID NO:17. Reference nucleic acid sequences having any portion of the sequence of these sequence identifiers may be used.
[0054] The FAAP100 gene (also known as C17orf70) encodes the Fanconi anemia-associated protein of 100 kD. Thus, in some aspects, one or more alterations in the FAAP100 gene may change the amino acid sequence of the Fanconi anemia-associated protein of 100 kD. One non-limiting example of a particular amino acid alteration that may predispose a subject to develop colon cancer includes a serine to leucine substitution at position 466 in the Fanconi anemia-associated protein of 100 kD. The amino acid alteration may comprise a polypeptide having the amino acid sequence of SEQ ID NO:23. The amino acid alteration may comprise a substitution of serine with leucine in the position corresponding to position 466 in the Fanconi anemia-associated protein of 100 kD sequence of SEQ ID NO:24. In some aspects, nucleic acid alterations in the FAAP100 gene encode a leucine at position 466 in the Fanconi anemia-associated protein of 100 kD. Thus, the methods may comprise determining whether a nucleic acid comprising the FAAP100 gene obtained from the subject encodes a leucine at position 466 of the Fanconi anemia-associated protein of 100 kD.
[0055] The one or more alterations in the FAAP100 gene may be located in human chromosome 17, for example, at segment 77124711. One non-limiting example of a particular alteration that may predispose a subject to develop colon cancer includes a C to T substitution in exon 4. The substitution may occur at position 77,124,711 of human chromosome 17, and may comprise a C to T substitution at this position. The substitution may comprise a polynucleotide having the nucleic acid sequence of SEQ ID NO:20. The polynucleotide having the substitution may comprise SEQ ID NO:20, or a portion thereof. The substitution may occur in the polynucleotide at the position corresponding to position 1397 of SEQ ID NO:20, and may comprise a C to T substitution at this position. The substitution may occur in the polynucleotide at the position corresponding to position 1443 of the mRNA nucleic acid sequence of Accession No. BC_117141 (SEQ ID NO:22), and may comprise a C to T substitution at this position.
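By way of non-limiting illustration only, the exemplified alterations may be collected into a lookup structure against which base calls obtained from a subject are checked, as sketched in Python below. The chromosome positions, substitutions, and protein changes are transcribed from the description above; the class name, function name, and the format of the base calls are hypothetical. The sketch assumes that the subject's base calls are reported in the same coordinate system and on the same strand as the positions recited herein.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alteration(NamedTuple):
    gene: str
    chromosome: str
    position: int        # chromosome position as recited in the description
    ref_base: str
    alt_base: str
    protein_change: str


# Values transcribed from the description; the structure itself is illustrative.
PANEL = [
    Alteration("ERCC6", "10", 50_408_777, "A", "T", "N180Y"),
    Alteration("WRN", "8", 31_088_698, "C", "T", "T705I"),
    Alteration("WRN", "8", 31_134_481, "C", "A", "S1292Y"),
    Alteration("TERT", "5", 1_347_409, "G", "C", "G198R"),
    Alteration("FAAP100", "17", 77_124_711, "C", "T", "S466L"),
]


def find_alterations(calls: Dict[Tuple[str, int], str]) -> List[Alteration]:
    """Return the panel alterations present in a subject's base calls.

    `calls` maps (chromosome, position) to the observed base or bases,
    for example "T" or, for a heterozygous call, "AT" (hypothetical format).
    """
    found = []
    for alt in PANEL:
        observed = calls.get((alt.chromosome, alt.position), "").upper()
        if alt.alt_base in observed:
            found.append(alt)
    return found


# Hypothetical base calls for illustration only; not data from any subject.
example_calls = {("10", 50_408_777): "AT", ("5", 1_347_409): "G"}
for hit in find_alterations(example_calls):
    print(hit.gene, hit.protein_change)
```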
[0056] The polynucleotide probes used in nucleic acid hybridization aspects may comprise a portion of one or more of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, and SEQ ID NO:20, the portion containing the genomic instability and/or colon cancer risk-associated alteration. The nucleic acid sequence of the probes may be complementary to the relevant portion of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, or SEQ ID NO:20.
[0057] Polynucleotide probes having a nucleic acid sequence without any alterations associated with a predisposition to develop genomic instability and/or colon cancer may be used to serve as controls in hybridization assays, or for determinations that the subject does not have a germline nucleic acid sequence alteration that predisposes to genomic instability or colon cancer. Non-limiting examples of nucleic acid sequences without an alteration, from which such probes may be derived, include SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:16, SEQ ID NO:17, and SEQ ID NO:21, and the probes may be obtained from the regions of these sequences where the respective alteration is located. The probe nucleic acid sequence may be complementary to the appropriate portion of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:16, SEQ ID NO:17, and SEQ ID NO:21.
[0058] The methods for diagnosing, whether based on sequence comparison or probe hybridization, may further comprise the steps of treating the subject with a regimen capable of inhibiting the onset of colon cancer. These steps may be included, for example, if it is determined that the subject has a predisposition to develop colon cancer. In some aspects, the treatment regimen may comprise administering to the subject an effective amount of the CSB, Werner, Telomerase Reverse Transcriptase protein, or Fanconi anemia associated protein of 100 kD or genes that encode these proteins in vectors that can integrate and express in tissue stem cells. In some aspects, the treatment regimen comprises administering to the subject an effective amount of a compound or pharmaceutical composition capable of delaying or inhibiting the onset of colon cancer. In some aspects, the treatment regimen comprises one or more of diet management, vitamin supplementation, nutritional supplementation, exercise, psychological counseling, social counseling, education, and regimen compliance management. In some aspects, the treatment regimen comprises administering to the subject an effective amount of a compound or pharmaceutical composition that enhances the activity of one or more of the CSB protein, the Werner protein, the Telomerase Reverse Transcriptase protein, and the Fanconi anemia associated protein of 100 kD.
[0059] In the diagnostic methods, the tissue sample obtained from the subject may be from any tissue in which replicating cells and/or a genomic DNA sequence may be obtained. Non-limiting examples include blood, hair, and buccal tissue. Blood may comprise peripheral blood lymphocytes (PBLs). The methods may include the step of obtaining the tissue sample, and may include the step of obtaining the nucleic acid. The nucleic acid may be any nucleic acid that has, or from which may be obtained, the germline nucleic acid sequence for the ERCC6, WRN, TERT, and/or FAAP100 genes, or the complement thereof, or any portion thereof. For example, the nucleic acid may be chromosomal or genomic DNA, may be mRNA, or may be a cDNA obtained from the mRNA.
[0060] The diagnostic methods are preferably based on determining alterations in the germline nucleic acid sequences of the ERCC6, WRN, TERT, and FAAP100 genes that predispose a subject having such alterations to develop colon cancer, including any of the alterations described or exemplified herein. The reference nucleic acid sequences and the probes are thus based on alterations that predispose to develop colon cancer, and based on control sequences that do not have alterations that predispose to develop colon cancer.
[0061] The invention also provides isolated polynucleotides comprising a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene and having one or more alterations that predispose a subject to develop colon cancer. The invention also provides isolated polynucleotides comprising a probe having a nucleic acid sequence complementary to a nucleic acid sequence having one or more alterations in the ERCC6, WRN, TERT, and/or FAAP100 gene that predispose a subject to develop colon cancer. Probes may have any suitable number of nucleotide bases. The one or more alterations may be any of the alterations described or exemplified herein. The probes preferably hybridize to a nucleic acid comprising the ERCC6, WRN, TERT, and/or FAAP100 gene under stringent conditions.
[0062] Polynucleotides include polyribonucleotides and polydeoxyribonucleotides, which may be unmodified RNA or DNA or modified RNA or DNA, and include single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. Polynucleotides may have triple-stranded regions comprising RNA or DNA or both RNA and DNA, modified bases, unusual bases such as inosine, modified backbones, and enzymatic or metabolic modifications.
[0063] The alterations may comprise, for example, a nucleic acid sequence encoding a tyrosine at position 180 of the CSB protein. The CSB protein may comprise SEQ ID NO:4. A nucleic acid sequence encoding a tyrosine at position 180 of the CSB protein may comprise an A to T substitution in the codon encoding an asparagine at position 180 of the CSB protein, and the A to T substitution may occur at a position corresponding to position number 50,408,777 in the ERCC6 gene locus on human chromosome number 10.
[0064] The alterations may comprise, for example, a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein. The Werner protein may comprise SEQ ID NO:9. A nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein may comprise a C to T substitution in the codon encoding a threonine at position 705 of the Werner protein, and the C to T substitution may occur at a position corresponding to position number 31,008,698 in the WRN gene locus on human chromosome number 8. In addition to, or in the alternative to a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein, the alteration may comprise a nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein. The Werner protein may comprise SEQ ID NO:12 or SEQ ID NO:14. A nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein may comprise a C to A substitution in the codon encoding a serine at position 1292 of the Werner protein, and the C to A substitution may occur at a position corresponding to position number 31,134,481 in the WRN gene locus on human chromosome number 8.
[0065] The alterations may comprise, for example, a nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein. The Telomerase Reverse Transcriptase protein may comprise SEQ ID NO:18. A nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein may comprise a G to C substitution in the codon encoding a serine at position 198 of the Telomerase Reverse Transcriptase protein, and the G to C substitution may occur at a position corresponding to position number 1,347,409 in the TERT gene locus on human chromosome number 5.
[0066] The alterations may comprise, for example, a nucleic acid sequence encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD. The Fanconi anemia associated protein of 100 kD may comprise SEQ ID NO:23. A nucleic acid sequence encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD may comprise a C to T substitution in the codon encoding a serine at position 466 of the Fanconi anemia associated protein of 100 kD, and the C to T substitution may occur at a position corresponding to position number 77,124,711 in the FAAP100 gene locus on human chromosome number 17.
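The relationship between a single-base substitution and the resulting amino acid change, as described in the preceding paragraphs, can be sketched as follows. The specification names the substitutions and the amino acid changes but this passage does not give the codons themselves, so the codons below are assumptions taken from the standard genetic code and chosen only to be consistent with the stated changes. The sketch also assumes that each substitution is read on the coding strand, which may not hold for genes transcribed from the opposite genomic strand. The function names are hypothetical.

```python
# Illustrative sketch only. The codons are assumed, not taken from the
# sequence listing; they are standard-genetic-code codons consistent with
# the amino acid changes named in the text.

CODON_TABLE = {
    "AAT": "N", "TAT": "Y",   # asparagine, tyrosine
    "ACT": "T", "ATT": "I",   # threonine, isoleucine
    "TCT": "S",               # serine
    "TCG": "S", "TTG": "L",   # serine, leucine
}


def substitute(codon: str, index: int, ref: str, alt: str) -> str:
    """Replace the base at `index` of `codon`, checking the reference base."""
    if codon[index] != ref:
        raise ValueError(f"expected {ref} at index {index} of {codon}")
    return codon[:index] + alt + codon[index + 1:]


def protein_change(codon: str, index: int, ref: str, alt: str, position: int) -> str:
    """Return the change in one-letter notation, e.g. 'N180Y'."""
    altered = substitute(codon, index, ref, alt)
    return f"{CODON_TABLE[codon]}{position}{CODON_TABLE[altered]}"


changes = [
    protein_change("AAT", 0, "A", "T", 180),    # CSB: asparagine to tyrosine
    protein_change("ACT", 1, "C", "T", 705),    # Werner: threonine to isoleucine
    protein_change("TCT", 1, "C", "A", 1292),   # Werner: serine to tyrosine
    protein_change("TCG", 1, "C", "T", 466),    # FAAP100: serine to leucine
]
print(changes)
```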
[0067] The invention also features a support comprising a plurality of polynucleotides comprising a nucleic acid sequence, or portion thereof, comprising the ERCC6, WRN, TERT, and/or FAAP100 genes and having one or more alterations in the nucleic acid sequence that predispose a subject to develop colon cancer, and optionally, a plurality of polynucleotides comprising a nucleic acid sequence, or portion thereof, comprising the ERCC6, WRN, TERT, and/or FAAP100 genes and not having any alterations in the nucleic acid sequence that are known to predispose a subject to develop colon cancer. The support may comprise an array. The polynucleotides may be probes. The probes may comprise a portion of the nucleic acid sequence of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:15, or SEQ ID NO:20 comprising an alteration associated with predisposing a subject to genomic instability and/or to develop colon cancer, and the alteration may comprise any alteration described or exemplified herein. The probes may comprise the complement of the portion of the nucleic acid sequence of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:15, or SEQ ID NO:20 comprising an alteration associated with predisposing a subject to genomic instability and/or to develop colon cancer.
[0068] The invention also features isolated polypeptides, including isolated proteins comprising a polypeptide having an amino acid sequence encoded by a polynucleotide comprising one or more alterations that predispose a subject to develop colon cancer. Polypeptides include polymers of amino acid residues in which one or more residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
[0069] The polypeptides may comprise the CSB protein comprising a tyrosine at position 180. The polypeptides may comprise the Werner protein comprising an isoleucine at position 705. The polypeptides may comprise the Werner protein comprising a tyrosine at position 1292. The polypeptides may comprise the Werner protein comprising an isoleucine at position 705 and a tyrosine at position 1292. The polypeptides may comprise the Telomerase Reverse Transcriptase protein comprising an arginine at position 198. The polypeptides may comprise the Fanconi anemia associated protein of 100 kD comprising a leucine at position 466. The polypeptides may comprise an amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:11, SEQ ID NO:15, or SEQ ID NO:20. The polypeptides may comprise the amino acid sequence of SEQ ID NO:4, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:18, or SEQ ID NO:23.
[0070] The invention also features systems for diagnosing a predisposition to develop colon cancer. In general, the systems comprise a data structure comprising one or more reference nucleic acid sequences having one or more alterations in ERCC6, WRN, TERT, and/or FAAP100 gene associated with predisposing a subject to develop colon cancer, and a processor operably connected to the data structure. Optionally, the data structure may comprise one or more reference nucleic acid sequences that do not have any alterations in the ERCC6, WRN, TERT, and/or FAAP100 genes associated with a predisposition of a subject to develop colon cancer. The processor is preferably capable of comparing, and preferably programmed to compare determined nucleic acid sequences (for example, those determined from nucleic acids obtained from a subject) with reference nucleic acid sequences.
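By way of a non-limiting sketch, a data structure of reference sequences and a comparison step of the kind described above might be arranged as follows. The class and function names, the gene label attached to the placeholder sequences, and the short sequences themselves are all hypothetical and are not the reference sequences of this specification. Exact substring matching is used here only to keep the example small; a practical system would more likely rely on sequence alignment.

```python
# Illustrative sketch only: placeholder sequences, hypothetical names.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reference:
    gene: str             # gene the reference belongs to
    label: str            # free-text description of the reference
    sequence: str         # reference nucleic acid sequence (or a portion)
    has_alteration: bool  # True if the reference carries a risk alteration


def matching_references(determined: str, references: List[Reference]) -> List[Reference]:
    """Return the references whose sequence occurs in the determined sequence."""
    determined = determined.upper()
    return [r for r in references if r.sequence.upper() in determined]


def carries_alteration(determined: str, references: List[Reference]) -> bool:
    """Report whether any matching reference carries an alteration."""
    return any(r.has_alteration for r in matching_references(determined, references))


# Placeholder references: one altered, one control, differing at a single base.
references = [
    Reference("ERCC6", "altered (placeholder)", "ACGTTATGCA", True),
    Reference("ERCC6", "control (placeholder)", "ACGTAATGCA", False),
]

subject_a = "GGACGTTATGCATT"   # contains the altered placeholder
subject_b = "GGACGTAATGCATT"   # contains the control placeholder

print(carries_alteration(subject_a, references))
print(carries_alteration(subject_b, references))
```

Keeping the optional control references in the same structure lets the comparison also report a positive match to an unaltered sequence, rather than inferring it from the absence of a match.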
[0071] The reference nucleic acid sequences may comprise the one or more alterations described or exemplified herein. For example, the alterations may comprise a nucleic acid sequence encoding a tyrosine at position 180 of the CSB protein. The alterations may comprise a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein and/or a nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein. The alterations may comprise a nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein. The alterations may comprise a nucleic acid encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD. The reference nucleic acid sequences may comprise the nucleic acid sequence of one or more of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO: 6, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:20, or SEQ ID NO:21.
[0072] Optionally, the system may comprise an input for accepting determined nucleic acid sequences obtained from tissue samples from a subject. Optionally, the system may comprise an output for providing results of a sequence comparison to a user such as the subject, or a technician, or a medical practitioner. Optionally, the system may comprise a sequencer for determining the sequence of a nucleic acid such as a nucleic acid obtained from a subject. Optionally, the system may comprise a detector for detecting a detectable label on a nucleic acid.
[0073] Optionally, the system may comprise computer readable media comprising executable code for causing a programmable processor to determine a diagnosis of the subject, for example whether the subject has a predisposition to develop colon cancer based on whether or not a nucleic acid obtained from the subject includes a sequence alteration associated with a predisposition to develop colon cancer. The diagnosis may be based on the comparison of determined nucleic acid sequences with reference nucleic acid sequences. The diagnosis may be based on a determination of hybridization of a nucleic acid probe with a nucleic acid obtained from the subject. Thus, the system may comprise an output for providing a diagnosis to a user such as the subject, or a technician, or a medical practitioner. Optionally, the system may comprise computer readable media that comprises executable code for causing a programmable processor to recommend a treatment regimen for the subject, for example, a treatment regimen for preventing, inhibiting, or delaying the onset of colon cancer.
[0074] In any of the systems, a computer may comprise the processor or processors used for determining information, comparing information and determining results. The computer may comprise computer readable media comprising executable code for causing a programmable processor to determine a diagnosis of the subject. The systems may comprise a computer network connection, including an Internet connection.
[0075] The invention also provides computer-readable media. In some aspects, the computer-readable media comprise executable code for causing a programmable processor to compare the nucleic acid sequence of the ERCC6, WRN, TERT, and/or FAAP100 gene determined from a nucleic acid obtained from a tissue sample obtained from a subject with one or more reference nucleic acid sequences having one or more alterations in the ERCC6, WRN, TERT, and/or FAAP100 gene sequence associated with predisposing a subject to develop genomic instability and/or to develop colon cancer. The alterations may be any alteration described or exemplified herein. Optionally, the computer-readable media comprise executable code for causing a programmable processor to compare the nucleic acid sequence of the ERCC6, WRN, TERT, and/or FAAP100 gene determined from a nucleic acid obtained from a tissue sample obtained from a subject with one or more reference nucleic acid sequences that do not have any alterations in the ERCC6, WRN, TERT, and/or FAAP100 gene sequence associated with predisposing a subject to genomic instability and/or to develop colon cancer. The computer readable media may comprise a processor, which may be a computer processor.
[0076] The reference nucleic acid sequences may comprise any of the one or more alterations described or exemplified herein. For example, the alterations may comprise a nucleic acid sequence encoding a tyrosine at position 180 of the CSB protein. The alterations may comprise a nucleic acid sequence encoding an isoleucine at position 705 of the Werner protein and/or a nucleic acid sequence encoding a tyrosine at position 1292 of the Werner protein. The alterations may comprise a nucleic acid sequence encoding an arginine at position 198 of the Telomerase Reverse Transcriptase protein. The alterations may comprise a nucleic acid encoding a leucine at position 466 of the Fanconi anemia associated protein of 100 kD. The reference nucleic acid sequences may comprise the nucleic acid sequence of one or more of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO: 6, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:20, or SEQ ID NO:21.
[0077] The systems and computer readable media may be used in any of the methods described or exemplified herein, for example, methods for diagnosing a predisposition to develop colon cancer. For example, the systems and computer readable media may be used to facilitate comparisons of gene sequences, or to facilitate a diagnosis.
[0078] The methods, systems, and computer readable media comprise various reference values. For example, the reference values comprise certain quantities such as a quantity of gamma-H2AX or a quantity of double stranded DNA breaks, and comprise certain qualities such as the presence or absence of a type of polymorphism in a gene sequence or the presence or absence of a type of genomic instability such as chromosomal aneuploidy. In general, such reference values may be established according to studies of individuals and/or studies of populations. It is contemplated that, over time, as more and more individuals and larger populations are studied, the reference values, particularly the quantitative reference values, may become more precise or established to have a greater confidence. Reference value quantities may comprise quantities based on available information for any given period of time.
[0079] The following examples are provided to describe the invention in greater detail. They are intended to illustrate, not to limit, the invention.
Example 1
Identification of Sequence Alterations Relevant to Colon Cancer Predisposition
[0080] It is believed that dysfunction of genes that maintain genome stability underlies a substantial fraction of familial colorectal carcinoma (FCRC). Based on this hypothesis, preliminary studies utilized colorectal carcinoma (CRC) patients in an in-house Gastrointestinal Cancer Risk Assessment Program who met the following criteria: (1) they developed CRC before the age of 50 and/or had a first degree relative with colon cancer and, (2) had tested negative for Lynch Syndrome/Hereditary Non-Polyposis Colorectal Cancer (HNPCC) by standard tests of their tumor for microsatellite instability and/or immunohistochemistry for levels of mismatch repair proteins and tested negative for Familial Adenomatous Polyposis coli by having fewer than five polyps detected by colonoscopy. Many of these patients had clinical features atypical for MUTYH polyposis.
[0081] All patients in the Program had signed a broad consent for research and had donated peripheral blood, from which buffy coat white blood cells (WBCs) were frozen in dimethylsulfoxide (DMSO) and used to prepare genomic DNA. Controls with no personal history of cancer and no cancer in a first degree relative were selected from the same BioSample Repository and were matched by sex and age. Lymphocytes were cultured from eight independent patients, using stimulation with phytohemagglutinin (PHA) and Interleukin 2 (IL-2). Seven samples yielded enough cells to generate metaphase spreads, and several of these yielded enough cells to evaluate by flow cytometry.
[0082] Metaphase spreads were generated from proliferating cultures by addition of colcemid, swelling in hypotonic buffer, and dropping from height onto a slide. Chromosomes were stained with Giemsa stain to identify them. At least 50 well-separated spreads with condensed chromosomes from all 7 patients and 3 controls were scored by standard clinical cytogenetics criteria for any notable abnormality, including premature chromatid separation, aneuploidy, and chromosomal rearrangements. One patient, number 120713, showed 4 out of 50 spreads with chromosomal gains (8%; gains are viewed as more reliable than losses), each different (FIG. 1A and FIG. 1B, which show 2 of the 4 spreads). This was an unusually high degree of aneuploidy.
[0083] Only one chromosomal gain was seen in the remaining 6 cases examined (0.2%) and only 3 were seen in the 3 controls (1%), consistent with a published mode frequency of gains in normal lymphocytes of 1.3% (Cimino M C et al. (1986) Mutat. Res. 167:107-22). A second patient, number 118294, showed a complex chromosomal rearrangement. Flow cytometry of propidium-iodide-stained cells showed the highest level of S phase in this patient, among 3 other cases and 5 controls (FIG. 2). Flow cytometry of cells from patient 120713 showed the highest level of G2/M phase, among the cases and controls. On this basis, patients 120713 and 118294 were selected for further analysis.
[0084] Exome sequencing was performed on their peripheral blood DNAs by SeqWright Services (Texas). The library size was good and >85% of target sequences had >20× coverage. All sequence variants were initially screened by eye for potential involvement in cell replication, DNA repair, cell cycle checkpoints or mitosis and the severity of the molecular change. The following uninformative sequence variants were found: (1) non-sense changes in 120713: EFCAB3, C22orf30, SELP, C2orf65, PRAMEEF1, ULK4, and ZNF571; and non-sense changes in 118294: FAM83A, ZNF5858, C17orf58, and ALKBH4; and (2) internal deletions or splice site changes in 120713: FAM113A, C14orf13, MED13L, PDGFD, and HERPUD2; and internal deletions or splice site changes in 118294: TRPM3, FAM113A, FU41603, SPEN, PASK, GAPVD1, SOX1. None of these changes affected proteins with known roles in cell replication and/or genome stability. In addition, patient 120713 displayed 265 missense variants, and patient 118294 displayed 262 missense variants. Among these affected genes, several had roles in replication and/or genome stability: for patient 120713: ERCC6, WRN, CDKN1a, and DUB3, and for patient 118294: TERT, WRN, and EXO1.
[0085] Among the missense variants in these latter genes, it is believed that the variants in ERCC6 and TERT had not been previously reported in publicly available single nucleotide polymorphism (SNP) databases (National Heart Lung and Blood Institute Exome Sequencing Project server (http://evs.gs.washington.edu/EVS/)). The sequence quality in these regions was verified as excellent by direct inspection, and the reads were unambiguously assigned (FIG. 3).
[0086] ERCC6 encodes a chromatin remodeling protein (CSB protein) that is implicated in transcription-coupled DNA repair. Homozygous inherited inactivating mutations in ERCC6 cause Cockayne syndrome, a growth disorder associated with sensitivity to ultraviolet light. Polymorphisms within the ERCC6 gene have been statistically associated with head and neck tumors, bladder carcinoma, and lung cancer in other studies. The likelihood that these variants degrade protein function was evaluated using Consurf (University of Massachusetts) and PolyPhen2 (PP2; Harvard University) software.
[0087] The ERCC6 (CSB) protein variant, N180Y, represents a tyrosine substitution for asparagine, a non-conservative change in a residue that is completely conserved among vertebrate ERCC6-encoded proteins. Furthermore, this residue is within a stretch of 9 highly conserved residues (FIG. 4) that comprise a coiled-coil motif.
[0088] The PP2 program predicted the variant to be probably damaging to function of the protein, with its highest possible confidence score of 1.0. It is believed that the coiled-coil motif currently has no known ascribed function. However, the amino-terminal 400 amino acids have been implicated in three important biochemical functions of ERCC6: intramolecular inhibition of ATPase activity, inhibition of non-specific DNA binding, and interaction with the transcription complex. It is believed that the coiled-coil motif is the strongest region of sequence conservation in this region of the protein. These motifs are thought to mediate protein-protein interactions. Therefore, this motif is a logical candidate region to mediate one or more of these biochemical functions.
[0089] Examination of the clinical history of patient 120713 revealed four characteristics that may be caused by ERCC6 deficiency: (1) This patient has a history of colon cancer at age 48, without the polyposis of APC or MUTYH diseases or the microsatellite instability or mismatch repair protein expression abnormalities of Lynch syndrome. Somatic ERCC6 gene mutations have recently been found in genome-wide sequencing studies in 6% of CRCs (Wood L D et al. (2007) Science 318:1108-13; and Network CGA. (2012) Nature 487:330-7). This frequency was notable, but did not reach statistical significance, and ERCC6 was not classified as a `driver`. It is believed that ERCC6 may contribute to the development of both sporadic CRC and FCRC. (2) The patient developed basal cell carcinoma (BCC) at the unusually early age of 23. The patient's brother developed BCC at age 50 and both the patient's mother and father had multiple BCCs in their 40s. Although Cockayne Syndrome patients do not develop BCC, their cells are particularly sensitive to UV light; BCC is believed to be a highly UV-driven tumor. Moreover, mice with inherited ERCC6 mutations are prone to UV-induced skin tumors. (3) The patient developed macular degeneration (MD) at an unusually early age (in her 40s). A sequence polymorphism in ERCC6 has been linked to MD, although this association was not confirmed in two follow-up studies. (4) The patient's father developed bladder carcinoma at age 62. Somatic ERCC6 mutations have recently been reported in some bladder cancers in Southeast Asia. Thus, there are potential links to ERCC6 dysfunction in the patient's history of colon cancer, BCC, MD, and family history of bladder cancer. These observations are suggestive of an inherited constitutional predisposition to cancer and degenerative disease with features of ERCC6 dysfunction.
[0090] WRN is a helicase that plays an important role in DNA repair, although the mechanisms of repair remain under active investigation. Mutations in WRN cause Werner's syndrome, a growth disorder associated with features of premature aging. Regions of the protein that help form the helicase domain have been mapped. The variant in the Werner protein from patient 120713 is T705I. This variant is within the helicase domain, and was predicted by the PolyPhen2 program to be probably damaging, with a high confidence score (>0.9). The WRN variant from patient 118294 is S1292Y. This variant was scored by the PP2 program as being possibly damaging.
[0091] TERT is required to maintain telomeres at chromosome ends, thereby preventing them from causing chromosomal rearrangements and being recognized as damaged DNA. TERT mutations cause progressive diseases including aplastic anemia and pulmonary fibrosis. Progress has been made in identifying regions of TERT that contribute to its RNA-directed DNA polymerase activity and its interaction with protein partners. The TERT variant from patient 118294 is G198R. It was predicted by PolyPhen2 to be possibly damaging.
[0092] The ERCC6 N180Y, WRN T705I, TERT G198R, and FAAP100 S466L (see more below) variants were each confirmed by direct polymerase chain reaction (PCR) amplification of patient DNA and Sanger DNA sequencing (FIG. 5). The other potential variants were excluded on the basis of sequence changes that were either common (e.g., >1/1000) in SNP databases and/or predicted to represent benign changes in the encoded proteins.
[0093] This analysis of variants was repeated more rigorously by identifying all genes associated in Gene Ontology (GO) consortium databases with the terms DNA replication, DNA repair, checkpoint, mitosis, or mitotic. Thirty-four variants in patient 118294 and nineteen variants in patient 120713 were associated with these GO terms. These variants were loaded into the PP2 program by batch methods; analysis was by a somewhat more stringent version of the program. Variants were then excluded that were present at a frequency less than 20% in the exome sequencing reads (and, therefore, unreliably constitutional), present in the NHLBI SNP databases at frequencies >1/1000, or predicted to likely be benign by the PP2 program.
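The three exclusion criteria in the preceding paragraph can be sketched as a simple filter. This is a minimal sketch rather than the pipeline actually used: the record layout, field names, gene labels, and example values are hypothetical, and the benign call is represented as a plain text label rather than as actual PP2 program output.

```python
# Illustrative sketch only: hypothetical records, not patient data.
from dataclasses import dataclass
from typing import List, Optional

MIN_READ_FRACTION = 0.20       # below this, the variant is unreliably constitutional
MAX_SNP_FREQUENCY = 1 / 1000   # above this, the variant is treated as common


@dataclass
class Variant:
    gene: str
    read_fraction: float            # fraction of exome reads carrying the variant
    snp_frequency: Optional[float]  # database frequency; None if not present
    pp2_call: str                   # e.g. "probably damaging", "possibly damaging", "benign"


def passes_filters(v: Variant) -> bool:
    """Apply the three exclusion criteria in turn."""
    if v.read_fraction < MIN_READ_FRACTION:
        return False
    if v.snp_frequency is not None and v.snp_frequency > MAX_SNP_FREQUENCY:
        return False
    if v.pp2_call == "benign":
        return False
    return True


def candidate_variants(variants: List[Variant]) -> List[Variant]:
    """Return the variants that survive all exclusion criteria."""
    return [v for v in variants if passes_filters(v)]


# Hypothetical input records.
variants = [
    Variant("GENE_A", 0.48, None, "probably damaging"),       # kept
    Variant("GENE_B", 0.12, None, "probably damaging"),       # too few reads
    Variant("GENE_C", 0.51, 0.004, "possibly damaging"),      # common in database
    Variant("GENE_D", 0.47, 1 / 13005, "benign"),             # predicted benign
    Variant("GENE_E", 0.50, 1 / 13005, "possibly damaging"),  # kept
]

kept = [v.gene for v in candidate_variants(variants)]
print(kept)
```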
[0094] ERCC6 N180Y and WRN T705I were again the two leading candidate variants that emerged from this analysis, with PP2 scores >0.99. A new top-tier candidate variant emerged from this analysis: FAAP100/C17orf70 S466L, with a PP2 score >0.98. The FAAP100 protein was recently identified as an essential component of the Fanconi's Anemia DNA repair pathway (see below). Additional candidates emerged from this analysis which were designated as `second tier` because they manifested higher SNP frequencies, lower PP2 scores, and/or carried less evidence for direct involvement in genome stability. From patient 120713: TRERF1, a transcription factor that may regulate the mitotic spindle checkpoint (1/13005 in SNP databases, PP2 score of 0.99); DYNC1H1, a protein implicated in mitotic spindle organization (not present in SNP databases, PP2 score 0.93 (probably damaging)); TRPM1, a transcription factor implicated in the DNA damage checkpoint (not present in SNP databases, PP2 score 0.90 (possibly damaging)), and SMC1B, a mediator of chromosomal condensation (not present in SNP databases, PP2 score possibly damaging).
[0095] The GO gene analysis from patient 118294 demoted the TERT variant to probably benign by PP2 analysis and yielded three new candidate variants: PTPRT, a protein tyrosine phosphatase that is mutated somatically in a fraction of CRCs (not present in SNP databases, probably damaging by PP2); TBRG4, a protein that drives yeast cells into the cell cycle (5/13005 in SNP databases, probably damaging by PP2), and CDC14A, a phosphatase implicated in mitotic anaphase (not present in SNP databases, possibly damaging by PP2). Thus, none of the variants in patient 118294, including TERT, are believed to be top-tier.
[0096] Given that each of the second tier variants from patient 120713 and the CDC14A variant from patient 118294 has a direct or indirect role in regulating mitosis, the next stages of investigation will include an interrogation of the efficiency of mitosis in cells from each patient. Isolated cells will be infected with a retrovirus encoding a green fluorescent protein (GFP)-histone H2B fusion protein, and chromosome dynamics during mitosis will be observed in living cells. These experiments are somewhat technically challenging, given the small size of lymphocytes and the fact that they generally do not adhere to tissue culture dish bottoms, but preliminary experiments are underway.
[0097] In summary, eight independent FCRC cases were screened for constitutional genomic instability (CGI) by analyzing metaphase spreads and flow cytometry-generated cell cycle profiles of cultured peripheral lymphocytes. Two patients showed evidence of CGI in the form of aneuploidy (patient #120713), a chromosomal rearrangement (patient #118294), and/or increased fractions of cells within replicative phases (both patients). Exome sequencing revealed novel or rare heterozygous sequence variants in relevant genes. 120713 has a novel variant in ERCC6/CSB, a nucleotide excision repair gene. The variant is a strong candidate for being causal: it encodes a non-conservative change in a highly conserved residue in a region of the protein with biochemically-defined functions. The patient harboring this allele has three other clinical conditions consistent with ERCC6 dysfunction. Each patient also has a rare sequence variant in WRN, a DNA repair helicase. 120713 also carries a rare sequence variant in FAAP100, a scaffolding protein of the Fanconi's anemia DNA repair pathway. These observations provide evidence that ERCC6 and possibly WRN contribute to CGI and colon cancer in these FCRC cases.
Example 2
Follow Up Studies
[0098] The studies described in Example 1 suggest that constitutional genomic instability is more widespread than currently recognized. It is believed that heterozygous mutations will be functionally important, due to haplo-insufficiency and/or dominant negative effects. Currently recognized FCRC syndromes are autosomal dominant at the organismic level, but are thought to be largely recessive at the cellular level. The following describes additional experiments to be undertaken.
[0099] Studies will evaluate whether the sequence variants of the ERCC6, WRN, TERT, and FAAP100 genes described in this specification inactivate the function of the proteins they encode. It is believed that dysfunction of genes that maintain genome stability underlies a substantial fraction of FCRC. These studies will proceed along the following basic outline: (1) Test whether the sequence variants inactivate protein function by (a) introducing the sequence variants into expression vectors by site-directed mutagenesis, (b) testing whether the variant proteins fail to rescue cellular deficiencies in the respective proteins, and (c) testing whether the variant proteins exert dominant negative effects; (2) Further define the nature and severity of CGI in the FCRC patients by (a) repeating metaphase spread and flow cytometry assays on primary cells, (b) performing assays for activation of the DNA damage response on primary cells, (c) establishing immortalized lymphocytes from the patients and assessing their expression of the variant proteins and CGI, (d) testing whether patient cells are hypersensitive to exogenous DNA damage, and (e) testing whether cell phenotypes can be rescued by exogenous expression of candidate genes; and, (3) Screen 30 additional FCRC patients for CGI and relevant sequence variants by (a) examining metaphase spreads, cell cycle profiles, and DNA damage foci in peripheral lymphocytes, and (b) performing exome sequencing in patients with evidence for CGI.
[0100] It is believed that these studies will provide new molecular insights into causes of FCRC and CGI and functional elements of DNA repair proteins while offering new methods to screen for predisposition to colon cancer and to diagnose affected members of FCRC families in pre-clinical stages. This capability should allow intensive colon cancer screening by endoscopy to be focused on those patients who should benefit strongly and to be avoided in those who will not. Related clinical conditions, such as predisposition to basal cell carcinoma, macular degeneration, and bladder cancer, may also be better managed.
[0101] (1) Testing Whether the Sequence Variants Inactivate Protein Function.
(a) Introduce the Sequence Variants into Expression Vectors by Site-Directed Mutagenesis.
[0102] The investigation will begin with introducing the sequence variants into expression vectors encoding the wild type proteins. The vectors have already been prepared, and expression experiments are underway.
(b) Test Whether the Variant Proteins Fail to Rescue Cellular Deficiencies in the Respective Proteins.
[0103] ERCC6 deficient cells have been established from patients with Cockayne's syndrome and are being maintained in culture. These cells are sensitive to UV treatment, consistent with the known role of ERCC6 in DNA repair. This phenotype can be rescued by expression of the wild type protein, providing a convenient assay system for protein function. As an initial test of ERCC6 function, the wild type and variant protein from patient 120713 will be expressed in parallel in the cognate deficient cells, and these proteins will be assayed to determine whether the variant fails to restore resistance to UV irradiation. The ability of ERCC6 to complement UV sensitivity likely integrates several biochemical activities of the protein and provides a good screen for functionally important defects. To further define the molecular defect, the wild type and variant proteins will be expressed in mammalian cells, and nuclear extracts will be prepared from these cells. These extracts will then be incubated with chromatin prepared from untreated or UV-irradiated cells. The UV-induced chromatin binding of the proteins will be compared. The protein will also be expressed in bacteria with an epitope tag, and the purified protein will be assayed for ATPase activity on DNA templates. Additional experiments may be suggested by these assays. These biochemical assays might also reveal a defect that failed to be detected during overexpression of the protein in the assays of UV sensitivity.
[0104] WRN-deficient cells have been established from patients with Werner's syndrome and are being maintained in culture. However, the most straightforward test of WRN function is to test its helicase activity, the activity central to WRN function in DNA repair. This activity is most readily tested by purifying the protein from bacterial extracts and incubating it with short double-stranded oligonucleotides with single-stranded 5' ends. WRN will unwind these templates, an activity readily detected by a shift in mobility on non-denaturing gel electrophoresis. The activities of wild type and variant WRN protein will be tested in this assay.
[0105] Most primary cells are TERT-deficient and can be infected with the retroviral vector. The wild type and variant TERT protein will be expressed in parallel, and telomerase activity will be evaluated in vitro using a standard assay.
[0106] FAAP100 acts as a scaffold upon which BRCA1 and other DNA repair proteins concentrate at lesions, to activate Chk1 and degrade Cdc25A, among other functions. We will compare the ability of wild type and variant FAAP100 proteins to perform these actions.
(c) Test Whether the Variant Proteins Exert Dominant Negative Effects.
[0107] Defective proteins that occupy limited sites where the protein must normally act can exert dominant negative effects. It is believed that in some cases, expression of a defective protein disrupts function of the remaining wild type protein. Such sites may be homo- or hetero-multimeric complexes involving the protein. There is some evidence that ERCC6 multimerizes. This is also true for WRN. TERT must function as a complex with a small RNA that templates synthesis of telomeric DNA. In addition, TERT interacts with a small set of proteins that protect telomeres from recognition by the DNA damage pathway. As a scaffolding protein, FAAP100 may sequester other proteins involved in DNA damage responses, including DNA repair and cell cycle arrest.
[0108] These experiments will test whether expression of the ERCC6 variant protein confers sensitivity to UV irradiation. The variant will be titrated in co-transfections with limiting amounts of vector that rescues UV sensitivity of Cockayne syndrome cells, and the extent to which expression of the variant restores sensitivity or is inert will be assessed.
[0109] In addition, whether the WRN variant confers sensitivity to the topoisomerase I poison camptothecin will be investigated. Werner syndrome cells do not show increased sensitivity to UV, but demonstrate distinctly increased apoptosis during S phase following exposure to this drug. The detailed mechanism is unknown, but the drug is known to trap topoisomerase I on DNA, and the effect involves inhibition of transcription during S phase. The sensitivity is thought to reflect an inability of the WRN helicase to resolve and repair collisions between RNA polymerase complexes and/or DNA polymerase complexes and protein-modified DNA, with resulting double strand DNA breaks. Camptothecin doses of 20-50 nM cause S phase delay and a 5-6-fold increase in apoptosis of Werner syndrome cells.
[0110] It is believed that the ERCC6 N180Y variant will disrupt protein function, given the constellation of clinical findings in patient 120713 consistent with ERCC6 dysfunction, the evidence that the variant residue is likely damaging, and the critical roles played by the amino-terminal region of the protein. The variant is anticipated to help unravel the function of the central motif in this region, the coiled-coil domain motif.
[0111] For example, follow-up studies will compare intramolecular and extrinsic protein-protein interactions mediated by this domain and disrupted by the variant (e.g., with the carboxy-terminal protein and transcription complex, by `pull-down` assays, etc.) and will test whether the variant exhibits the marked conformational change thought to occur with lesion-induced activation of ATPase activity. Most extant ERCC6 mutations in Cockayne's syndrome and engineered mutations compromise the ATPase activity of ERCC6.
[0112] Whether the variant may be haploinsufficient or dominant negative is more difficult to predict. It is evident that patient 120713 did not have full-blown Cockayne syndrome, so the variant does not entirely inactivate ERCC6 function. Cockayne syndrome carriers are heterozygous for ERCC6 mutations. There is some evidence for phenotypes in their cells, such as modest UV sensitivity, but little clinical data addressing relevant diseases. If the variant ablates inhibition of ATPase activity of the protein, it may bind more indiscriminately and remodel chromatin structure in deleterious ways. It may, thereby, potentially alter transcription and/or divert repair factors, exerting dominant negative effects not seen with standard Cockayne syndrome mutations that inactivate ATPase activity. This molecular mechanism provides a possible alternative explanation for potential dominant negative effects of the variant without compromise of an ERCC6 homopolymeric complex.
[0113] It is believed that the WRN variant in patient 120713 will also inactivate protein function, as the variant is predicted to be probably damaging. This variant may therefore compromise DNA repair in a second way in patient 120713, with additive or synergistic effects. Neoplasia is present in both maternal and paternal lineages of the patient, suggesting that there may be independent gene variants that predispose to neoplasia in the pedigree. However, if cell lines can be established, they will be tested for whether they exhibit major ongoing genetic instability and whether complementation with wild type ERCC6, WRN, or both is needed to restore genome stability.
[0114] (2) Further Examining Cells for Evidence of CGI.
[0115] The presence of 4 chromosomal gains in 50 metaphase spreads (8%) from patient 120713 is unlikely to represent a chance occurrence in normal cells. This rate of gains greatly exceeds the published rate of gains seen in normal stimulated lymphocytes (mode 0.4%) and the rate observed in the rest of the case and control samples in this study (0.7%). Gains are considered more reliable than losses, as the latter are sometimes artifacts of chromosome spreading. However, gains in well-separated spreads such as these are typically not technical artifacts. The spreads were generated by an in-house Genomics Facility, which has extensive experience with this method and performs it routinely for clinical analysis. Nonetheless, these assays will be repeated on cell lines established from patient 120713 and controls, to further validate the CGI and more accurately determine its level.
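As a rough illustration of why 4 gains in 50 spreads is unlikely to arise by chance, the following sketch computes an exact binomial tail probability. It assumes that each spread is an independent trial and takes the 0.7% rate observed in the other samples as the background rate. The independence assumption and the function name `binomial_tail` are illustrative only and are not part of the analysis described in this specification.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Values taken from the paragraph above.
observed_gains = 4       # chromosomal gains seen in patient 120713
spreads_scored = 50      # metaphase spreads examined
background_rate = 0.007  # rate in the remaining case and control samples (0.7%)

# Probability of seeing at least this many gains if the patient's cells
# gained chromosomes at the background rate (independence is an assumption).
p_value = binomial_tail(observed_gains, spreads_scored, background_rate)
print(f"P(>= {observed_gains} gains in {spreads_scored} spreads) = {p_value:.2e}")
```

Substituting the published rate of 0.4% for `background_rate` gives the corresponding probability against the literature baseline.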
[0116] Patient 118294 exhibited a complex chromosomal rearrangement. This event cannot be artifactual, as it must be formed within the cell and is a rare event in normal cells. However, it is desired to gauge more accurately the rate of such events in cells from this patient. This patient also demonstrated the highest S phase fraction of any sample tested. The difference (14% above the mean S phase fraction in control samples) is well beyond the normal technical variation in S phase fraction in such samples (ca. 2-3%).
[0117] Generation of metaphase spreads and flow cytometry cell cycle profiles is useful for screening patients for CGI. However, the nature and severity of CGI in such cells have not been fully defined. Most GI is associated with double strand DNA breaks. Low levels of such lesions are difficult to detect directly. Nonetheless, their presence can often be detected indirectly by detecting activation of the DNA damage response (DDR). This response involves the concentration of repair proteins around the lesions, forming what is termed DNA damage foci. These foci are commonly visualized by immunofluorescence. Markers of DDR will be tested to identify this response in patients 120713 and 118294, by immunofluorescence (IF; most sensitive), immunohistochemistry (IHC; readily performed in most clinical pathology labs), and immunoblotting (IB; most specific for histone variant γH2AX).
(a) Repeating Metaphase Spread and Flow Cytometry Assays on Excess Primary Cells.
[0118] These experiments will verify and better quantitate the rate of generation of chromosomal and cell cycle abnormalities in patients 118294 relative to controls. Cultured cells will be stimulated with PHA. Some will then be treated with the mitotic spindle poison colchicine, permeabilized, dropped onto slides to generate spreads, and stained with Giemsa, to stain chromosomal bands and allow identification of individual chromosomes. At least 50 well-separated chromosome spreads per patient will be scored for aneuploidy and chromosomal rearrangements in triplicate. A portion of each PHA-stimulated culture (at least 100,00 cells) will be fixed in ethanol, stained with propidium iodide, and analyzed by flow cytometry for DNA content in triplicate. The fractions of cells in S and G2/M phases, respectively, will be compared.
(b) Establishing Immortalized Lymphocytes from the Patients and CGI Assays.
[0119] A retroviral TERT vector has been transfected into a packaging cell line, and high titer viral supernatants have been generated. These will be used to infect control cells, to verify the method, and then samples from 120713 and 118294 will be used. T lymphocyte growth will be fostered by addition of IL-2. These polyclonal cultures will be expanded and aliquots frozen in DMSO. Other portions will be used to repeat the metaphase spread and flow cytometry analyses. Finally, a portion of each PHA-stimulated primary cell culture will be infected with retrovirus expressing SV40 large T antigen. These polyclonal cultures will be expanded and frozen in DMSO. In addition, we are preparing Epstein Barr Virus-transformed B lymphocyte cell lines from patient 120713 and controls.
(c) Performing Assays for DNA Damage Markers.
[0120] Primary cells, if available, or immortalized cells will be pelleted by low-speed centrifugation, embedded in histogel, fixed in paraformaldehyde (PFA) or formaldehyde, respectively, and sectioned as per a tissue block. The PFA-fixed material will be subjected to IF for DDR markers. The formalin-fixed material will be subjected to immunohistochemistry for DDR markers. Protein extracts will be prepared from other cells and subjected to immunoblotting for γH2AX. DNA will be damaged in samples of normal cells, as positive controls, using UV- and X-irradiation and treatment with camptothecin.
[0121] Given that there is a TERT gene variant in patient 118294, and defective telomerase activity has been linked to ds DNA breaks and genomic instability as well as intestinal tumorigenesis, telomere integrity will be evaluated in this patient. Telomere length will be estimated by in situ hybridization using a probe complementary to the telomeric repeat and high-resolution fluorescence microscopy. Telomere-associated DNA damage foci will be assayed in cells fixed with paraformaldehyde by co-immunofluorescence for the telomere protein TRAP1 or TRF1 and DNA damage response markers γH2AX and 53BP1.
(d) Testing Whether Patient Cells are Hypersensitive to Exogenous DNA Damage.
[0122] Cockayne syndrome patients and their cells are hypersensitive to UV-irradiation. Patient 120713 has a personal and family history of basal cell carcinoma, a UV-associated tumor, and a history of macular degeneration, thought to be in part a UV-driven disease. Exogenous damage may elicit a sensitivity that is less apparent in untreated cells. Cells will be exposed to 4 J/m2 of UV-irradiation from a UV lamp and examined for DDR foci. Cells will also be assayed for their long-term proliferative capacity by the colony-outgrowth assay. Similar assays will be performed following X-irradiation and treatment with cisplatin, respectively, as controls for more general defects in cells from patient 120713 and to detect other potential defects in DNA repair and/or the DDR in patient 118294.
(e) Testing Whether Cell Phenotypes can be Rescued by Exogenous Expression of Candidate Genes.
[0123] Whether observed patient cell phenotypes of GI, UV sensitivity, camptothecin sensitivity, and telomeric DNA damage foci can be rescued by overexpression of the respective wild type proteins will be tested. It is believed that the repeat assays of CGI will confirm it in the patients and help determine its severity. The results will also clarify whether the CGI differs qualitatively in the two patients. For example, it will be determined whether or not the CGI in patient 120713 primarily causes aneuploidy without chromosomal rearrangement, and whether or not the reverse is true for patient 118294. Although ERCC6 has primarily been implicated in nucleotide excision repair of bulky lesions, which do not necessarily form double strand DNA breaks, bulky lesions or their partially repaired intermediates are thought to often be converted to ds breaks when encountered by replication forks. In addition, ERCC6 has been implicated to lesser degrees in other forms of DNA repair, including homologous recombination, a favored route for repair of ds breaks. It is believed that cells from patient 120713 will be hypersensitive to UV-irradiation. In this case, whether this phenotype can be rescued by overexpression of wild-type ERCC6 more effectively than by the variant allele will be investigated. If the WRN allele from this patient also appears to be defective, whether exogenous WRN expression can reduce sensitivity will be investigated.
[0124] (3) Screen 30 Additional FCC Patients for CGI and Sequence Variants in Related Genes.
[0125] These proposed studies will triple the previous patient set and allow for the setting of initial bounds on the frequency of CGI in FCC patients. In addition, candidate genes responsible for the observed CGI have been identified. At this point, each represents a sample size of one. Examination of additional patients will provide for a determination of whether the responsible gene set is small or large. If the current experience can be extrapolated to the additional 30 patients, it is anticipated that more patients with CGI will be identified. These data can be used subsequently to design larger clinical studies to more accurately assess the frequency of involved genes and to assess the practicality of determining the underlying lesions by targeted sequencing of candidate genes, rather than exome sequencing.
Example 3
FAAP100 S466L
[0126] An additional candidate disease-causing variant in patient 120713 was identified. To systematically analyze the list of gene variants derived from the exome sequencing results, Gene Ontology (GO) consortium databases were used to focus on variant genes associated with the terms DNA replication, DNA repair, checkpoint, mitosis, or mitotic. Thirty-four variants in patient 118294 and 19 variants in patient 120713 were associated. Variants were identified that represented >40% of the sequencing reads (and were, therefore, likely to be at least heterozygous), were absent from NHLBI SNP databases or present at frequencies <1/1000 (thereby reducing type 1 errors), and were predicted by the PolyPhen2 program (Sunyaev, Harvard University) to be probably damaging to protein function. A few were excluded that appeared not to be directly related to CGI, on the basis of being expressed primarily outside the nucleus and/or in a severely restricted tissue pattern. From this analysis, patient 118294 did not yield a strong candidate variant. However, 3 good candidate missense variants were found in patient 120713. In addition to the previously recognized variants ERCC6/CSB N180Y and WRN T705I, C17orf70/FAAP100 S466L was identified as a strong candidate disease-causing variant.
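The three numeric and categorical filters described above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `Variant` record, its field names, and the example entries (`GENE_A` through `GENE_D`) are hypothetical placeholders, and the GO-term selection and manual exclusion steps are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str               # hypothetical gene label
    read_fraction: float    # fraction of sequencing reads carrying the variant
    population_freq: float  # frequency in the SNP database (0.0 if absent)
    polyphen_call: str      # PolyPhen2 category, e.g. "probably damaging"

def passes_filters(v: Variant) -> bool:
    """Apply the three criteria stated in the paragraph above."""
    return (
        v.read_fraction > 0.40               # likely at least heterozygous
        and v.population_freq < 1 / 1000     # absent or rare in the population
        and v.polyphen_call == "probably damaging"
    )

# Placeholder records for illustration only; these are not study data.
variants = [
    Variant("GENE_A", 0.48, 0.0, "probably damaging"),
    Variant("GENE_B", 0.22, 0.0, "probably damaging"),    # too few reads
    Variant("GENE_C", 0.51, 0.02, "probably damaging"),   # too common
    Variant("GENE_D", 0.47, 0.0005, "possibly damaging"), # weaker prediction
]

candidates = [v for v in variants if passes_filters(v)]
print([v.gene for v in candidates])
```

Keeping each criterion as a separate clause makes it easy to relax one threshold at a time when the candidate list for a patient is empty.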
[0127] FAAP100 is an understudied but essential component of the Fanconi's anemia (FA) DNA repair complex. FA is a rare recessive syndrome associated with bone marrow failure, genetic instability, and cancer. It involves a failure to prevent DNA double strand (ds) breaks during DNA replication. FA cells fail to mono-ubiquitinate FANCD2, the central outcome of the pathway, and are very sensitive to DNA cross-linking agents such as mitomycin C. It has recently been established that FANCD1 is the breast and ovarian cancer tumor suppressor BRCA2, and the complex interacts with BRCA1. FAAP100 acts as a scaffolding protein for the ubiquitin ligase FANCL, but has few defined motifs, and its functional elements have not been mapped. This gene is a potential link to the history of two paternal cousins with early onset breast and ovarian cancers, respectively. If the heterozygous variant compromises the FA pathway, this variant could account for or help account for the patient's apparently defective DNA repair (see next advance), genetic instability, and predisposition to colon cancer.
[0128] The FAAP100 variant represents a C to T change (G to A on the opposite strand) at nucleotide 1443 of accession number BC117141 (SEQ ID NO:22). This nucleotide is at position 77124711 on human chromosome 17. The change results in substitution of leucine for serine at amino acid 466 of the protein (SEQ ID NO:23). This substitution is predicted by the PolyPhen2 program to be probably damaging to protein function with high confidence (0.98 score out of 1.00).
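The relationship between the nucleotide change and the amino acid substitution can be checked against the standard genetic code. The sketch below enumerates every serine codon in which a single C to T change on the coding strand yields a leucine codon, and confirms that C pairs with G and T pairs with A on the opposite strand. It assumes the standard nuclear genetic code. The actual codon at position 466 is not stated in this paragraph, so the code only lists the possibilities, and all names in it are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base pairing used to express a change on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def ser_to_leu_by_c_to_t():
    """List (serine codon, leucine codon) pairs differing by one C to T change."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != "S":
            continue
        for i, base in enumerate(codon):
            if base == "C":
                mutant = codon[:i] + "T" + codon[i + 1:]
                if CODON_TABLE[mutant] == "L":
                    hits.append((codon, mutant))
    return hits

pairs = ser_to_leu_by_c_to_t()
print(pairs)
# A coding-strand C to T change reads as G to A on the opposite strand.
print(COMPLEMENT["C"], "to", COMPLEMENT["T"])
```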
Example 4
Increase in Double Stranded Breaks and Gamma-H2AX Foci
[0129] It was determined that patient 120713 exhibited an exaggerated response to DNA damage, likely reflecting increased double stranded (ds) DNA breaks. Ds breaks are thought to be a major cause of instability of chromosome structure. The ds break also serves as a nidus for detection of DNA damage responses (DDRs) to a variety of damage, including bulky DNA adducts, intra- and inter-strand cross-links, and collapse of replication forks. Recent data suggest that many ds breaks are formed by replicative events, such as reverse branch migration of Holliday junctions when movement of the DNA replication fork is impaired. Thus, many repair events can result in a ds break. At such breaks, the alternate histone H2AX undergoes extensive phosphorylation, forming `γH2AX` foci visible by immunofluorescence (IF). Other DDR proteins such as phosphorylated ATM/ATR and 53BP1 are recruited into such foci. During work for the project, an in-house Cell Culture Facility worked out conditions under which IL-2, anti-T-cell receptor, and anti-CD3 antibodies stimulate robust growth of primary T-lymphocytes from peripheral blood lymphocytes. In preliminary studies, lymphocytes were treated with ultraviolet light (UV) or the DNA polymerase inhibitor aphidicolin. Aphidicolin is commonly used to reveal DNA repair defects. It generates replicative stress, with collapse of stalled replication forks and generation of ds breaks. The cells were then allowed to adhere to poly-lysine-coated slides, fixed with paraformaldehyde, and stained for γH2AX. Flow cytometry confirmed equivalent fractions of replicating cells in patient 120713 and the control. It was observed that cells from patient 120713 showed substantially more γH2AX foci in response to treatment with UV or aphidicolin when compared to an age- and sex-matched normal control (FIG. 6; each P<0.001, by Fisher's exact test).
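For readers unfamiliar with Fisher's exact test, the following sketch shows how a two-sided P value can be computed from a 2x2 table of foci-positive and foci-negative cells in patient and control samples. It assumes the comparison was made on counts of cells scored as positive or negative for foci, which this paragraph does not state explicitly. The cell counts used are hypothetical placeholders, not the data behind FIG. 6, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts for illustration only (not study data):
# rows are patient and control, columns are foci-positive and foci-negative.
patient_pos, patient_neg = 30, 70
control_pos, control_neg = 8, 92

p = fisher_exact_two_sided(patient_pos, patient_neg, control_pos, control_neg)
print(f"two-sided P = {p:.2e}")
```

An exact test is a natural choice for this kind of scoring because the number of foci-positive cells in a control sample can be small, where large-sample approximations are less reliable.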
[0130] Additional data showed further evidence of a greater DNA damage response, marked by gamma-H2AX foci scored in a blinded fashion, from patient 120713 (FIG. 7). The data show ongoing DNA damage response at the baseline in the patient's lymphocytes (No Rx), as well as in response to treatment with aphidicolin (aph), camptothecin (Campto), and etoposide (Etop). The graph shows that the levels of gamma-H2AX foci are higher in patient 120713 (dark grey) relative to a control subject (light grey).
[0131] These findings provide further evidence for a DNA repair defect in patient 120713. Moreover, they offer the prospect that assaying the DDR in normal lymphocytes from at-risk individuals may help identify those with a predisposition to colon cancer. This assay might take the form of immunofluorescence staining for γH2AX, as shown here, or immunohistochemistry, immunoblotting, enzyme-linked immunosorbent assays (ELISAs), or flow cytometry.
[0132] The invention is not limited to the embodiments described and exemplified above, but is capable of variation and modification within the scope of the appended claims.
Sequence CWU
1
1
2414479DNAHomo sapiens 1atgccaaatg agggaatccc ccactcaagt caaactcagg
agcaagactg tttacagagt 60caacctgtca gtaataatga agaaatggca atcaagcaag
aaagtggtgg tgatggggag 120gtggaggagt acctctcctt tcgttctgtg ggtgacgggc
tgtccacctc tgctgtgggg 180tgcgcatcag cagctccgag gagagggcca gccctgctgc
acatcgaccg acatcagatc 240caggcagtag agcctagcgc ccaggccctt gagctgcagg
gtttgggtgt ggacgtctat 300gaccaggacg tgctggaaca gggagtgctt cagcaggtgg
acaatgccat ccatgaggcc 360agccgtgcct cccagctcgt tgacgtggag aaggagtatc
ggtcggtcct ggatgacctc 420acgtcatgta cgacatccct aaggcaaatc aataaaatta
ttgaacagct tagccctcaa 480gctgccacca gcagagacat caacaggaaa ctagattctg
taaaacgaca gaagtattat 540aaggaacaac agctaaaaaa gatcactgca aaacaaaagc
atctccaggc catccttgga 600ggagcagagg tgaaaattga actagatcac gccagtctgg
aggaggatgc agagccgggg 660ccatccagtc ttggcagcat gctcatgcct gtccaggaga
ctgcctggga agagctcatc 720cgcactggcc agatgacacc ttttggtacc cagatccctc
agaaacagga gaaaaagccc 780agaaaaatca tgcttaatga agcatcaggc ttcgaaaagt
atttggcaga tcaagcaaaa 840ctgtcttttg aaaggaagaa gcaaggttgt aataaaagag
cagctagaaa agctccagcc 900ccagtcacgc ctccagcccc agtgcaaaat aaaaacaaac
caaacaagaa agccagagtt 960ctgtccaaaa aagaggagcg tttgaaaaag cacatcaaga
aactccagaa gagggctttg 1020cagttccagg ggaaagtggg attgccaaag gcaaggagac
cttgggagtc agacatgagg 1080ccagaggcag agggagactc tgagggtgaa gagtctgagt
atttccccac agaggaggag 1140gaagaggagg aagatgacga ggtggagggg gcagaggcgg
acctgtctgg agatggtact 1200gactatgagc tgaagcctct gcccaagggc gggaaacggc
agaagaaagt gccagtgcag 1260gagattgatg atgacttttt cccaagttct ggggaagaag
ctgaagctgc ttctgtagga 1320gaaggaggag gaggaggtcg gaaagtggga agataccgag
atgatggaga tgaagattat 1380tataagcagc ggttaaggag atggaataaa ctgagactgc
aggacaaaga gaaacgtctg 1440aagctggagg acgattctga ggaaagtgat gctgaatttg
acgaaggttt taaagtgcca 1500ggttttctgt tcaaaaagct ttttaagtac cagcagacag
gtgttaggtg gctgtgggaa 1560ttgcactgcc agcaggcagg aggaattctg ggagatgaaa
tgggattggg caagaccatc 1620cagataattg ccttcttggc aggtctgagc tacagcaaga
tcaggactcg tggttcaaat 1680tacaggtttg aggggttggg tccaactgta attgtctgtc
caacaacagt gatgcatcag 1740tgggtgaagg aatttcacac gtggtggcct ccgttcagag
tggcaattct acatgaaacc 1800ggttcctata cccacaaaaa ggagaaacta attcgagatg
ttgctcattg tcatggaatt 1860ttgatcacat cttactccta cattcgattg atgcaggatg
acattagcag gtatgactgg 1920cactatgtga tcttggacga aggacacaaa attcgaaatc
caaatgctgc tgtcaccctt 1980gcttgcaaac agtttcgcac ccctcatcgg atcattctgt
ctggctcacc gatgcaaaat 2040aacctccgag agctgtggtc gctctttgac ttcatcttcc
cgggaaagtt aggcacgttg 2100cctgtgttta tggagcagtt ctccgtcccc atcaccatgg
ggggatattc aaatgcttcc 2160ccagtacagg tcaaaactgc ttacaagtgt gcatgtgtct
tacgagatac cataaatcca 2220tacctactgc ggagaatgaa gtcagatgtc aagatgagcc
tttctttgcc agataaaaat 2280gaacaggtct tattttgccg tcttacagat gagcagcata
aagtctacca aaatttcgtt 2340gattccaaag aagtttacag gattctcaat ggagagatgc
agattttctc cggacttata 2400gccctaagaa aaatttgcaa ccaccctgat ctcttttctg
gaggtcccaa gaatctcaaa 2460ggtcttcctg atgatgaact agaagaagat cagtttgggt
actggaaacg ttctgggaaa 2520atgattgttg ttgagtcttt gttgaaaata tggcacaagc
agggtcagcg agtattgctg 2580ttttctcagt caaggcagat gctggacata cttgaagtat
tccttagagc ccaaaagtat 2640acctatctca agatggatgg taccactaca atagcttcaa
gacagccact gattacgaga 2700tacaatgagg acacatccat atttgtgttt cttctgacca
cgcgggtggg cggcttaggt 2760gtcaacctga cgggggcaaa cagagttgtc atctatgacc
cagactggaa cccaagcacg 2820gacacgcagg cccgggagcg agcatggaga ataggccaga
agaagcaagt gactgtgtac 2880aggctcctga ctgcgggcac cattgaagaa aagatctacc
accgacaaat cttcaagcag 2940tttttgacaa atagagtgct aaaagaccca aaacaaaggc
ggtttttcaa atccaatgat 3000ctctatgagc tatttactct gactagtcct gatgcatccc
agagcactga aacaagtgca 3060atttttgcag gaactggatc agatgttcag acacccaaat
gccatctaaa aagaaggatt 3120caaccagcct ttggagcaga ccatgatgtt ccaaaacgca
agaagttccc tgcttctaac 3180atatctgtaa atgatgccac atcatctgaa gagaaatctg
aggctaaagg agctgaagta 3240aatgcagtaa cttctaatcg aagtgatcct ttgaaagatg
accctcacat gagtagtaat 3300gtaactagca atgataggct tggagaagag acaaatgcag
tatctggacc agaagagttg 3360tcagtgatta gtggaaatgg ggaatgttca aattcttcag
gaacaggcaa aacttctatg 3420ccatctggtg atgaaagcat tgatgaaaag ttaggtcttt
cttacaaaag agaaagaccc 3480agccaggctc aaacagaagc tttttgggag aataaacaaa
tggaaaataa tttttataag 3540cacaagtcaa aaacaaaaca tcatagtgtg gcagaagaag
agaccctgga gaaacatctg 3600agaccaaagc aaaagcctaa gaactctaag cattgcagag
acgccaagtt tgaaggaact 3660cgaattccac acctggtgaa gaaaaggcgt taccagaagc
aagacagtga aaacaagagt 3720gaggccaagg aacagagcaa tgacgattat gttttggaaa
agcttttcaa aaaatcagtt 3780ggcgtgcaca gtgtcatgaa gcacgatgcc atcatggatg
gagccagccc agattatgta 3840ctggtggagg cagaagccaa ccgagtggcc caggatgccc
tgaaagcact gaggctctct 3900cgtcagcggt gtctgggagc agtgtctggt gttcccacct
ggactggcca cagggggatt 3960tctggtgcac cagcaggaaa aaagagtaga tttggtaaga
aaaggaattc taacttctct 4020gtgcagcatc cttcatcaac atctccaaca gagaagtgcc
aggatggcat catgaaaaag 4080gagggaaaag ataatgtccc tgagcatttt agtggaagag
cagaagatgc agactcttca 4140tccgggcccc tcgcttcctc ctcactcttg gctaaaatga
gagctagaaa ccacctgatt 4200ctgccagagc gtttagaaag tgaaagcggg cacctgcagg
aagcttctgc cctgctgccc 4260accacagaac acgatgacct tctggtggag atgagaaact
tcatcgcttt ccaggcccac 4320actgatggcc aggccagcac cagggagata ctgcaggagt
ttgaatccaa gttatctgca 4380tcacagtctt gtgtcttccg agaactattg agaaatctgt
gcactttcca tagaacttct 4440ggtggtgaag gaatttggaa actcaagcca gaatactgc
447924479DNAHomo sapiens 2atgccaaatg agggaatccc
ccactcaagt caaactcagg agcaagactg tttacagagt 60caacctgtca gtaataatga
agaaatggca atcaagcaag aaagtggtgg tgatggggag 120gtggaggagt acctctcctt
tcgttctgtg ggtgacgggc tgtccacctc tgctgtgggg 180tgcgcatcag cagctccgag
gagagggcca gccctgctgc acatcgaccg acatcagatc 240caggcagtag agcctagcgc
ccaggccctt gagctgcagg gtttgggtgt ggacgtctat 300gaccaggacg tgctggaaca
gggagtgctt cagcaggtgg acaatgccat ccatgaggcc 360agccgtgcct cccagctcgt
tgacgtggag aaggagtatc ggtcggtcct ggatgacctc 420acgtcatgta cgacatccct
aaggcaaatc aataaaatta ttgaacagct tagccctcaa 480gctgccacca gcagagacat
caacaggaaa ctagattctg taaaacgaca gaagtataat 540aaggaacaac agctaaaaaa
gatcactgca aaacaaaagc atctccaggc catccttgga 600ggagcagagg tgaaaattga
actagatcac gccagtctgg aggaggatgc agagccgggg 660ccatccagtc ttggcagcat
gctcatgcct gtccaggaga ctgcctggga agagctcatc 720cgcactggcc agatgacacc
ttttggtacc cagatccctc agaaacagga gaaaaagccc 780agaaaaatca tgcttaatga
agcatcaggc ttcgaaaagt atttggcaga tcaagcaaaa 840ctgtcttttg aaaggaagaa
gcaaggttgt aataaaagag cagctagaaa agctccagcc 900ccagtcacgc ctccagcccc
agtgcaaaat aaaaacaaac caaacaagaa agccagagtt 960ctgtccaaaa aagaggagcg
tttgaaaaag cacatcaaga aactccagaa gagggctttg 1020cagttccagg ggaaagtggg
attgccaaag gcaaggagac cttgggagtc agacatgagg 1080ccagaggcag agggagactc
tgagggtgaa gagtctgagt atttccccac agaggaggag 1140gaagaggagg aagatgacga
ggtggagggg gcagaggcgg acctgtctgg agatggtact 1200gactatgagc tgaagcctct
gcccaagggc gggaaacggc agaagaaagt gccagtgcag 1260gagattgatg atgacttttt
cccaagttct ggggaagaag ctgaagctgc ttctgtagga 1320gaaggaggag gaggaggtcg
gaaagtggga agataccgag atgatggaga tgaagattat 1380tataagcagc ggttaaggag
atggaataaa ctgagactgc aggacaaaga gaaacgtctg 1440aagctggagg acgattctga
ggaaagtgat gctgaatttg acgaaggttt taaagtgcca 1500ggttttctgt tcaaaaagct
ttttaagtac cagcagacag gtgttaggtg gctgtgggaa 1560ttgcactgcc agcaggcagg
aggaattctg ggagatgaaa tgggattggg caagaccatc 1620cagataattg ccttcttggc
aggtctgagc tacagcaaga tcaggactcg tggttcaaat 1680tacaggtttg aggggttggg
tccaactgta attgtctgtc caacaacagt gatgcatcag 1740tgggtgaagg aatttcacac
gtggtggcct ccgttcagag tggcaattct acatgaaacc 1800ggttcctata cccacaaaaa
ggagaaacta attcgagatg ttgctcattg tcatggaatt 1860ttgatcacat cttactccta
cattcgattg atgcaggatg acattagcag gtatgactgg 1920cactatgtga tcttggacga
aggacacaaa attcgaaatc caaatgctgc tgtcaccctt 1980gcttgcaaac agtttcgcac
ccctcatcgg atcattctgt ctggctcacc gatgcaaaat 2040aacctccgag agctgtggtc
gctctttgac ttcatcttcc cgggaaagtt aggcacgttg 2100cctgtgttta tggagcagtt
ctccgtcccc atcaccatgg ggggatattc aaatgcttcc 2160ccagtacagg tcaaaactgc
ttacaagtgt gcatgtgtct tacgagatac cataaatcca 2220tacctactgc ggagaatgaa
gtcagatgtc aagatgagcc tttctttgcc agataaaaat 2280gaacaggtct tattttgccg
tcttacagat gagcagcata aagtctacca aaatttcgtt 2340gattccaaag aagtttacag
gattctcaat ggagagatgc agattttctc cggacttata 2400gccctaagaa aaatttgcaa
ccaccctgat ctcttttctg gaggtcccaa gaatctcaaa 2460ggtcttcctg atgatgaact
agaagaagat cagtttgggt actggaaacg ttctgggaaa 2520atgattgttg ttgagtcttt
gttgaaaata tggcacaagc agggtcagcg agtattgctg 2580ttttctcagt caaggcagat
gctggacata cttgaagtat tccttagagc ccaaaagtat 2640acctatctca agatggatgg
taccactaca atagcttcaa gacagccact gattacgaga 2700tacaatgagg acacatccat
atttgtgttt cttctgacca cgcgggtggg cggcttaggt 2760gtcaacctga cgggggcaaa
cagagttgtc atctatgacc cagactggaa cccaagcacg 2820gacacgcagg cccgggagcg
agcatggaga ataggccaga agaagcaagt gactgtgtac 2880aggctcctga ctgcgggcac
cattgaagaa aagatctacc accgacaaat cttcaagcag 2940tttttgacaa atagagtgct
aaaagaccca aaacaaaggc ggtttttcaa atccaatgat 3000ctctatgagc tatttactct
gactagtcct gatgcatccc agagcactga aacaagtgca 3060atttttgcag gaactggatc
agatgttcag acacccaaat gccatctaaa aagaaggatt 3120caaccagcct ttggagcaga
ccatgatgtt ccaaaacgca agaagttccc tgcttctaac 3180atatctgtaa atgatgccac
atcatctgaa gagaaatctg aggctaaagg agctgaagta 3240aatgcagtaa cttctaatcg
aagtgatcct ttgaaagatg accctcacat gagtagtaat 3300gtaactagca atgataggct
tggagaagag acaaatgcag tatctggacc agaagagttg 3360tcagtgatta gtggaaatgg
ggaatgttca aattcttcag gaacaggcaa aacttctatg 3420ccatctggtg atgaaagcat
tgatgaaaag ttaggtcttt cttacaaaag agaaagaccc 3480agccaggctc aaacagaagc
tttttgggag aataaacaaa tggaaaataa tttttataag 3540cacaagtcaa aaacaaaaca
tcatagtgtg gcagaagaag agaccctgga gaaacatctg 3600agaccaaagc aaaagcctaa
gaactctaag cattgcagag acgccaagtt tgaaggaact 3660cgaattccac acctggtgaa
gaaaaggcgt taccagaagc aagacagtga aaacaagagt 3720gaggccaagg aacagagcaa
tgacgattat gttttggaaa agcttttcaa aaaatcagtt 3780ggcgtgcaca gtgtcatgaa
gcacgatgcc atcatggatg gagccagccc agattatgta 3840ctggtggagg cagaagccaa
ccgagtggcc caggatgccc tgaaagcact gaggctctct 3900cgtcagcggt gtctgggagc
agtgtctggt gttcccacct ggactggcca cagggggatt 3960tctggtgcac cagcaggaaa
aaagagtaga tttggtaaga aaaggaattc taacttctct 4020gtgcagcatc cttcatcaac
atctccaaca gagaagtgcc aggatggcat catgaaaaag 4080gagggaaaag ataatgtccc
tgagcatttt agtggaagag cagaagatgc agactcttca 4140tccgggcccc tcgcttcctc
ctcactcttg gctaaaatga gagctagaaa ccacctgatt 4200ctgccagagc gtttagaaag
tgaaagcggg cacctgcagg aagcttctgc cctgctgccc 4260accacagaac acgatgacct
tctggtggag atgagaaact tcatcgcttt ccaggcccac 4320actgatggcc aggccagcac
cagggagata ctgcaggagt ttgaatccaa gttatctgca 4380tcacagtctt gtgtcttccg
agaactattg agaaatctgt gcactttcca tagaacttct 4440ggtggtgaag gaatttggaa
actcaagcca gaatactgc 4479
3 7006 DNA Homo sapiens
3 agcagaagtc ggagtcgctg ttgggggcgg tgtctatggt tgagctgagg gcgcaggcgc
60cacggcccgt cgagctgggt tccaaggcgg ctggcggcgg tagcgtctct gtttccttgt
120gggcgctcgc gcggccctgg gtagtctgta gagaatgcca aatgagggaa tcccccactc
180aagtcaaact caggagcaag actgtttaca gagtcaacct gtcagtaata atgaagaaat
240ggcaatcaag caagaaagtg gtggtgatgg ggaggtggag gagtacctct cctttcgttc
300tgtgggtgac gggctgtcca cctctgctgt ggggtgcgca tcagcagctc cgaggagagg
360gccagccctg ctgcacatcg accgacatca gatccaggca gtagagccta gcgcccaggc
420ccttgagctg cagggtttgg gtgtggacgt ctatgaccag gacgtgctgg aacagggagt
480gcttcagcag gtggacaatg ccatccatga ggccagccgt gcctcccagc tcgttgacgt
540ggagaaggag tatcggtcgg tcctggatga cctcacgtca tgtacgacat ccctaaggca
600aatcaataaa attattgaac agcttagccc tcaagctgcc accagcagag acatcaacag
660gaaactagat tctgtaaaac gacagaagta taataaggaa caacagctaa aaaagatcac
720tgcaaaacaa aagcatctcc aggccatcct tggaggagca gaggtgaaaa ttgaactaga
780tcacgccagt ctggaggagg atgcagagcc ggggccatcc agtcttggca gcatgctcat
840gcctgtccag gagactgcct gggaagagct catccgcact ggccagatga caccttttgg
900tacccagatc cctcagaaac aggagaaaaa gcccagaaaa atcatgctta atgaagcatc
960aggcttcgaa aagtatttgg cagatcaagc aaaactgtct tttgaaagga agaagcaagg
1020ttgtaataaa agagcagcta gaaaagctcc agccccagtc acgcctccag ccccagtgca
1080aaataaaaac aaaccaaaca agaaagccag agttctgtcc aaaaaagagg agcgtttgaa
1140aaagcacatc aagaaactcc agaagagggc tttgcagttc caggggaaag tgggattgcc
1200aaaggcaagg agaccttggg agtcagacat gaggccagag gcagagggag actctgaggg
1260tgaagagtct gagtatttcc ccacagagga ggaggaagag gaggaagatg acgaggtgga
1320gggggcagag gcggacctgt ctggagatgg tactgactat gagctgaagc ctctgcccaa
1380gggcgggaaa cggcagaaga aagtgccagt gcaggagatt gatgatgact ttttcccaag
1440ttctggggaa gaagctgaag ctgcttctgt aggagaagga ggaggaggag gtcggaaagt
1500gggaagatac cgagatgatg gagatgaaga ttattataag cagcggttaa ggagatggaa
1560taaactgaga ctgcaggaca aagagaaacg tctgaagctg gaggacgatt ctgaggaaag
1620tgatgctgaa tttgacgaag gttttaaagt gccaggtttt ctgttcaaaa agctttttaa
1680gtaccagcag acaggtgtta ggtggctgtg ggaattgcac tgccagcagg caggaggaat
1740tctgggagat gaaatgggat tgggcaagac catccagata attgccttct tggcaggtct
1800gagctacagc aagatcagga ctcgtggttc aaattacagg tttgaggggt tgggtccaac
1860tgtaattgtc tgtccaacaa cagtgatgca tcagtgggtg aaggaatttc acacgtggtg
1920gcctccgttc agagtggcaa ttctacatga aaccggttcc tatacccaca aaaaggagaa
1980actaattcga gatgttgctc attgtcatgg aattttgatc acatcttact cctacattcg
2040attgatgcag gatgacatta gcaggtatga ctggcactat gtgatcttgg acgaaggaca
2100caaaattcga aatccaaatg ctgctgtcac ccttgcttgc aaacagtttc gcacccctca
2160tcggatcatt ctgtctggct caccgatgca aaataacctc cgagagctgt ggtcgctctt
2220tgacttcatc ttcccgggaa agttaggcac gttgcctgtg tttatggagc agttctccgt
2280ccccatcacc atggggggat attcaaatgc ttccccagta caggtcaaaa ctgcttacaa
2340gtgtgcatgt gtcttacgag ataccataaa tccataccta ctgcggagaa tgaagtcaga
2400tgtcaagatg agcctttctt tgccagataa aaatgaacag gtcttatttt gccgtcttac
2460agatgagcag cataaagtct accaaaattt cgttgattcc aaagaagttt acaggattct
2520caatggagag atgcagattt tctccggact tatagcccta agaaaaattt gcaaccaccc
2580tgatctcttt tctggaggtc ccaagaatct caaaggtctt cctgatgatg aactagaaga
2640agatcagttt gggtactgga aacgttctgg gaaaatgatt gttgttgagt ctttgttgaa
2700aatatggcac aagcagggtc agcgagtatt gctgttttct cagtcaaggc agatgctgga
2760catacttgaa gtattcctta gagcccaaaa gtatacctat ctcaagatgg atggtaccac
2820tacaatagct tcaagacagc cactgattac gagatacaat gaggacacat ccatatttgt
2880gtttcttctg accacgcggg tgggcggctt aggtgtcaac ctgacggggg caaacagagt
2940tgtcatctat gacccagact ggaacccaag cacggacacg caggcccggg agcgagcatg
3000gagaataggc cagaagaagc aagtgactgt gtacaggctc ctgactgcgg gcaccattga
3060agaaaagatc taccaccgac aaatcttcaa gcagtttttg acaaatagag tgctaaaaga
3120cccaaaacaa aggcggtttt tcaaatccaa tgatctctat gagctattta ctctgactag
3180tcctgatgca tcccagagca ctgaaacaag tgcaattttt gcaggaactg gatcagatgt
3240tcagacaccc aaatgccatc taaaaagaag gattcaacca gcctttggag cagaccatga
3300tgttccaaaa cgcaagaagt tccctgcttc taacatatct gtaaatgatg ccacatcatc
3360tgaagagaaa tctgaggcta aaggagctga agtaaatgca gtaacttcta atcgaagtga
3420tcctttgaaa gatgaccctc acatgagtag taatgtaact agcaatgata ggcttggaga
3480agagacaaat gcagtatctg gaccagaaga gttgtcagtg attagtggaa atggggaatg
3540ttcaaattct tcaggaacag gcaaaacttc tatgccatct ggtgatgaaa gcattgatga
3600aaagttaggt ctttcttaca aaagagaaag acccagccag gctcaaacag aagctttttg
3660ggagaataaa caaatggaaa ataattttta taagcacaag tcaaaaacaa aacatcatag
3720tgtggcagaa gaagagaccc tggagaaaca tctgagacca aagcaaaagc ctaagaactc
3780taagcattgc agagacgcca agtttgaagg aactcgaatt ccacacctgg tgaagaaaag
3840gcgttaccag aagcaagaca gtgaaaacaa gagtgaggcc aaggaacaga gcaatgacga
3900ttatgttttg gaaaagcttt tcaaaaaatc agttggcgtg cacagtgtca tgaagcacga
3960tgccatcatg gatggagcca gcccagatta tgtactggtg gaggcagaag ccaaccgagt
4020ggcccaggat gccctgaaag cactgaggct ctctcgtcag cggtgtctgg gagcagtgtc
4080tggtgttccc acctggactg gccacagggg gatttctggt gcaccagcag gaaaaaagag
4140tagatttggt aagaaaagga attctaactt ctctgtgcag catccttcat caacatctcc
4200aacagagaag tgccaggatg gcatcatgaa aaaggaggga aaagataatg tccctgagca
4260ttttagtgga agagcagaag atgcagactc ttcatccggg cccctcgctt cctcctcact
4320cttggctaaa atgagagcta gaaaccacct gattctgcca gagcgtttag aaagtgaaag
4380cgggcacctg caggaagctt ctgccctgct gcccaccaca gaacacgatg accttctggt
4440ggagatgaga aacttcatcg ctttccaggc ccacactgat ggccaggcca gcaccaggga
4500gatactgcag gagtttgaat ccaagttatc tgcatcacag tcttgtgtct tccgagaact
4560attgagaaat ctgtgcactt tccatagaac ttctggtggt gaaggaattt ggaaactcaa
4620gccagaatac tgctaaacaa cattgcttcc taaactttca agtccctttt tctaacgggc
4680atttctgatt attaatttat tattaataat catgtttgtc aatggaagtt ggctgcactt
4740gatgtttgtt tgcatgatgt ctacctcaga attaaaactt taaggaagaa gaaactcttc
4800tctgaaagtt aaaagtttta ataatgctag ctaaaggaga aaatacttgg attgattttt
4860ttttttttgg caatctaatt atattgtaaa tcaggtacct aacagttact ccttggagca
4920catttgttcc tttacccaaa agatgctgtc agggagcaca gttagaagtt tgcagaacag
4980aaatctcaat attttttttt attggtgcta aaaacaggtc ttacattcag tcagacctgt
5040tcaataagtt catcaatatc tgataacagc attattttga tgcttaaact ttaaacattt
5100atatttacca tttgccaccc acaaaggtca ggtttgttat ttgttgtttg ataattatat
5160taattttctt ggaaagatcc tcttttcaag gtactggtaa attggtgagt atttttatta
5220gtaaagcatg aaatagtatg gtaataaatg ataagacatg tatttgtgga aagctgtagg
5280gtattcagtt taccctggct ttcctttaag cagagggcat ctttttctct cctacagtca
5340caaaatgtgt tatcattaaa aaaaatcaaa ttaaagccaa aagtaggtac ataaaaacca
5400cacacatgca tgcacacaaa catcactgca gcccacagca gacccagccg ttgttaccat
5460gaagtgacac cactccaggc ctctcttgtc tgcaggctgg caggctgtct tctctccagt
5520tgccttcgtc ttgcgcctgc ctttgcattc cttgcgacgg gctttcttgt ttctgcggtt
5580tggattccag ccaaggctgt ttgtatctca ctactgttta tgtgtttgtg gttctgtgat
5640ggtgttgctt tgatcctcag tttattttct tacccatgtt tttcttgttt ccttctcagg
5700atgattttat catctcatct ttgaagtgtt gttttccgaa attcatcgta ttcctgaaat
5760ttcttcttag ctgtcttagt gcagtttgtt tcttggattt gtattctctg gcatgctctt
5820ttcctctctc tcatttttct gtagtatgcc tgccctccta ccctgctatt tctttacatc
5880tctctcatgc ttaacatgga tagctgtgtc cagatcttct gtctgctcat ccatgtgact
5940cagagaggag ggttctgggc aggggggcct tgccggactg catgagagga catgagtttt
6000gctttctctg ctctaatatt ttgcttaagc caagaatcct tttcttagag atgttctata
6060tgattcctgt caggattttc tagttttttt tggattatag cttgttcatt tcttttgttt
6120ttagtttggt ttatatataa tgagggaaga agatgattac attatttttg tcactttgcc
6180atcattgttt agaagtcata gaaagaattt ttaaataggc caataagtct taaacttgag
6240tacttggctt agaagaaagt caaaactcct tcctttttga ctaagtggtt tgtttctggg
6300gagctcttaa tttctatttt tataatcatt agcctataag gaaattgtgt cttccttgtt
6360ctcagggtga tctgctgacc ttgttcactc atgaagcatt tgggtatcat acttatagtg
6420tctgaaacat aaactgtatt gagctagaca aggtatagcc tcctcttcaa gtagcaaata
6480ctatcaaaag ctataatgca gtaggagcaa ggtggtcctt gttccagttt ttgtctcagt
6540tctgctgctg atgtaccatg atcttgggaa ggtggtgtct cagtgtggag atctgacaca
6600ttgttaccgt gcctcctggc tggagggact tggagaacaa tgcagttaag tagaatggtt
6660ttaacaatac agagaaattt attcatttag ataaaaatct gatttttaga actttaaaag
6720ctttgtacag tgtaaataga tttaatgtat ttaacatgct ttatcagcac aaataaagga
6780ttttaaaatt ttgtcaaaaa attaaatgtt aatactatca ccattaaaaa tgttcaagca
6840atagtctgcc tccccacccc cacaccatct tgcacctgtt ccacagctaa gtacagccct
6900aggtttggtg tgtattctcc atgcatttag agaatcacat gacacagact gctgctataa
6960tgtcattttc ccattcttcc tttactaata aaatttttga gtttta 7006
4 1493 PRT Homo sapiens
4 Met Pro Asn Glu Gly Ile Pro His Ser Ser Gln Thr
Gln Glu Gln Asp 1 5 10
15 Cys Leu Gln Ser Gln Pro Val Ser Asn Asn Glu Glu Met Ala Ile Lys
20 25 30 Gln Glu Ser
Gly Gly Asp Gly Glu Val Glu Glu Tyr Leu Ser Phe Arg 35
40 45 Ser Val Gly Asp Gly Leu Ser Thr
Ser Ala Val Gly Cys Ala Ser Ala 50 55
60 Ala Pro Arg Arg Gly Pro Ala Leu Leu His Ile Asp Arg
His Gln Ile 65 70 75
80 Gln Ala Val Glu Pro Ser Ala Gln Ala Leu Glu Leu Gln Gly Leu Gly
85 90 95 Val Asp Val Tyr
Asp Gln Asp Val Leu Glu Gln Gly Val Leu Gln Gln 100
105 110 Val Asp Asn Ala Ile His Glu Ala Ser
Arg Ala Ser Gln Leu Val Asp 115 120
125 Val Glu Lys Glu Tyr Arg Ser Val Leu Asp Asp Leu Thr Ser
Cys Thr 130 135 140
Thr Ser Leu Arg Gln Ile Asn Lys Ile Ile Glu Gln Leu Ser Pro Gln 145
150 155 160 Ala Ala Thr Ser Arg
Asp Ile Asn Arg Lys Leu Asp Ser Val Lys Arg 165
170 175 Gln Lys Tyr Tyr Lys Glu Gln Gln Leu Lys
Lys Ile Thr Ala Lys Gln 180 185
190 Lys His Leu Gln Ala Ile Leu Gly Gly Ala Glu Val Lys Ile Glu
Leu 195 200 205 Asp
His Ala Ser Leu Glu Glu Asp Ala Glu Pro Gly Pro Ser Ser Leu 210
215 220 Gly Ser Met Leu Met Pro
Val Gln Glu Thr Ala Trp Glu Glu Leu Ile 225 230
235 240 Arg Thr Gly Gln Met Thr Pro Phe Gly Thr Gln
Ile Pro Gln Lys Gln 245 250
255 Glu Lys Lys Pro Arg Lys Ile Met Leu Asn Glu Ala Ser Gly Phe Glu
260 265 270 Lys Tyr
Leu Ala Asp Gln Ala Lys Leu Ser Phe Glu Arg Lys Lys Gln 275
280 285 Gly Cys Asn Lys Arg Ala Ala
Arg Lys Ala Pro Ala Pro Val Thr Pro 290 295
300 Pro Ala Pro Val Gln Asn Lys Asn Lys Pro Asn Lys
Lys Ala Arg Val 305 310 315
320 Leu Ser Lys Lys Glu Glu Arg Leu Lys Lys His Ile Lys Lys Leu Gln
325 330 335 Lys Arg Ala
Leu Gln Phe Gln Gly Lys Val Gly Leu Pro Lys Ala Arg 340
345 350 Arg Pro Trp Glu Ser Asp Met Arg
Pro Glu Ala Glu Gly Asp Ser Glu 355 360
365 Gly Glu Glu Ser Glu Tyr Phe Pro Thr Glu Glu Glu Glu
Glu Glu Glu 370 375 380
Asp Asp Glu Val Glu Gly Ala Glu Ala Asp Leu Ser Gly Asp Gly Thr 385
390 395 400 Asp Tyr Glu Leu
Lys Pro Leu Pro Lys Gly Gly Lys Arg Gln Lys Lys 405
410 415 Val Pro Val Gln Glu Ile Asp Asp Asp
Phe Phe Pro Ser Ser Gly Glu 420 425
430 Glu Ala Glu Ala Ala Ser Val Gly Glu Gly Gly Gly Gly Gly
Arg Lys 435 440 445
Val Gly Arg Tyr Arg Asp Asp Gly Asp Glu Asp Tyr Tyr Lys Gln Arg 450
455 460 Leu Arg Arg Trp Asn
Lys Leu Arg Leu Gln Asp Lys Glu Lys Arg Leu 465 470
475 480 Lys Leu Glu Asp Asp Ser Glu Glu Ser Asp
Ala Glu Phe Asp Glu Gly 485 490
495 Phe Lys Val Pro Gly Phe Leu Phe Lys Lys Leu Phe Lys Tyr Gln
Gln 500 505 510 Thr
Gly Val Arg Trp Leu Trp Glu Leu His Cys Gln Gln Ala Gly Gly 515
520 525 Ile Leu Gly Asp Glu Met
Gly Leu Gly Lys Thr Ile Gln Ile Ile Ala 530 535
540 Phe Leu Ala Gly Leu Ser Tyr Ser Lys Ile Arg
Thr Arg Gly Ser Asn 545 550 555
560 Tyr Arg Phe Glu Gly Leu Gly Pro Thr Val Ile Val Cys Pro Thr Thr
565 570 575 Val Met
His Gln Trp Val Lys Glu Phe His Thr Trp Trp Pro Pro Phe 580
585 590 Arg Val Ala Ile Leu His Glu
Thr Gly Ser Tyr Thr His Lys Lys Glu 595 600
605 Lys Leu Ile Arg Asp Val Ala His Cys His Gly Ile
Leu Ile Thr Ser 610 615 620
Tyr Ser Tyr Ile Arg Leu Met Gln Asp Asp Ile Ser Arg Tyr Asp Trp 625
630 635 640 His Tyr Val
Ile Leu Asp Glu Gly His Lys Ile Arg Asn Pro Asn Ala 645
650 655 Ala Val Thr Leu Ala Cys Lys Gln
Phe Arg Thr Pro His Arg Ile Ile 660 665
670 Leu Ser Gly Ser Pro Met Gln Asn Asn Leu Arg Glu Leu
Trp Ser Leu 675 680 685
Phe Asp Phe Ile Phe Pro Gly Lys Leu Gly Thr Leu Pro Val Phe Met 690
695 700 Glu Gln Phe Ser
Val Pro Ile Thr Met Gly Gly Tyr Ser Asn Ala Ser 705 710
715 720 Pro Val Gln Val Lys Thr Ala Tyr Lys
Cys Ala Cys Val Leu Arg Asp 725 730
735 Thr Ile Asn Pro Tyr Leu Leu Arg Arg Met Lys Ser Asp Val
Lys Met 740 745 750
Ser Leu Ser Leu Pro Asp Lys Asn Glu Gln Val Leu Phe Cys Arg Leu
755 760 765 Thr Asp Glu Gln
His Lys Val Tyr Gln Asn Phe Val Asp Ser Lys Glu 770
775 780 Val Tyr Arg Ile Leu Asn Gly Glu
Met Gln Ile Phe Ser Gly Leu Ile 785 790
795 800 Ala Leu Arg Lys Ile Cys Asn His Pro Asp Leu Phe
Ser Gly Gly Pro 805 810
815 Lys Asn Leu Lys Gly Leu Pro Asp Asp Glu Leu Glu Glu Asp Gln Phe
820 825 830 Gly Tyr Trp
Lys Arg Ser Gly Lys Met Ile Val Val Glu Ser Leu Leu 835
840 845 Lys Ile Trp His Lys Gln Gly Gln
Arg Val Leu Leu Phe Ser Gln Ser 850 855
860 Arg Gln Met Leu Asp Ile Leu Glu Val Phe Leu Arg Ala
Gln Lys Tyr 865 870 875
880 Thr Tyr Leu Lys Met Asp Gly Thr Thr Thr Ile Ala Ser Arg Gln Pro
885 890 895 Leu Ile Thr Arg
Tyr Asn Glu Asp Thr Ser Ile Phe Val Phe Leu Leu 900
905 910 Thr Thr Arg Val Gly Gly Leu Gly Val
Asn Leu Thr Gly Ala Asn Arg 915 920
925 Val Val Ile Tyr Asp Pro Asp Trp Asn Pro Ser Thr Asp Thr
Gln Ala 930 935 940
Arg Glu Arg Ala Trp Arg Ile Gly Gln Lys Lys Gln Val Thr Val Tyr 945
950 955 960 Arg Leu Leu Thr Ala
Gly Thr Ile Glu Glu Lys Ile Tyr His Arg Gln 965
970 975 Ile Phe Lys Gln Phe Leu Thr Asn Arg Val
Leu Lys Asp Pro Lys Gln 980 985
990 Arg Arg Phe Phe Lys Ser Asn Asp Leu Tyr Glu Leu Phe Thr
Leu Thr 995 1000 1005
Ser Pro Asp Ala Ser Gln Ser Thr Glu Thr Ser Ala Ile Phe Ala 1010
1015 1020 Gly Thr Gly Ser Asp
Val Gln Thr Pro Lys Cys His Leu Lys Arg 1025 1030
1035 Arg Ile Gln Pro Ala Phe Gly Ala Asp His
Asp Val Pro Lys Arg 1040 1045 1050
Lys Lys Phe Pro Ala Ser Asn Ile Ser Val Asn Asp Ala Thr Ser
1055 1060 1065 Ser Glu
Glu Lys Ser Glu Ala Lys Gly Ala Glu Val Asn Ala Val 1070
1075 1080 Thr Ser Asn Arg Ser Asp Pro
Leu Lys Asp Asp Pro His Met Ser 1085 1090
1095 Ser Asn Val Thr Ser Asn Asp Arg Leu Gly Glu Glu
Thr Asn Ala 1100 1105 1110
Val Ser Gly Pro Glu Glu Leu Ser Val Ile Ser Gly Asn Gly Glu 1115
1120 1125 Cys Ser Asn Ser Ser
Gly Thr Gly Lys Thr Ser Met Pro Ser Gly 1130 1135
1140 Asp Glu Ser Ile Asp Glu Lys Leu Gly Leu
Ser Tyr Lys Arg Glu 1145 1150 1155
Arg Pro Ser Gln Ala Gln Thr Glu Ala Phe Trp Glu Asn Lys Gln
1160 1165 1170 Met Glu
Asn Asn Phe Tyr Lys His Lys Ser Lys Thr Lys His His 1175
1180 1185 Ser Val Ala Glu Glu Glu Thr
Leu Glu Lys His Leu Arg Pro Lys 1190 1195
1200 Gln Lys Pro Lys Asn Ser Lys His Cys Arg Asp Ala
Lys Phe Glu 1205 1210 1215
Gly Thr Arg Ile Pro His Leu Val Lys Lys Arg Arg Tyr Gln Lys 1220
1225 1230 Gln Asp Ser Glu Asn
Lys Ser Glu Ala Lys Glu Gln Ser Asn Asp 1235 1240
1245 Asp Tyr Val Leu Glu Lys Leu Phe Lys Lys
Ser Val Gly Val His 1250 1255 1260
Ser Val Met Lys His Asp Ala Ile Met Asp Gly Ala Ser Pro Asp
1265 1270 1275 Tyr Val
Leu Val Glu Ala Glu Ala Asn Arg Val Ala Gln Asp Ala 1280
1285 1290 Leu Lys Ala Leu Arg Leu Ser
Arg Gln Arg Cys Leu Gly Ala Val 1295 1300
1305 Ser Gly Val Pro Thr Trp Thr Gly His Arg Gly Ile
Ser Gly Ala 1310 1315 1320
Pro Ala Gly Lys Lys Ser Arg Phe Gly Lys Lys Arg Asn Ser Asn 1325
1330 1335 Phe Ser Val Gln His
Pro Ser Ser Thr Ser Pro Thr Glu Lys Cys 1340 1345
1350 Gln Asp Gly Ile Met Lys Lys Glu Gly Lys
Asp Asn Val Pro Glu 1355 1360 1365
His Phe Ser Gly Arg Ala Glu Asp Ala Asp Ser Ser Ser Gly Pro
1370 1375 1380 Leu Ala
Ser Ser Ser Leu Leu Ala Lys Met Arg Ala Arg Asn His 1385
1390 1395 Leu Ile Leu Pro Glu Arg Leu
Glu Ser Glu Ser Gly His Leu Gln 1400 1405
1410 Glu Ala Ser Ala Leu Leu Pro Thr Thr Glu His Asp
Asp Leu Leu 1415 1420 1425
Val Glu Met Arg Asn Phe Ile Ala Phe Gln Ala His Thr Asp Gly 1430
1435 1440 Gln Ala Ser Thr Arg
Glu Ile Leu Gln Glu Phe Glu Ser Lys Leu 1445 1450
1455 Ser Ala Ser Gln Ser Cys Val Phe Arg Glu
Leu Leu Arg Asn Leu 1460 1465 1470
Cys Thr Phe His Arg Thr Ser Gly Gly Glu Gly Ile Trp Lys Leu
1475 1480 1485 Lys Pro
Glu Tyr Cys 1490
5 1493 PRT Homo sapiens
5 Met Pro Asn Glu
Gly Ile Pro His Ser Ser Gln Thr Gln Glu Gln Asp 1 5
10 15 Cys Leu Gln Ser Gln Pro Val Ser Asn
Asn Glu Glu Met Ala Ile Lys 20 25
30 Gln Glu Ser Gly Gly Asp Gly Glu Val Glu Glu Tyr Leu Ser
Phe Arg 35 40 45
Ser Val Gly Asp Gly Leu Ser Thr Ser Ala Val Gly Cys Ala Ser Ala 50
55 60 Ala Pro Arg Arg Gly
Pro Ala Leu Leu His Ile Asp Arg His Gln Ile 65 70
75 80 Gln Ala Val Glu Pro Ser Ala Gln Ala Leu
Glu Leu Gln Gly Leu Gly 85 90
95 Val Asp Val Tyr Asp Gln Asp Val Leu Glu Gln Gly Val Leu Gln
Gln 100 105 110 Val
Asp Asn Ala Ile His Glu Ala Ser Arg Ala Ser Gln Leu Val Asp 115
120 125 Val Glu Lys Glu Tyr Arg
Ser Val Leu Asp Asp Leu Thr Ser Cys Thr 130 135
140 Thr Ser Leu Arg Gln Ile Asn Lys Ile Ile Glu
Gln Leu Ser Pro Gln 145 150 155
160 Ala Ala Thr Ser Arg Asp Ile Asn Arg Lys Leu Asp Ser Val Lys Arg
165 170 175 Gln Lys
Tyr Asn Lys Glu Gln Gln Leu Lys Lys Ile Thr Ala Lys Gln 180
185 190 Lys His Leu Gln Ala Ile Leu
Gly Gly Ala Glu Val Lys Ile Glu Leu 195 200
205 Asp His Ala Ser Leu Glu Glu Asp Ala Glu Pro Gly
Pro Ser Ser Leu 210 215 220
Gly Ser Met Leu Met Pro Val Gln Glu Thr Ala Trp Glu Glu Leu Ile 225
230 235 240 Arg Thr Gly
Gln Met Thr Pro Phe Gly Thr Gln Ile Pro Gln Lys Gln 245
250 255 Glu Lys Lys Pro Arg Lys Ile Met
Leu Asn Glu Ala Ser Gly Phe Glu 260 265
270 Lys Tyr Leu Ala Asp Gln Ala Lys Leu Ser Phe Glu Arg
Lys Lys Gln 275 280 285
Gly Cys Asn Lys Arg Ala Ala Arg Lys Ala Pro Ala Pro Val Thr Pro 290
295 300 Pro Ala Pro Val
Gln Asn Lys Asn Lys Pro Asn Lys Lys Ala Arg Val 305 310
315 320 Leu Ser Lys Lys Glu Glu Arg Leu Lys
Lys His Ile Lys Lys Leu Gln 325 330
335 Lys Arg Ala Leu Gln Phe Gln Gly Lys Val Gly Leu Pro Lys
Ala Arg 340 345 350
Arg Pro Trp Glu Ser Asp Met Arg Pro Glu Ala Glu Gly Asp Ser Glu
355 360 365 Gly Glu Glu Ser
Glu Tyr Phe Pro Thr Glu Glu Glu Glu Glu Glu Glu 370
375 380 Asp Asp Glu Val Glu Gly Ala Glu
Ala Asp Leu Ser Gly Asp Gly Thr 385 390
395 400 Asp Tyr Glu Leu Lys Pro Leu Pro Lys Gly Gly Lys
Arg Gln Lys Lys 405 410
415 Val Pro Val Gln Glu Ile Asp Asp Asp Phe Phe Pro Ser Ser Gly Glu
420 425 430 Glu Ala Glu
Ala Ala Ser Val Gly Glu Gly Gly Gly Gly Gly Arg Lys 435
440 445 Val Gly Arg Tyr Arg Asp Asp Gly
Asp Glu Asp Tyr Tyr Lys Gln Arg 450 455
460 Leu Arg Arg Trp Asn Lys Leu Arg Leu Gln Asp Lys Glu
Lys Arg Leu 465 470 475
480 Lys Leu Glu Asp Asp Ser Glu Glu Ser Asp Ala Glu Phe Asp Glu Gly
485 490 495 Phe Lys Val Pro
Gly Phe Leu Phe Lys Lys Leu Phe Lys Tyr Gln Gln 500
505 510 Thr Gly Val Arg Trp Leu Trp Glu Leu
His Cys Gln Gln Ala Gly Gly 515 520
525 Ile Leu Gly Asp Glu Met Gly Leu Gly Lys Thr Ile Gln Ile
Ile Ala 530 535 540
Phe Leu Ala Gly Leu Ser Tyr Ser Lys Ile Arg Thr Arg Gly Ser Asn 545
550 555 560 Tyr Arg Phe Glu Gly
Leu Gly Pro Thr Val Ile Val Cys Pro Thr Thr 565
570 575 Val Met His Gln Trp Val Lys Glu Phe His
Thr Trp Trp Pro Pro Phe 580 585
590 Arg Val Ala Ile Leu His Glu Thr Gly Ser Tyr Thr His Lys Lys
Glu 595 600 605 Lys
Leu Ile Arg Asp Val Ala His Cys His Gly Ile Leu Ile Thr Ser 610
615 620 Tyr Ser Tyr Ile Arg Leu
Met Gln Asp Asp Ile Ser Arg Tyr Asp Trp 625 630
635 640 His Tyr Val Ile Leu Asp Glu Gly His Lys Ile
Arg Asn Pro Asn Ala 645 650
655 Ala Val Thr Leu Ala Cys Lys Gln Phe Arg Thr Pro His Arg Ile Ile
660 665 670 Leu Ser
Gly Ser Pro Met Gln Asn Asn Leu Arg Glu Leu Trp Ser Leu 675
680 685 Phe Asp Phe Ile Phe Pro Gly
Lys Leu Gly Thr Leu Pro Val Phe Met 690 695
700 Glu Gln Phe Ser Val Pro Ile Thr Met Gly Gly Tyr
Ser Asn Ala Ser 705 710 715
720 Pro Val Gln Val Lys Thr Ala Tyr Lys Cys Ala Cys Val Leu Arg Asp
725 730 735 Thr Ile Asn
Pro Tyr Leu Leu Arg Arg Met Lys Ser Asp Val Lys Met 740
745 750 Ser Leu Ser Leu Pro Asp Lys Asn
Glu Gln Val Leu Phe Cys Arg Leu 755 760
765 Thr Asp Glu Gln His Lys Val Tyr Gln Asn Phe Val Asp
Ser Lys Glu 770 775 780
Val Tyr Arg Ile Leu Asn Gly Glu Met Gln Ile Phe Ser Gly Leu Ile 785
790 795 800 Ala Leu Arg Lys
Ile Cys Asn His Pro Asp Leu Phe Ser Gly Gly Pro 805
810 815 Lys Asn Leu Lys Gly Leu Pro Asp Asp
Glu Leu Glu Glu Asp Gln Phe 820 825
830 Gly Tyr Trp Lys Arg Ser Gly Lys Met Ile Val Val Glu Ser
Leu Leu 835 840 845
Lys Ile Trp His Lys Gln Gly Gln Arg Val Leu Leu Phe Ser Gln Ser 850
855 860 Arg Gln Met Leu Asp
Ile Leu Glu Val Phe Leu Arg Ala Gln Lys Tyr 865 870
875 880 Thr Tyr Leu Lys Met Asp Gly Thr Thr Thr
Ile Ala Ser Arg Gln Pro 885 890
895 Leu Ile Thr Arg Tyr Asn Glu Asp Thr Ser Ile Phe Val Phe Leu
Leu 900 905 910 Thr
Thr Arg Val Gly Gly Leu Gly Val Asn Leu Thr Gly Ala Asn Arg 915
920 925 Val Val Ile Tyr Asp Pro
Asp Trp Asn Pro Ser Thr Asp Thr Gln Ala 930 935
940 Arg Glu Arg Ala Trp Arg Ile Gly Gln Lys Lys
Gln Val Thr Val Tyr 945 950 955
960 Arg Leu Leu Thr Ala Gly Thr Ile Glu Glu Lys Ile Tyr His Arg Gln
965 970 975 Ile Phe
Lys Gln Phe Leu Thr Asn Arg Val Leu Lys Asp Pro Lys Gln 980
985 990 Arg Arg Phe Phe Lys Ser Asn
Asp Leu Tyr Glu Leu Phe Thr Leu Thr 995 1000
1005 Ser Pro Asp Ala Ser Gln Ser Thr Glu Thr
Ser Ala Ile Phe Ala 1010 1015 1020
Gly Thr Gly Ser Asp Val Gln Thr Pro Lys Cys His Leu Lys Arg
1025 1030 1035 Arg Ile
Gln Pro Ala Phe Gly Ala Asp His Asp Val Pro Lys Arg 1040
1045 1050 Lys Lys Phe Pro Ala Ser Asn
Ile Ser Val Asn Asp Ala Thr Ser 1055 1060
1065 Ser Glu Glu Lys Ser Glu Ala Lys Gly Ala Glu Val
Asn Ala Val 1070 1075 1080
Thr Ser Asn Arg Ser Asp Pro Leu Lys Asp Asp Pro His Met Ser 1085
1090 1095 Ser Asn Val Thr Ser
Asn Asp Arg Leu Gly Glu Glu Thr Asn Ala 1100 1105
1110 Val Ser Gly Pro Glu Glu Leu Ser Val Ile
Ser Gly Asn Gly Glu 1115 1120 1125
Cys Ser Asn Ser Ser Gly Thr Gly Lys Thr Ser Met Pro Ser Gly
1130 1135 1140 Asp Glu
Ser Ile Asp Glu Lys Leu Gly Leu Ser Tyr Lys Arg Glu 1145
1150 1155 Arg Pro Ser Gln Ala Gln Thr
Glu Ala Phe Trp Glu Asn Lys Gln 1160 1165
1170 Met Glu Asn Asn Phe Tyr Lys His Lys Ser Lys Thr
Lys His His 1175 1180 1185
Ser Val Ala Glu Glu Glu Thr Leu Glu Lys His Leu Arg Pro Lys 1190
1195 1200 Gln Lys Pro Lys Asn
Ser Lys His Cys Arg Asp Ala Lys Phe Glu 1205 1210
1215 Gly Thr Arg Ile Pro His Leu Val Lys Lys
Arg Arg Tyr Gln Lys 1220 1225 1230
Gln Asp Ser Glu Asn Lys Ser Glu Ala Lys Glu Gln Ser Asn Asp
1235 1240 1245 Asp Tyr
Val Leu Glu Lys Leu Phe Lys Lys Ser Val Gly Val His 1250
1255 1260 Ser Val Met Lys His Asp Ala
Ile Met Asp Gly Ala Ser Pro Asp 1265 1270
1275 Tyr Val Leu Val Glu Ala Glu Ala Asn Arg Val Ala
Gln Asp Ala 1280 1285 1290
Leu Lys Ala Leu Arg Leu Ser Arg Gln Arg Cys Leu Gly Ala Val 1295
1300 1305 Ser Gly Val Pro Thr
Trp Thr Gly His Arg Gly Ile Ser Gly Ala 1310 1315
1320 Pro Ala Gly Lys Lys Ser Arg Phe Gly Lys
Lys Arg Asn Ser Asn 1325 1330 1335
Phe Ser Val Gln His Pro Ser Ser Thr Ser Pro Thr Glu Lys Cys
1340 1345 1350 Gln Asp
Gly Ile Met Lys Lys Glu Gly Lys Asp Asn Val Pro Glu 1355
1360 1365 His Phe Ser Gly Arg Ala Glu
Asp Ala Asp Ser Ser Ser Gly Pro 1370 1375
1380 Leu Ala Ser Ser Ser Leu Leu Ala Lys Met Arg Ala
Arg Asn His 1385 1390 1395
Leu Ile Leu Pro Glu Arg Leu Glu Ser Glu Ser Gly His Leu Gln 1400
1405 1410 Glu Ala Ser Ala Leu
Leu Pro Thr Thr Glu His Asp Asp Leu Leu 1415 1420
1425 Val Glu Met Arg Asn Phe Ile Ala Phe Gln
Ala His Thr Asp Gly 1430 1435 1440
Gln Ala Ser Thr Arg Glu Ile Leu Gln Glu Phe Glu Ser Lys Leu
1445 1450 1455 Ser Ala
Ser Gln Ser Cys Val Phe Arg Glu Leu Leu Arg Asn Leu 1460
1465 1470 Cys Thr Phe His Arg Thr Ser
Gly Gly Glu Gly Ile Trp Lys Leu 1475 1480
1485 Lys Pro Glu Tyr Cys 1490
6 4296 DNA Homo sapiens
6 atgagtgaaa aaaaattgga aacaactgca cagcagcgga
aatgtcctga atggatgaat 60gtgcagaata aaagatgtgc tgtagaagaa agaaaggcat
gtgttcggaa gagtgttttt 120gaagatgacc tccccttctt agaattcact ggatccattg
tgtatagtta cgatgctagt 180gattgctctt tcctgtcaga agatattagc atgagtctat
cagatgggga tgtggtggga 240tttgacatgg agtggccacc attatacaat agagggaaac
ttggcaaagt tgcactaatt 300cagttgtgtg tttctgagag caaatgttac ttgttccacg
tttcttccat gtcagttttt 360ccccagggat taaaaatgtt gcttgaaaat aaagcagtta
aaaaggcagg tgtaggaatt 420gaaggagatc agtggaaact tctacgtgac tttgatatca
aattgaagaa ttttgtggag 480ttgacagatg ttgccaataa aaagctgaaa tgcacagaga
cctggagcct taacagtctg 540gttaaacacc tcttaggtaa acagctcctg aaagacaagt
ctatccgctg tagcaattgg 600agtaaatttc ctctcactga ggaccagaaa ctgtatgcag
ccactgatgc ttatgctggt 660tttattattt accgaaattt agagattttg gatgatactg
tgcaaaggtt tgctataaat 720aaagaggaag aaatcctact tagcgacatg aacaaacagt
tgacttcaat ctctgaggaa 780gtgatggatc tggctaagca tcttcctcat gctttcagta
aattggaaaa cccacggagg 840gtttctatct tactaaagga tatttcagaa aatctatatt
cactgaggag gatgataatt 900gggtctacta acattgagac tgaactgagg cccagcaata
atttaaactt attatccttt 960gaagattcaa ctactggggg agtacaacag aaacaaatta
gagaacatga agttttaatt 1020cacgttgaag atgaaacatg ggacccaaca cttgatcatt
tagctaaaca tgatggagaa 1080gatgtacttg gaaataaagt ggaacgaaaa gaagatggat
ttgaagatgg agtagaagac 1140aacaaattga aagagaatat ggaaagagct tgtttgatgt
cgttagatat tacagaacat 1200gaactccaaa ttttggaaca gcagtctcag gaagaatatc
ttagtgatat tgcttataaa 1260tctactgagc atttatctcc caatgataat gaaaacgata
cgtcctatgt aattgagagt 1320gatgaagatt tagaaatgga gatgcttaag catttatctc
ccaatgataa tgaaaacgat 1380acgtcctatg taattgagag tgatgaagat ttagaaatgg
agatgcttaa gtctttagaa 1440aacctcaata gtggcacggt agaaccaact cattctaaat
gcttaaaaat ggaaagaaat 1500ctgggtcttc ctactaaaga agaagaagaa gatgatgaaa
atgaagctaa tgaaggggaa 1560gaagatgatg ataaggactt tttgtggcca gcacccaatg
aagagcaagt tacttgcctc 1620aagatgtact ttggccattc cagttttaaa ccagttcagt
ggaaagtgat tcattcagta 1680ttagaagaaa gaagagataa tgttgctgtc atggcaactg
gatatggaaa gagtttgtgc 1740ttccagtatc cacctgttta tgtaggcaag attggccttg
ttatctctcc ccttatttct 1800ctgatggaag accaagtgct acagcttaaa atgtccaaca
tcccagcttg cttccttgga 1860tcagcacagt cagaaaatgt tctaacagat attaaattag
gtaaataccg gattgtatac 1920gtaactccag aatactgttc aggtaacatg ggcctgctcc
agcaacttga ggctgatatt 1980ggtatcacgc tcattgctgt ggatgaggct cactgtattt
ctgagtgggg gcatgatttt 2040agggattcat tcaggaagtt gggctcccta aagacagcac
tgccaatggt tccaatcgtt 2100gcacttactg ctattgcaag ttcttcaatc cgggaagaca
ttgtacgttg cttaaatctg 2160agaaatcctc agatcacctg tactggtttt gatcgaccaa
acctgtattt agaagttagg 2220cgaaaaacag ggaatatcct tcaggatctg cagccatttc
ttgtcaaaac aagttcccac 2280tgggaatttg aaggtccaac aatcatctac tgtccttcta
gaaaaatgac acaacaagtt 2340acaggtgaac ttaggaaact gaatctatcc tgtggaacat
accatgcggg catgagtttt 2400agcacaagga aagacattca tcataggttt gtaagagatg
aaattcagtg tgtcatagct 2460accatagctt ttggaatggg cattaataaa gctgacattc
gccaagtcat tcattacggt 2520gctcctaagg acatggaatc atattatcag gagattggta
gagctggtcg tgatggactt 2580caaagttctt gtcacgtcct ctgggctcct gcagacatta
acttaaatag gcaccttctt 2640actgagatac gtaatgagaa gtttcgatta tacaaattaa
agatgatggc aaagatggaa 2700aaatatcttc attctagcag atgtaggaga caaatcatct
tgtctcattt tgaggacaaa 2760caagtacaaa aagcctcctt gggaattatg ggaactgaaa
aatgctgtga taattgcagg 2820tccagattgg atcattgcta ttccatggat gactcagagg
atacatcctg ggactttggt 2880ccacaagcat ttaagctttt gtctgctgtg gacatcttag
gcgaaaaatt tggaattggg 2940cttccaattt tatttctccg aggatctaat tctcagcgtc
ttgccgatca atatcgcagg 3000cacagtttat ttggcactgg caaggatcaa acagagagtt
ggtggaaggc tttttcccgt 3060cagctgatca ctgagggatt cttggtagaa gtttctcggt
ataacaaatt tatgaagatt 3120tgcgccctta cgaaaaaggg tagaaattgg cttcataaag
ctaatacaga atctcagagc 3180ctcatccttc aagctaatga agaattgtgt ccaaagaagt
tgcttctgcc tagttcgaaa 3240actgtatctt cgggcaccaa agagcattgt tataatcaag
taccagttga attaagtaca 3300gagaagaagt ctaacttgga gaagttatat tcttataaac
catgtgataa gatttcttct 3360gggagtaaca tttctaaaaa aagtatcatg gtacagtcac
cagaaaaagc ttacagttcc 3420tcacagcctg ttatttcggc acaagagcag gagactcaga
ttgtgttata tggcaaattg 3480gtagaagcta ggcagaaaca tgccaataaa atggatgttc
ccccagctat tctggcaaca 3540aacaagatac tggtggatat ggccaaaatg agaccaacta
cggttgaaaa cgtaaaaagg 3600attgatggtg tttctgaagg caaagctgcc atgttggccc
ctctgttgga agtcatcaaa 3660catttctgcc aaacaaatag tgttcagaca gacctctttt
caagtacaaa acctcaagaa 3720gaacagaaga cgagtctggt agcaaaaaat aaaatatgca
cactttcaca gtctatggcc 3780atcacatact ctttattcca agaaaagaag atgcctttga
agagcatagc tgagagcagg 3840attctgcctc tcatgacaat tggcatgcac ttatcccaag
cggtgaaagc tggctgcccc 3900cttgatttgg agcgagcagg cctgactcca gaggttcaga
agattattgc tgatgttatc 3960cgaaaccctc ccgtcaactc agatatgagt aaaattagcc
taatcagaat gttagttcct 4020gaaaacattg acacgtacct tatccacatg gcaattgaga
tccttaaaca tggtcctgac 4080agcggacttc aaccttcatg tgatgtcaac aaaaggagat
gttttcccgg ttctgaagag 4140atctgttcaa gttctaagag aagcaaggaa gaagtaggca
tcaatactga gacttcatct 4200gcagagagaa agagacgatt acctgtgtgg tttgccaaag
gaagtgatac cagcaagaaa 4260ttaatggaca aaacgaaaag gggaggtctt tttagt 4296
7 4296 DNA Homo sapiens
7 atgagtgaaa aaaaattgga
aacaactgca cagcagcgga aatgtcctga atggatgaat 60gtgcagaata aaagatgtgc
tgtagaagaa agaaaggcat gtgttcggaa gagtgttttt 120gaagatgacc tccccttctt
agaattcact ggatccattg tgtatagtta cgatgctagt 180gattgctctt tcctgtcaga
agatattagc atgagtctat cagatgggga tgtggtggga 240tttgacatgg agtggccacc
attatacaat agagggaaac ttggcaaagt tgcactaatt 300cagttgtgtg tttctgagag
caaatgttac ttgttccacg tttcttccat gtcagttttt 360ccccagggat taaaaatgtt
gcttgaaaat aaagcagtta aaaaggcagg tgtaggaatt 420gaaggagatc agtggaaact
tctacgtgac tttgatatca aattgaagaa ttttgtggag 480ttgacagatg ttgccaataa
aaagctgaaa tgcacagaga cctggagcct taacagtctg 540gttaaacacc tcttaggtaa
acagctcctg aaagacaagt ctatccgctg tagcaattgg 600agtaaatttc ctctcactga
ggaccagaaa ctgtatgcag ccactgatgc ttatgctggt 660tttattattt accgaaattt
agagattttg gatgatactg tgcaaaggtt tgctataaat 720aaagaggaag aaatcctact
tagcgacatg aacaaacagt tgacttcaat ctctgaggaa 780gtgatggatc tggctaagca
tcttcctcat gctttcagta aattggaaaa cccacggagg 840gtttctatct tactaaagga
tatttcagaa aatctatatt cactgaggag gatgataatt 900gggtctacta acattgagac
tgaactgagg cccagcaata atttaaactt attatccttt 960gaagattcaa ctactggggg
agtacaacag aaacaaatta gagaacatga agttttaatt 1020cacgttgaag atgaaacatg
ggacccaaca cttgatcatt tagctaaaca tgatggagaa 1080gatgtacttg gaaataaagt
ggaacgaaaa gaagatggat ttgaagatgg agtagaagac 1140aacaaattga aagagaatat
ggaaagagct tgtttgatgt cgttagatat tacagaacat 1200gaactccaaa ttttggaaca
gcagtctcag gaagaatatc ttagtgatat tgcttataaa 1260tctactgagc atttatctcc
caatgataat gaaaacgata cgtcctatgt aattgagagt 1320gatgaagatt tagaaatgga
gatgcttaag catttatctc ccaatgataa tgaaaacgat 1380acgtcctatg taattgagag
tgatgaagat ttagaaatgg agatgcttaa gtctttagaa 1440aacctcaata gtggcacggt
agaaccaact cattctaaat gcttaaaaat ggaaagaaat 1500ctgggtcttc ctactaaaga
agaagaagaa gatgatgaaa atgaagctaa tgaaggggaa 1560gaagatgatg ataaggactt
tttgtggcca gcacccaatg aagagcaagt tacttgcctc 1620aagatgtact ttggccattc
cagttttaaa ccagttcagt ggaaagtgat tcattcagta 1680ttagaagaaa gaagagataa
tgttgctgtc atggcaactg gatatggaaa gagtttgtgc 1740ttccagtatc cacctgttta
tgtaggcaag attggccttg ttatctctcc ccttatttct 1800ctgatggaag accaagtgct
acagcttaaa atgtccaaca tcccagcttg cttccttgga 1860tcagcacagt cagaaaatgt
tctaacagat attaaattag gtaaataccg gattgtatac 1920gtaactccag aatactgttc
aggtaacatg ggcctgctcc agcaacttga ggctgatatt 1980ggtatcacgc tcattgctgt
ggatgaggct cactgtattt ctgagtgggg gcatgatttt 2040agggattcat tcaggaagtt
gggctcccta aagacagcac tgccaatggt tccaatcgtt 2100gcacttactg ctactgcaag
ttcttcaatc cgggaagaca ttgtacgttg cttaaatctg 2160agaaatcctc agatcacctg
tactggtttt gatcgaccaa acctgtattt agaagttagg 2220cgaaaaacag ggaatatcct
tcaggatctg cagccatttc ttgtcaaaac aagttcccac 2280tgggaatttg aaggtccaac
aatcatctac tgtccttcta gaaaaatgac acaacaagtt 2340acaggtgaac ttaggaaact
gaatctatcc tgtggaacat accatgcggg catgagtttt 2400agcacaagga aagacattca
tcataggttt gtaagagatg aaattcagtg tgtcatagct 2460accatagctt ttggaatggg
cattaataaa gctgacattc gccaagtcat tcattacggt 2520gctcctaagg acatggaatc
atattatcag gagattggta gagctggtcg tgatggactt 2580caaagttctt gtcacgtcct
ctgggctcct gcagacatta acttaaatag gcaccttctt 2640actgagatac gtaatgagaa
gtttcgatta tacaaattaa agatgatggc aaagatggaa 2700aaatatcttc attctagcag
atgtaggaga caaatcatct tgtctcattt tgaggacaaa 2760caagtacaaa aagcctcctt
gggaattatg ggaactgaaa aatgctgtga taattgcagg 2820tccagattgg atcattgcta
ttccatggat gactcagagg atacatcctg ggactttggt 2880ccacaagcat ttaagctttt
gtctgctgtg gacatcttag gcgaaaaatt tggaattggg 2940cttccaattt tatttctccg
aggatctaat tctcagcgtc ttgccgatca atatcgcagg 3000cacagtttat ttggcactgg
caaggatcaa acagagagtt ggtggaaggc tttttcccgt 3060cagctgatca ctgagggatt
cttggtagaa gtttctcggt ataacaaatt tatgaagatt 3120tgcgccctta cgaaaaaggg
tagaaattgg cttcataaag ctaatacaga atctcagagc 3180ctcatccttc aagctaatga
agaattgtgt ccaaagaagt tgcttctgcc tagttcgaaa 3240actgtatctt cgggcaccaa
agagcattgt tataatcaag taccagttga attaagtaca 3300gagaagaagt ctaacttgga
gaagttatat tcttataaac catgtgataa gatttcttct 3360gggagtaaca tttctaaaaa
aagtatcatg gtacagtcac cagaaaaagc ttacagttcc 3420tcacagcctg ttatttcggc
acaagagcag gagactcaga ttgtgttata tggcaaattg 3480gtagaagcta ggcagaaaca
tgccaataaa atggatgttc ccccagctat tctggcaaca 3540aacaagatac tggtggatat
ggccaaaatg agaccaacta cggttgaaaa cgtaaaaagg 3600attgatggtg tttctgaagg
caaagctgcc atgttggccc ctctgttgga agtcatcaaa 3660catttctgcc aaacaaatag
tgttcagaca gacctctttt caagtacaaa acctcaagaa 3720gaacagaaga cgagtctggt
agcaaaaaat aaaatatgca cactttcaca gtctatggcc 3780atcacatact ctttattcca
agaaaagaag atgcctttga agagcatagc tgagagcagg 3840attctgcctc tcatgacaat
tggcatgcac ttatcccaag cggtgaaagc tggctgcccc 3900cttgatttgg agcgagcagg
cctgactcca gaggttcaga agattattgc tgatgttatc 3960cgaaaccctc ccgtcaactc
agatatgagt aaaattagcc taatcagaat gttagttcct 4020gaaaacattg acacgtacct
tatccacatg gcaattgaga tccttaaaca tggtcctgac 4080agcggacttc aaccttcatg
tgatgtcaac aaaaggagat gttttcccgg ttctgaagag 4140atctgttcaa gttctaagag
aagcaaggaa gaagtaggca tcaatactga gacttcatct 4200gcagagagaa agagacgatt
acctgtgtgg tttgccaaag gaagtgatac cagcaagaaa 4260ttaatggaca aaacgaaaag
gggaggtctt tttagt 4296
SEQ ID NO: 8 (length 5765, DNA, Homo sapiens)
cagccgcccc tcctgcggcc gctgcggggg ccgccgcctg acttcggaca ccggccccgc
60acccgccagg aggggaggga aggggaggcg gggagagcga cggcgggggg cgggcggtgg
120accccgcctc ccccggcaca gcctgctgag gggaagaggg ggtctccgct cttcctcagt
180gcactctctg actgaagccc ggcgcgtggg gtgcagcggg agtgcgaggg gactggacag
240gtgggaagat gggaatgagg accgggcggc gggaatgttc tcacttctcc ggattccacc
300gggatgcagg actctagctg cccagccgca cctgcgaaga gactacactt cccgaggtgc
360tcagcggcag cgagggcctc cacgcatgcg caccgcggcg cgctgggcgg ggctggatgg
420gctgtggtgg gagggttgca gcgccgcgag aaaggcgagc cgggccgggg gcggggaaag
480gggtggggca ggaacggggg cggggacggc gctggagggg cgggtcgggt aggtctcccg
540gagctgatgt gtactgtgtg cgccggggag gcgccggctt gtactcggca gcgcgggaat
600aaagtttgct gatttggtgt ctagcctgga tgcctgggtt gcaggccctg cttgtggtgg
660cgctccacag tcatccggct gaagaagacc tgttggactg gatcttctcg ggttttcttt
720cagatattgt tttgtattta cccatgaaga cattgttttt tggactctgc aaataggaca
780tttcaaagat gagtgaaaaa aaattggaaa caactgcaca gcagcggaaa tgtcctgaat
840ggatgaatgt gcagaataaa agatgtgctg tagaagaaag aaaggcatgt gttcggaaga
900gtgtttttga agatgacctc cccttcttag aattcactgg atccattgtg tatagttacg
960atgctagtga ttgctctttc ctgtcagaag atattagcat gagtctatca gatggggatg
1020tggtgggatt tgacatggag tggccaccat tatacaatag agggaaactt ggcaaagttg
1080cactaattca gttgtgtgtt tctgagagca aatgttactt gttccacgtt tcttccatgt
1140cagtttttcc ccagggatta aaaatgttgc ttgaaaataa agcagttaaa aaggcaggtg
1200taggaattga aggagatcag tggaaacttc tacgtgactt tgatatcaaa ttgaagaatt
1260ttgtggagtt gacagatgtt gccaataaaa agctgaaatg cacagagacc tggagcctta
1320acagtctggt taaacacctc ttaggtaaac agctcctgaa agacaagtct atccgctgta
1380gcaattggag taaatttcct ctcactgagg accagaaact gtatgcagcc actgatgctt
1440atgctggttt tattatttac cgaaatttag agattttgga tgatactgtg caaaggtttg
1500ctataaataa agaggaagaa atcctactta gcgacatgaa caaacagttg acttcaatct
1560ctgaggaagt gatggatctg gctaagcatc ttcctcatgc tttcagtaaa ttggaaaacc
1620cacggagggt ttctatctta ctaaaggata tttcagaaaa tctatattca ctgaggagga
1680tgataattgg gtctactaac attgagactg aactgaggcc cagcaataat ttaaacttat
1740tatcctttga agattcaact actgggggag tacaacagaa acaaattaga gaacatgaag
1800ttttaattca cgttgaagat gaaacatggg acccaacact tgatcattta gctaaacatg
1860atggagaaga tgtacttgga aataaagtgg aacgaaaaga agatggattt gaagatggag
1920tagaagacaa caaattgaaa gagaatatgg aaagagcttg tttgatgtcg ttagatatta
1980cagaacatga actccaaatt ttggaacagc agtctcagga agaatatctt agtgatattg
2040cttataaatc tactgagcat ttatctccca atgataatga aaacgatacg tcctatgtaa
2100ttgagagtga tgaagattta gaaatggaga tgcttaagca tttatctccc aatgataatg
2160aaaacgatac gtcctatgta attgagagtg atgaagattt agaaatggag atgcttaagt
2220ctttagaaaa cctcaatagt ggcacggtag aaccaactca ttctaaatgc ttaaaaatgg
2280aaagaaatct gggtcttcct actaaagaag aagaagaaga tgatgaaaat gaagctaatg
2340aaggggaaga agatgatgat aaggactttt tgtggccagc acccaatgaa gagcaagtta
2400cttgcctcaa gatgtacttt ggccattcca gttttaaacc agttcagtgg aaagtgattc
2460attcagtatt agaagaaaga agagataatg ttgctgtcat ggcaactgga tatggaaaga
2520gtttgtgctt ccagtatcca cctgtttatg taggcaagat tggccttgtt atctctcccc
2580ttatttctct gatggaagac caagtgctac agcttaaaat gtccaacatc ccagcttgct
2640tccttggatc agcacagtca gaaaatgttc taacagatat taaattaggt aaataccgga
2700ttgtatacgt aactccagaa tactgttcag gtaacatggg cctgctccag caacttgagg
2760ctgatattgg tatcacgctc attgctgtgg atgaggctca ctgtatttct gagtgggggc
2820atgattttag ggattcattc aggaagttgg gctccctaaa gacagcactg ccaatggttc
2880caatcgttgc acttactgct actgcaagtt cttcaatccg ggaagacatt gtacgttgct
2940taaatctgag aaatcctcag atcacctgta ctggttttga tcgaccaaac ctgtatttag
3000aagttaggcg aaaaacaggg aatatccttc aggatctgca gccatttctt gtcaaaacaa
3060gttcccactg ggaatttgaa ggtccaacaa tcatctactg tccttctaga aaaatgacac
3120aacaagttac aggtgaactt aggaaactga atctatcctg tggaacatac catgcgggca
3180tgagttttag cacaaggaaa gacattcatc ataggtttgt aagagatgaa attcagtgtg
3240tcatagctac catagctttt ggaatgggca ttaataaagc tgacattcgc caagtcattc
3300attacggtgc tcctaaggac atggaatcat attatcagga gattggtaga gctggtcgtg
3360atggacttca aagttcttgt cacgtcctct gggctcctgc agacattaac ttaaataggc
3420accttcttac tgagatacgt aatgagaagt ttcgattata caaattaaag atgatggcaa
3480agatggaaaa atatcttcat tctagcagat gtaggagaca aatcatcttg tctcattttg
3540aggacaaaca agtacaaaaa gcctccttgg gaattatggg aactgaaaaa tgctgtgata
3600attgcaggtc cagattggat cattgctatt ccatggatga ctcagaggat acatcctggg
3660actttggtcc acaagcattt aagcttttgt ctgctgtgga catcttaggc gaaaaatttg
3720gaattgggct tccaatttta tttctccgag gatctaattc tcagcgtctt gccgatcaat
3780atcgcaggca cagtttattt ggcactggca aggatcaaac agagagttgg tggaaggctt
3840tttcccgtca gctgatcact gagggattct tggtagaagt ttctcggtat aacaaattta
3900tgaagatttg cgcccttacg aaaaagggta gaaattggct tcataaagct aatacagaat
3960ctcagagcct catccttcaa gctaatgaag aattgtgtcc aaagaagttg cttctgccta
4020gttcgaaaac tgtatcttcg ggcaccaaag agcattgtta taatcaagta ccagttgaat
4080taagtacaga gaagaagtct aacttggaga agttatattc ttataaacca tgtgataaga
4140tttcttctgg gagtaacatt tctaaaaaaa gtatcatggt acagtcacca gaaaaagctt
4200acagttcctc acagcctgtt atttcggcac aagagcagga gactcagatt gtgttatatg
4260gcaaattggt agaagctagg cagaaacatg ccaataaaat ggatgttccc ccagctattc
4320tggcaacaaa caagatactg gtggatatgg ccaaaatgag accaactacg gttgaaaacg
4380taaaaaggat tgatggtgtt tctgaaggca aagctgccat gttggcccct ctgttggaag
4440tcatcaaaca tttctgccaa acaaatagtg ttcagacaga cctcttttca agtacaaaac
4500ctcaagaaga acagaagacg agtctggtag caaaaaataa aatatgcaca ctttcacagt
4560ctatggccat cacatactct ttattccaag aaaagaagat gcctttgaag agcatagctg
4620agagcaggat tctgcctctc atgacaattg gcatgcactt atcccaagcg gtgaaagctg
4680gctgccccct tgatttggag cgagcaggcc tgactccaga ggttcagaag attattgctg
4740atgttatccg aaaccctccc gtcaactcag atatgagtaa aattagccta atcagaatgt
4800tagttcctga aaacattgac acgtacctta tccacatggc aattgagatc cttaaacatg
4860gtcctgacag cggacttcaa ccttcatgtg atgtcaacaa aaggagatgt tttcccggtt
4920ctgaagagat ctgttcaagt tctaagagaa gcaaggaaga agtaggcatc aatactgaga
4980cttcatctgc agagagaaag agacgattac ctgtgtggtt tgccaaagga agtgatacca
5040gcaagaaatt aatggacaaa acgaaaaggg gaggtctttt tagttaagct ggcaattacc
5100agaacaatta tgtttcttgc tgtattataa gaggatagct atattttatt tctgaagagt
5160aaggagtagt attttggctt aaaaatcatt ctaattacaa agttcactgt ttattgaaga
5220actggcatct taaatcagcc ttccgcaatt catgtagttt ctgggtcttc tgggagccta
5280cgtgagtaca tcacctaaca gaatattaaa ttagacttcc tgtaagattg ctttaagaaa
5340ctgttactgt cctgttttct aatctcttta ttaaaacagt gtatttggaa aatgttatgt
5400gctctgattt gatatagata acagattagt agttacatgg taattatgtg atataaaata
5460ttcatatatt atcaaaattc tgttttgtaa atgtaagaaa gcatagttat tttacaaatt
5520gtttttactg tcttttgaag aagttcttaa atacgttgtt aaatggtatt agttgaccag
5580ggcagtgaaa atgaaaccgc attttgggtg ccattaaata gggaaaaaac atgtaaaaaa
5640tgtaaaatgg agaccaattg cactaggcaa gtgtatattt tgtattttat atacaatttc
5700tattattttt caagtaataa aacaatgttt ttcatactga atattaaaaa aaaaaaaaaa
aaaaa 5765
SEQ ID NO: 9 (length 1432, PRT, Homo sapiens)
Met Ser Glu Lys Lys Leu Glu Thr Thr Ala Gln Gln
Arg Lys Cys Pro 1 5 10
15 Glu Trp Met Asn Val Gln Asn Lys Arg Cys Ala Val Glu Glu Arg Lys
20 25 30 Ala Cys Val
Arg Lys Ser Val Phe Glu Asp Asp Leu Pro Phe Leu Glu 35
40 45 Phe Thr Gly Ser Ile Val Tyr Ser
Tyr Asp Ala Ser Asp Cys Ser Phe 50 55
60 Leu Ser Glu Asp Ile Ser Met Ser Leu Ser Asp Gly Asp
Val Val Gly 65 70 75
80 Phe Asp Met Glu Trp Pro Pro Leu Tyr Asn Arg Gly Lys Leu Gly Lys
85 90 95 Val Ala Leu Ile
Gln Leu Cys Val Ser Glu Ser Lys Cys Tyr Leu Phe 100
105 110 His Val Ser Ser Met Ser Val Phe Pro
Gln Gly Leu Lys Met Leu Leu 115 120
125 Glu Asn Lys Ala Val Lys Lys Ala Gly Val Gly Ile Glu Gly
Asp Gln 130 135 140
Trp Lys Leu Leu Arg Asp Phe Asp Ile Lys Leu Lys Asn Phe Val Glu 145
150 155 160 Leu Thr Asp Val Ala
Asn Lys Lys Leu Lys Cys Thr Glu Thr Trp Ser 165
170 175 Leu Asn Ser Leu Val Lys His Leu Leu Gly
Lys Gln Leu Leu Lys Asp 180 185
190 Lys Ser Ile Arg Cys Ser Asn Trp Ser Lys Phe Pro Leu Thr Glu
Asp 195 200 205 Gln
Lys Leu Tyr Ala Ala Thr Asp Ala Tyr Ala Gly Phe Ile Ile Tyr 210
215 220 Arg Asn Leu Glu Ile Leu
Asp Asp Thr Val Gln Arg Phe Ala Ile Asn 225 230
235 240 Lys Glu Glu Glu Ile Leu Leu Ser Asp Met Asn
Lys Gln Leu Thr Ser 245 250
255 Ile Ser Glu Glu Val Met Asp Leu Ala Lys His Leu Pro His Ala Phe
260 265 270 Ser Lys
Leu Glu Asn Pro Arg Arg Val Ser Ile Leu Leu Lys Asp Ile 275
280 285 Ser Glu Asn Leu Tyr Ser Leu
Arg Arg Met Ile Ile Gly Ser Thr Asn 290 295
300 Ile Glu Thr Glu Leu Arg Pro Ser Asn Asn Leu Asn
Leu Leu Ser Phe 305 310 315
320 Glu Asp Ser Thr Thr Gly Gly Val Gln Gln Lys Gln Ile Arg Glu His
325 330 335 Glu Val Leu
Ile His Val Glu Asp Glu Thr Trp Asp Pro Thr Leu Asp 340
345 350 His Leu Ala Lys His Asp Gly Glu
Asp Val Leu Gly Asn Lys Val Glu 355 360
365 Arg Lys Glu Asp Gly Phe Glu Asp Gly Val Glu Asp Asn
Lys Leu Lys 370 375 380
Glu Asn Met Glu Arg Ala Cys Leu Met Ser Leu Asp Ile Thr Glu His 385
390 395 400 Glu Leu Gln Ile
Leu Glu Gln Gln Ser Gln Glu Glu Tyr Leu Ser Asp 405
410 415 Ile Ala Tyr Lys Ser Thr Glu His Leu
Ser Pro Asn Asp Asn Glu Asn 420 425
430 Asp Thr Ser Tyr Val Ile Glu Ser Asp Glu Asp Leu Glu Met
Glu Met 435 440 445
Leu Lys His Leu Ser Pro Asn Asp Asn Glu Asn Asp Thr Ser Tyr Val 450
455 460 Ile Glu Ser Asp Glu
Asp Leu Glu Met Glu Met Leu Lys Ser Leu Glu 465 470
475 480 Asn Leu Asn Ser Gly Thr Val Glu Pro Thr
His Ser Lys Cys Leu Lys 485 490
495 Met Glu Arg Asn Leu Gly Leu Pro Thr Lys Glu Glu Glu Glu Asp
Asp 500 505 510 Glu
Asn Glu Ala Asn Glu Gly Glu Glu Asp Asp Asp Lys Asp Phe Leu 515
520 525 Trp Pro Ala Pro Asn Glu
Glu Gln Val Thr Cys Leu Lys Met Tyr Phe 530 535
540 Gly His Ser Ser Phe Lys Pro Val Gln Trp Lys
Val Ile His Ser Val 545 550 555
560 Leu Glu Glu Arg Arg Asp Asn Val Ala Val Met Ala Thr Gly Tyr Gly
565 570 575 Lys Ser
Leu Cys Phe Gln Tyr Pro Pro Val Tyr Val Gly Lys Ile Gly 580
585 590 Leu Val Ile Ser Pro Leu Ile
Ser Leu Met Glu Asp Gln Val Leu Gln 595 600
605 Leu Lys Met Ser Asn Ile Pro Ala Cys Phe Leu Gly
Ser Ala Gln Ser 610 615 620
Glu Asn Val Leu Thr Asp Ile Lys Leu Gly Lys Tyr Arg Ile Val Tyr 625
630 635 640 Val Thr Pro
Glu Tyr Cys Ser Gly Asn Met Gly Leu Leu Gln Gln Leu 645
650 655 Glu Ala Asp Ile Gly Ile Thr Leu
Ile Ala Val Asp Glu Ala His Cys 660 665
670 Ile Ser Glu Trp Gly His Asp Phe Arg Asp Ser Phe Arg
Lys Leu Gly 675 680 685
Ser Leu Lys Thr Ala Leu Pro Met Val Pro Ile Val Ala Leu Thr Ala 690
695 700 Ile Ala Ser Ser
Ser Ile Arg Glu Asp Ile Val Arg Cys Leu Asn Leu 705 710
715 720 Arg Asn Pro Gln Ile Thr Cys Thr Gly
Phe Asp Arg Pro Asn Leu Tyr 725 730
735 Leu Glu Val Arg Arg Lys Thr Gly Asn Ile Leu Gln Asp Leu
Gln Pro 740 745 750
Phe Leu Val Lys Thr Ser Ser His Trp Glu Phe Glu Gly Pro Thr Ile
755 760 765 Ile Tyr Cys Pro
Ser Arg Lys Met Thr Gln Gln Val Thr Gly Glu Leu 770
775 780 Arg Lys Leu Asn Leu Ser Cys Gly
Thr Tyr His Ala Gly Met Ser Phe 785 790
795 800 Ser Thr Arg Lys Asp Ile His His Arg Phe Val Arg
Asp Glu Ile Gln 805 810
815 Cys Val Ile Ala Thr Ile Ala Phe Gly Met Gly Ile Asn Lys Ala Asp
820 825 830 Ile Arg Gln
Val Ile His Tyr Gly Ala Pro Lys Asp Met Glu Ser Tyr 835
840 845 Tyr Gln Glu Ile Gly Arg Ala Gly
Arg Asp Gly Leu Gln Ser Ser Cys 850 855
860 His Val Leu Trp Ala Pro Ala Asp Ile Asn Leu Asn Arg
His Leu Leu 865 870 875
880 Thr Glu Ile Arg Asn Glu Lys Phe Arg Leu Tyr Lys Leu Lys Met Met
885 890 895 Ala Lys Met Glu
Lys Tyr Leu His Ser Ser Arg Cys Arg Arg Gln Ile 900
905 910 Ile Leu Ser His Phe Glu Asp Lys Gln
Val Gln Lys Ala Ser Leu Gly 915 920
925 Ile Met Gly Thr Glu Lys Cys Cys Asp Asn Cys Arg Ser Arg
Leu Asp 930 935 940
His Cys Tyr Ser Met Asp Asp Ser Glu Asp Thr Ser Trp Asp Phe Gly 945
950 955 960 Pro Gln Ala Phe Lys
Leu Leu Ser Ala Val Asp Ile Leu Gly Glu Lys 965
970 975 Phe Gly Ile Gly Leu Pro Ile Leu Phe Leu
Arg Gly Ser Asn Ser Gln 980 985
990 Arg Leu Ala Asp Gln Tyr Arg Arg His Ser Leu Phe Gly Thr
Gly Lys 995 1000 1005
Asp Gln Thr Glu Ser Trp Trp Lys Ala Phe Ser Arg Gln Leu Ile 1010
1015 1020 Thr Glu Gly Phe Leu
Val Glu Val Ser Arg Tyr Asn Lys Phe Met 1025 1030
1035 Lys Ile Cys Ala Leu Thr Lys Lys Gly Arg
Asn Trp Leu His Lys 1040 1045 1050
Ala Asn Thr Glu Ser Gln Ser Leu Ile Leu Gln Ala Asn Glu Glu
1055 1060 1065 Leu Cys
Pro Lys Lys Leu Leu Leu Pro Ser Ser Lys Thr Val Ser 1070
1075 1080 Ser Gly Thr Lys Glu His Cys
Tyr Asn Gln Val Pro Val Glu Leu 1085 1090
1095 Ser Thr Glu Lys Lys Ser Asn Leu Glu Lys Leu Tyr
Ser Tyr Lys 1100 1105 1110
Pro Cys Asp Lys Ile Ser Ser Gly Ser Asn Ile Ser Lys Lys Ser 1115
1120 1125 Ile Met Val Gln Ser
Pro Glu Lys Ala Tyr Ser Ser Ser Gln Pro 1130 1135
1140 Val Ile Ser Ala Gln Glu Gln Glu Thr Gln
Ile Val Leu Tyr Gly 1145 1150 1155
Lys Leu Val Glu Ala Arg Gln Lys His Ala Asn Lys Met Asp Val
1160 1165 1170 Pro Pro
Ala Ile Leu Ala Thr Asn Lys Ile Leu Val Asp Met Ala 1175
1180 1185 Lys Met Arg Pro Thr Thr Val
Glu Asn Val Lys Arg Ile Asp Gly 1190 1195
1200 Val Ser Glu Gly Lys Ala Ala Met Leu Ala Pro Leu
Leu Glu Val 1205 1210 1215
Ile Lys His Phe Cys Gln Thr Asn Ser Val Gln Thr Asp Leu Phe 1220
1225 1230 Ser Ser Thr Lys Pro
Gln Glu Glu Gln Lys Thr Ser Leu Val Ala 1235 1240
1245 Lys Asn Lys Ile Cys Thr Leu Ser Gln Ser
Met Ala Ile Thr Tyr 1250 1255 1260
Ser Leu Phe Gln Glu Lys Lys Met Pro Leu Lys Ser Ile Ala Glu
1265 1270 1275 Ser Arg
Ile Leu Pro Leu Met Thr Ile Gly Met His Leu Ser Gln 1280
1285 1290 Ala Val Lys Ala Gly Cys Pro
Leu Asp Leu Glu Arg Ala Gly Leu 1295 1300
1305 Thr Pro Glu Val Gln Lys Ile Ile Ala Asp Val Ile
Arg Asn Pro 1310 1315 1320
Pro Val Asn Ser Asp Met Ser Lys Ile Ser Leu Ile Arg Met Leu 1325
1330 1335 Val Pro Glu Asn Ile
Asp Thr Tyr Leu Ile His Met Ala Ile Glu 1340 1345
1350 Ile Leu Lys His Gly Pro Asp Ser Gly Leu
Gln Pro Ser Cys Asp 1355 1360 1365
Val Asn Lys Arg Arg Cys Phe Pro Gly Ser Glu Glu Ile Cys Ser
1370 1375 1380 Ser Ser
Lys Arg Ser Lys Glu Glu Val Gly Ile Asn Thr Glu Thr 1385
1390 1395 Ser Ser Ala Glu Arg Lys Arg
Arg Leu Pro Val Trp Phe Ala Lys 1400 1405
1410 Gly Ser Asp Thr Ser Lys Lys Leu Met Asp Lys Thr
Lys Arg Gly 1415 1420 1425
Gly Leu Phe Ser 1430
SEQ ID NO: 10 (length 1432, PRT, Homo sapiens)
Met Ser Glu
Lys Lys Leu Glu Thr Thr Ala Gln Gln Arg Lys Cys Pro 1 5
10 15 Glu Trp Met Asn Val Gln Asn Lys
Arg Cys Ala Val Glu Glu Arg Lys 20 25
30 Ala Cys Val Arg Lys Ser Val Phe Glu Asp Asp Leu Pro
Phe Leu Glu 35 40 45
Phe Thr Gly Ser Ile Val Tyr Ser Tyr Asp Ala Ser Asp Cys Ser Phe 50
55 60 Leu Ser Glu Asp
Ile Ser Met Ser Leu Ser Asp Gly Asp Val Val Gly 65 70
75 80 Phe Asp Met Glu Trp Pro Pro Leu Tyr
Asn Arg Gly Lys Leu Gly Lys 85 90
95 Val Ala Leu Ile Gln Leu Cys Val Ser Glu Ser Lys Cys Tyr
Leu Phe 100 105 110
His Val Ser Ser Met Ser Val Phe Pro Gln Gly Leu Lys Met Leu Leu
115 120 125 Glu Asn Lys Ala
Val Lys Lys Ala Gly Val Gly Ile Glu Gly Asp Gln 130
135 140 Trp Lys Leu Leu Arg Asp Phe Asp
Ile Lys Leu Lys Asn Phe Val Glu 145 150
155 160 Leu Thr Asp Val Ala Asn Lys Lys Leu Lys Cys Thr
Glu Thr Trp Ser 165 170
175 Leu Asn Ser Leu Val Lys His Leu Leu Gly Lys Gln Leu Leu Lys Asp
180 185 190 Lys Ser Ile
Arg Cys Ser Asn Trp Ser Lys Phe Pro Leu Thr Glu Asp 195
200 205 Gln Lys Leu Tyr Ala Ala Thr Asp
Ala Tyr Ala Gly Phe Ile Ile Tyr 210 215
220 Arg Asn Leu Glu Ile Leu Asp Asp Thr Val Gln Arg Phe
Ala Ile Asn 225 230 235
240 Lys Glu Glu Glu Ile Leu Leu Ser Asp Met Asn Lys Gln Leu Thr Ser
245 250 255 Ile Ser Glu Glu
Val Met Asp Leu Ala Lys His Leu Pro His Ala Phe 260
265 270 Ser Lys Leu Glu Asn Pro Arg Arg Val
Ser Ile Leu Leu Lys Asp Ile 275 280
285 Ser Glu Asn Leu Tyr Ser Leu Arg Arg Met Ile Ile Gly Ser
Thr Asn 290 295 300
Ile Glu Thr Glu Leu Arg Pro Ser Asn Asn Leu Asn Leu Leu Ser Phe 305
310 315 320 Glu Asp Ser Thr Thr
Gly Gly Val Gln Gln Lys Gln Ile Arg Glu His 325
330 335 Glu Val Leu Ile His Val Glu Asp Glu Thr
Trp Asp Pro Thr Leu Asp 340 345
350 His Leu Ala Lys His Asp Gly Glu Asp Val Leu Gly Asn Lys Val
Glu 355 360 365 Arg
Lys Glu Asp Gly Phe Glu Asp Gly Val Glu Asp Asn Lys Leu Lys 370
375 380 Glu Asn Met Glu Arg Ala
Cys Leu Met Ser Leu Asp Ile Thr Glu His 385 390
395 400 Glu Leu Gln Ile Leu Glu Gln Gln Ser Gln Glu
Glu Tyr Leu Ser Asp 405 410
415 Ile Ala Tyr Lys Ser Thr Glu His Leu Ser Pro Asn Asp Asn Glu Asn
420 425 430 Asp Thr
Ser Tyr Val Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met 435
440 445 Leu Lys His Leu Ser Pro Asn
Asp Asn Glu Asn Asp Thr Ser Tyr Val 450 455
460 Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met Leu
Lys Ser Leu Glu 465 470 475
480 Asn Leu Asn Ser Gly Thr Val Glu Pro Thr His Ser Lys Cys Leu Lys
485 490 495 Met Glu Arg
Asn Leu Gly Leu Pro Thr Lys Glu Glu Glu Glu Asp Asp 500
505 510 Glu Asn Glu Ala Asn Glu Gly Glu
Glu Asp Asp Asp Lys Asp Phe Leu 515 520
525 Trp Pro Ala Pro Asn Glu Glu Gln Val Thr Cys Leu Lys
Met Tyr Phe 530 535 540
Gly His Ser Ser Phe Lys Pro Val Gln Trp Lys Val Ile His Ser Val 545
550 555 560 Leu Glu Glu Arg
Arg Asp Asn Val Ala Val Met Ala Thr Gly Tyr Gly 565
570 575 Lys Ser Leu Cys Phe Gln Tyr Pro Pro
Val Tyr Val Gly Lys Ile Gly 580 585
590 Leu Val Ile Ser Pro Leu Ile Ser Leu Met Glu Asp Gln Val
Leu Gln 595 600 605
Leu Lys Met Ser Asn Ile Pro Ala Cys Phe Leu Gly Ser Ala Gln Ser 610
615 620 Glu Asn Val Leu Thr
Asp Ile Lys Leu Gly Lys Tyr Arg Ile Val Tyr 625 630
635 640 Val Thr Pro Glu Tyr Cys Ser Gly Asn Met
Gly Leu Leu Gln Gln Leu 645 650
655 Glu Ala Asp Ile Gly Ile Thr Leu Ile Ala Val Asp Glu Ala His
Cys 660 665 670 Ile
Ser Glu Trp Gly His Asp Phe Arg Asp Ser Phe Arg Lys Leu Gly 675
680 685 Ser Leu Lys Thr Ala Leu
Pro Met Val Pro Ile Val Ala Leu Thr Ala 690 695
700 Thr Ala Ser Ser Ser Ile Arg Glu Asp Ile Val
Arg Cys Leu Asn Leu 705 710 715
720 Arg Asn Pro Gln Ile Thr Cys Thr Gly Phe Asp Arg Pro Asn Leu Tyr
725 730 735 Leu Glu
Val Arg Arg Lys Thr Gly Asn Ile Leu Gln Asp Leu Gln Pro 740
745 750 Phe Leu Val Lys Thr Ser Ser
His Trp Glu Phe Glu Gly Pro Thr Ile 755 760
765 Ile Tyr Cys Pro Ser Arg Lys Met Thr Gln Gln Val
Thr Gly Glu Leu 770 775 780
Arg Lys Leu Asn Leu Ser Cys Gly Thr Tyr His Ala Gly Met Ser Phe 785
790 795 800 Ser Thr Arg
Lys Asp Ile His His Arg Phe Val Arg Asp Glu Ile Gln 805
810 815 Cys Val Ile Ala Thr Ile Ala Phe
Gly Met Gly Ile Asn Lys Ala Asp 820 825
830 Ile Arg Gln Val Ile His Tyr Gly Ala Pro Lys Asp Met
Glu Ser Tyr 835 840 845
Tyr Gln Glu Ile Gly Arg Ala Gly Arg Asp Gly Leu Gln Ser Ser Cys 850
855 860 His Val Leu Trp
Ala Pro Ala Asp Ile Asn Leu Asn Arg His Leu Leu 865 870
875 880 Thr Glu Ile Arg Asn Glu Lys Phe Arg
Leu Tyr Lys Leu Lys Met Met 885 890
895 Ala Lys Met Glu Lys Tyr Leu His Ser Ser Arg Cys Arg Arg
Gln Ile 900 905 910
Ile Leu Ser His Phe Glu Asp Lys Gln Val Gln Lys Ala Ser Leu Gly
915 920 925 Ile Met Gly Thr
Glu Lys Cys Cys Asp Asn Cys Arg Ser Arg Leu Asp 930
935 940 His Cys Tyr Ser Met Asp Asp Ser
Glu Asp Thr Ser Trp Asp Phe Gly 945 950
955 960 Pro Gln Ala Phe Lys Leu Leu Ser Ala Val Asp Ile
Leu Gly Glu Lys 965 970
975 Phe Gly Ile Gly Leu Pro Ile Leu Phe Leu Arg Gly Ser Asn Ser Gln
980 985 990 Arg Leu Ala
Asp Gln Tyr Arg Arg His Ser Leu Phe Gly Thr Gly Lys 995
1000 1005 Asp Gln Thr Glu Ser Trp
Trp Lys Ala Phe Ser Arg Gln Leu Ile 1010 1015
1020 Thr Glu Gly Phe Leu Val Glu Val Ser Arg Tyr
Asn Lys Phe Met 1025 1030 1035
Lys Ile Cys Ala Leu Thr Lys Lys Gly Arg Asn Trp Leu His Lys
1040 1045 1050 Ala Asn Thr
Glu Ser Gln Ser Leu Ile Leu Gln Ala Asn Glu Glu 1055
1060 1065 Leu Cys Pro Lys Lys Leu Leu Leu
Pro Ser Ser Lys Thr Val Ser 1070 1075
1080 Ser Gly Thr Lys Glu His Cys Tyr Asn Gln Val Pro Val
Glu Leu 1085 1090 1095
Ser Thr Glu Lys Lys Ser Asn Leu Glu Lys Leu Tyr Ser Tyr Lys 1100
1105 1110 Pro Cys Asp Lys Ile
Ser Ser Gly Ser Asn Ile Ser Lys Lys Ser 1115 1120
1125 Ile Met Val Gln Ser Pro Glu Lys Ala Tyr
Ser Ser Ser Gln Pro 1130 1135 1140
Val Ile Ser Ala Gln Glu Gln Glu Thr Gln Ile Val Leu Tyr Gly
1145 1150 1155 Lys Leu
Val Glu Ala Arg Gln Lys His Ala Asn Lys Met Asp Val 1160
1165 1170 Pro Pro Ala Ile Leu Ala Thr
Asn Lys Ile Leu Val Asp Met Ala 1175 1180
1185 Lys Met Arg Pro Thr Thr Val Glu Asn Val Lys Arg
Ile Asp Gly 1190 1195 1200
Val Ser Glu Gly Lys Ala Ala Met Leu Ala Pro Leu Leu Glu Val 1205
1210 1215 Ile Lys His Phe Cys
Gln Thr Asn Ser Val Gln Thr Asp Leu Phe 1220 1225
1230 Ser Ser Thr Lys Pro Gln Glu Glu Gln Lys
Thr Ser Leu Val Ala 1235 1240 1245
Lys Asn Lys Ile Cys Thr Leu Ser Gln Ser Met Ala Ile Thr Tyr
1250 1255 1260 Ser Leu
Phe Gln Glu Lys Lys Met Pro Leu Lys Ser Ile Ala Glu 1265
1270 1275 Ser Arg Ile Leu Pro Leu Met
Thr Ile Gly Met His Leu Ser Gln 1280 1285
1290 Ala Val Lys Ala Gly Cys Pro Leu Asp Leu Glu Arg
Ala Gly Leu 1295 1300 1305
Thr Pro Glu Val Gln Lys Ile Ile Ala Asp Val Ile Arg Asn Pro 1310
1315 1320 Pro Val Asn Ser Asp
Met Ser Lys Ile Ser Leu Ile Arg Met Leu 1325 1330
1335 Val Pro Glu Asn Ile Asp Thr Tyr Leu Ile
His Met Ala Ile Glu 1340 1345 1350
Ile Leu Lys His Gly Pro Asp Ser Gly Leu Gln Pro Ser Cys Asp
1355 1360 1365 Val Asn
Lys Arg Arg Cys Phe Pro Gly Ser Glu Glu Ile Cys Ser 1370
1375 1380 Ser Ser Lys Arg Ser Lys Glu
Glu Val Gly Ile Asn Thr Glu Thr 1385 1390
1395 Ser Ser Ala Glu Arg Lys Arg Arg Leu Pro Val Trp
Phe Ala Lys 1400 1405 1410
Gly Ser Asp Thr Ser Lys Lys Leu Met Asp Lys Thr Lys Arg Gly 1415
1420 1425 Gly Leu Phe Ser
1430
SEQ ID NO: 11 (length 4296, DNA, Homo sapiens)
atgagtgaaa aaaaattgga aacaactgca
cagcagcgga aatgtcctga atggatgaat 60gtgcagaata aaagatgtgc tgtagaagaa
agaaaggcat gtgttcggaa gagtgttttt 120gaagatgacc tccccttctt agaattcact
ggatccattg tgtatagtta cgatgctagt 180gattgctctt tcctgtcaga agatattagc
atgagtctat cagatgggga tgtggtggga 240tttgacatgg agtggccacc attatacaat
agagggaaac ttggcaaagt tgcactaatt 300cagttgtgtg tttctgagag caaatgttac
ttgttccacg tttcttccat gtcagttttt 360ccccagggat taaaaatgtt gcttgaaaat
aaagcagtta aaaaggcagg tgtaggaatt 420gaaggagatc agtggaaact tctacgtgac
tttgatatca aattgaagaa ttttgtggag 480ttgacagatg ttgccaataa aaagctgaaa
tgcacagaga cctggagcct taacagtctg 540gttaaacacc tcttaggtaa acagctcctg
aaagacaagt ctatccgctg tagcaattgg 600agtaaatttc ctctcactga ggaccagaaa
ctgtatgcag ccactgatgc ttatgctggt 660tttattattt accgaaattt agagattttg
gatgatactg tgcaaaggtt tgctataaat 720aaagaggaag aaatcctact tagcgacatg
aacaaacagt tgacttcaat ctctgaggaa 780gtgatggatc tggctaagca tcttcctcat
gctttcagta aattggaaaa cccacggagg 840gtttctatct tactaaagga tatttcagaa
aatctatatt cactgaggag gatgataatt 900gggtctacta acattgagac tgaactgagg
cccagcaata atttaaactt attatccttt 960gaagattcaa ctactggggg agtacaacag
aaacaaatta gagaacatga agttttaatt 1020cacgttgaag atgaaacatg ggacccaaca
cttgatcatt tagctaaaca tgatggagaa 1080gatgtacttg gaaataaagt ggaacgaaaa
gaagatggat ttgaagatgg agtagaagac 1140aacaaattga aagagaatat ggaaagagct
tgtttgatgt cgttagatat tacagaacat 1200gaactccaaa ttttggaaca gcagtctcag
gaagaatatc ttagtgatat tgcttataaa 1260tctactgagc atttatctcc caatgataat
gaaaacgata cgtcctatgt aattgagagt 1320gatgaagatt tagaaatgga gatgcttaag
catttatctc ccaatgataa tgaaaacgat 1380acgtcctatg taattgagag tgatgaagat
ttagaaatgg agatgcttaa gtctttagaa 1440aacctcaata gtggcacggt agaaccaact
cattctaaat gcttaaaaat ggaaagaaat 1500ctgggtcttc ctactaaaga agaagaagaa
gatgatgaaa atgaagctaa tgaaggggaa 1560gaagatgatg ataaggactt tttgtggcca
gcacccaatg aagagcaagt tacttgcctc 1620aagatgtact ttggccattc cagttttaaa
ccagttcagt ggaaagtgat tcattcagta 1680ttagaagaaa gaagagataa tgttgctgtc
atggcaactg gatatggaaa gagtttgtgc 1740ttccagtatc cacctgttta tgtaggcaag
attggccttg ttatctctcc ccttatttct 1800ctgatggaag accaagtgct acagcttaaa
atgtccaaca tcccagcttg cttccttgga 1860tcagcacagt cagaaaatgt tctaacagat
attaaattag gtaaataccg gattgtatac 1920gtaactccag aatactgttc aggtaacatg
ggcctgctcc agcaacttga ggctgatatt 1980ggtatcacgc tcattgctgt ggatgaggct
cactgtattt ctgagtgggg gcatgatttt 2040agggattcat tcaggaagtt gggctcccta
aagacagcac tgccaatggt tccaatcgtt 2100gcacttactg ctactgcaag ttcttcaatc
cgggaagaca ttgtacgttg cttaaatctg 2160agaaatcctc agatcacctg tactggtttt
gatcgaccaa acctgtattt agaagttagg 2220cgaaaaacag ggaatatcct tcaggatctg
cagccatttc ttgtcaaaac aagttcccac 2280tgggaatttg aaggtccaac aatcatctac
tgtccttcta gaaaaatgac acaacaagtt 2340acaggtgaac ttaggaaact gaatctatcc
tgtggaacat accatgcggg catgagtttt 2400agcacaagga aagacattca tcataggttt
gtaagagatg aaattcagtg tgtcatagct 2460accatagctt ttggaatggg cattaataaa
gctgacattc gccaagtcat tcattacggt 2520gctcctaagg acatggaatc atattatcag
gagattggta gagctggtcg tgatggactt 2580caaagttctt gtcacgtcct ctgggctcct
gcagacatta acttaaatag gcaccttctt 2640actgagatac gtaatgagaa gtttcgatta
tacaaattaa agatgatggc aaagatggaa 2700aaatatcttc attctagcag atgtaggaga
caaatcatct tgtctcattt tgaggacaaa 2760caagtacaaa aagcctcctt gggaattatg
ggaactgaaa aatgctgtga taattgcagg 2820tccagattgg atcattgcta ttccatggat
gactcagagg atacatcctg ggactttggt 2880ccacaagcat ttaagctttt gtctgctgtg
gacatcttag gcgaaaaatt tggaattggg 2940cttccaattt tatttctccg aggatctaat
tctcagcgtc ttgccgatca atatcgcagg 3000cacagtttat ttggcactgg caaggatcaa
acagagagtt ggtggaaggc tttttcccgt 3060cagctgatca ctgagggatt cttggtagaa
gtttctcggt ataacaaatt tatgaagatt 3120tgcgccctta cgaaaaaggg tagaaattgg
cttcataaag ctaatacaga atctcagagc 3180ctcatccttc aagctaatga agaattgtgt
ccaaagaagt tgcttctgcc tagttcgaaa 3240actgtatctt cgggcaccaa agagcattgt
tataatcaag taccagttga attaagtaca 3300gagaagaagt ctaacttgga gaagttatat
tcttataaac catgtgataa gatttcttct 3360gggagtaaca tttctaaaaa aagtatcatg
gtacagtcac cagaaaaagc ttacagttcc 3420tcacagcctg ttatttcggc acaagagcag
gagactcaga ttgtgttata tggcaaattg 3480gtagaagcta ggcagaaaca tgccaataaa
atggatgttc ccccagctat tctggcaaca 3540aacaagatac tggtggatat ggccaaaatg
agaccaacta cggttgaaaa cgtaaaaagg 3600attgatggtg tttctgaagg caaagctgcc
atgttggccc ctctgttgga agtcatcaaa 3660catttctgcc aaacaaatag tgttcagaca
gacctctttt caagtacaaa acctcaagaa 3720gaacagaaga cgagtctggt agcaaaaaat
aaaatatgca cactttcaca gtctatggcc 3780atcacatact ctttattcca agaaaagaag
atgcctttga agagcatagc tgagagcagg 3840attctgcctc tcatgacaat tggcatgcac
ttataccaag cggtgaaagc tggctgcccc 3900cttgatttgg agcgagcagg cctgactcca
gaggttcaga agattattgc tgatgttatc 3960cgaaaccctc ccgtcaactc agatatgagt
aaaattagcc taatcagaat gttagttcct 4020gaaaacattg acacgtacct tatccacatg
gcaattgaga tccttaaaca tggtcctgac 4080agcggacttc aaccttcatg tgatgtcaac
aaaaggagat gttttcccgg ttctgaagag 4140atctgttcaa gttctaagag aagcaaggaa
gaagtaggca tcaatactga gacttcatct 4200gcagagagaa agagacgatt acctgtgtgg
tttgccaaag gaagtgatac cagcaagaaa 4260ttaatggaca aaacgaaaag gggaggtctt
tttagt 4296
SEQ ID NO: 12 (length 1432, PRT, Homo sapiens)
Met Ser
Glu Lys Lys Leu Glu Thr Thr Ala Gln Gln Arg Lys Cys Pro 1 5
10 15 Glu Trp Met Asn Val Gln Asn
Lys Arg Cys Ala Val Glu Glu Arg Lys 20 25
30 Ala Cys Val Arg Lys Ser Val Phe Glu Asp Asp Leu
Pro Phe Leu Glu 35 40 45
Phe Thr Gly Ser Ile Val Tyr Ser Tyr Asp Ala Ser Asp Cys Ser Phe
50 55 60 Leu Ser Glu
Asp Ile Ser Met Ser Leu Ser Asp Gly Asp Val Val Gly 65
70 75 80 Phe Asp Met Glu Trp Pro Pro
Leu Tyr Asn Arg Gly Lys Leu Gly Lys 85
90 95 Val Ala Leu Ile Gln Leu Cys Val Ser Glu Ser
Lys Cys Tyr Leu Phe 100 105
110 His Val Ser Ser Met Ser Val Phe Pro Gln Gly Leu Lys Met Leu
Leu 115 120 125 Glu
Asn Lys Ala Val Lys Lys Ala Gly Val Gly Ile Glu Gly Asp Gln 130
135 140 Trp Lys Leu Leu Arg Asp
Phe Asp Ile Lys Leu Lys Asn Phe Val Glu 145 150
155 160 Leu Thr Asp Val Ala Asn Lys Lys Leu Lys Cys
Thr Glu Thr Trp Ser 165 170
175 Leu Asn Ser Leu Val Lys His Leu Leu Gly Lys Gln Leu Leu Lys Asp
180 185 190 Lys Ser
Ile Arg Cys Ser Asn Trp Ser Lys Phe Pro Leu Thr Glu Asp 195
200 205 Gln Lys Leu Tyr Ala Ala Thr
Asp Ala Tyr Ala Gly Phe Ile Ile Tyr 210 215
220 Arg Asn Leu Glu Ile Leu Asp Asp Thr Val Gln Arg
Phe Ala Ile Asn 225 230 235
240 Lys Glu Glu Glu Ile Leu Leu Ser Asp Met Asn Lys Gln Leu Thr Ser
245 250 255 Ile Ser Glu
Glu Val Met Asp Leu Ala Lys His Leu Pro His Ala Phe 260
265 270 Ser Lys Leu Glu Asn Pro Arg Arg
Val Ser Ile Leu Leu Lys Asp Ile 275 280
285 Ser Glu Asn Leu Tyr Ser Leu Arg Arg Met Ile Ile Gly
Ser Thr Asn 290 295 300
Ile Glu Thr Glu Leu Arg Pro Ser Asn Asn Leu Asn Leu Leu Ser Phe 305
310 315 320 Glu Asp Ser Thr
Thr Gly Gly Val Gln Gln Lys Gln Ile Arg Glu His 325
330 335 Glu Val Leu Ile His Val Glu Asp Glu
Thr Trp Asp Pro Thr Leu Asp 340 345
350 His Leu Ala Lys His Asp Gly Glu Asp Val Leu Gly Asn Lys
Val Glu 355 360 365
Arg Lys Glu Asp Gly Phe Glu Asp Gly Val Glu Asp Asn Lys Leu Lys 370
375 380 Glu Asn Met Glu Arg
Ala Cys Leu Met Ser Leu Asp Ile Thr Glu His 385 390
395 400 Glu Leu Gln Ile Leu Glu Gln Gln Ser Gln
Glu Glu Tyr Leu Ser Asp 405 410
415 Ile Ala Tyr Lys Ser Thr Glu His Leu Ser Pro Asn Asp Asn Glu
Asn 420 425 430 Asp
Thr Ser Tyr Val Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met 435
440 445 Leu Lys His Leu Ser Pro
Asn Asp Asn Glu Asn Asp Thr Ser Tyr Val 450 455
460 Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met
Leu Lys Ser Leu Glu 465 470 475
480 Asn Leu Asn Ser Gly Thr Val Glu Pro Thr His Ser Lys Cys Leu Lys
485 490 495 Met Glu
Arg Asn Leu Gly Leu Pro Thr Lys Glu Glu Glu Glu Asp Asp 500
505 510 Glu Asn Glu Ala Asn Glu Gly
Glu Glu Asp Asp Asp Lys Asp Phe Leu 515 520
525 Trp Pro Ala Pro Asn Glu Glu Gln Val Thr Cys Leu
Lys Met Tyr Phe 530 535 540
Gly His Ser Ser Phe Lys Pro Val Gln Trp Lys Val Ile His Ser Val 545
550 555 560 Leu Glu Glu
Arg Arg Asp Asn Val Ala Val Met Ala Thr Gly Tyr Gly 565
570 575 Lys Ser Leu Cys Phe Gln Tyr Pro
Pro Val Tyr Val Gly Lys Ile Gly 580 585
590 Leu Val Ile Ser Pro Leu Ile Ser Leu Met Glu Asp Gln
Val Leu Gln 595 600 605
Leu Lys Met Ser Asn Ile Pro Ala Cys Phe Leu Gly Ser Ala Gln Ser 610
615 620 Glu Asn Val Leu
Thr Asp Ile Lys Leu Gly Lys Tyr Arg Ile Val Tyr 625 630
635 640 Val Thr Pro Glu Tyr Cys Ser Gly Asn
Met Gly Leu Leu Gln Gln Leu 645 650
655 Glu Ala Asp Ile Gly Ile Thr Leu Ile Ala Val Asp Glu Ala
His Cys 660 665 670
Ile Ser Glu Trp Gly His Asp Phe Arg Asp Ser Phe Arg Lys Leu Gly
675 680 685 Ser Leu Lys Thr
Ala Leu Pro Met Val Pro Ile Val Ala Leu Thr Ala 690
695 700 Thr Ala Ser Ser Ser Ile Arg Glu
Asp Ile Val Arg Cys Leu Asn Leu 705 710
715 720 Arg Asn Pro Gln Ile Thr Cys Thr Gly Phe Asp Arg
Pro Asn Leu Tyr 725 730
735 Leu Glu Val Arg Arg Lys Thr Gly Asn Ile Leu Gln Asp Leu Gln Pro
740 745 750 Phe Leu Val
Lys Thr Ser Ser His Trp Glu Phe Glu Gly Pro Thr Ile 755
760 765 Ile Tyr Cys Pro Ser Arg Lys Met
Thr Gln Gln Val Thr Gly Glu Leu 770 775
780 Arg Lys Leu Asn Leu Ser Cys Gly Thr Tyr His Ala Gly
Met Ser Phe 785 790 795
800 Ser Thr Arg Lys Asp Ile His His Arg Phe Val Arg Asp Glu Ile Gln
805 810 815 Cys Val Ile Ala
Thr Ile Ala Phe Gly Met Gly Ile Asn Lys Ala Asp 820
825 830 Ile Arg Gln Val Ile His Tyr Gly Ala
Pro Lys Asp Met Glu Ser Tyr 835 840
845 Tyr Gln Glu Ile Gly Arg Ala Gly Arg Asp Gly Leu Gln Ser
Ser Cys 850 855 860
His Val Leu Trp Ala Pro Ala Asp Ile Asn Leu Asn Arg His Leu Leu 865
870 875 880 Thr Glu Ile Arg Asn
Glu Lys Phe Arg Leu Tyr Lys Leu Lys Met Met 885
890 895 Ala Lys Met Glu Lys Tyr Leu His Ser Ser
Arg Cys Arg Arg Gln Ile 900 905
910 Ile Leu Ser His Phe Glu Asp Lys Gln Val Gln Lys Ala Ser Leu
Gly 915 920 925 Ile
Met Gly Thr Glu Lys Cys Cys Asp Asn Cys Arg Ser Arg Leu Asp 930
935 940 His Cys Tyr Ser Met Asp
Asp Ser Glu Asp Thr Ser Trp Asp Phe Gly 945 950
955 960 Pro Gln Ala Phe Lys Leu Leu Ser Ala Val Asp
Ile Leu Gly Glu Lys 965 970
975 Phe Gly Ile Gly Leu Pro Ile Leu Phe Leu Arg Gly Ser Asn Ser Gln
980 985 990 Arg Leu
Ala Asp Gln Tyr Arg Arg His Ser Leu Phe Gly Thr Gly Lys 995
1000 1005 Asp Gln Thr Glu Ser
Trp Trp Lys Ala Phe Ser Arg Gln Leu Ile 1010 1015
1020 Thr Glu Gly Phe Leu Val Glu Val Ser Arg
Tyr Asn Lys Phe Met 1025 1030 1035
Lys Ile Cys Ala Leu Thr Lys Lys Gly Arg Asn Trp Leu His Lys
1040 1045 1050 Ala Asn
Thr Glu Ser Gln Ser Leu Ile Leu Gln Ala Asn Glu Glu 1055
1060 1065 Leu Cys Pro Lys Lys Leu Leu
Leu Pro Ser Ser Lys Thr Val Ser 1070 1075
1080 Ser Gly Thr Lys Glu His Cys Tyr Asn Gln Val Pro
Val Glu Leu 1085 1090 1095
Ser Thr Glu Lys Lys Ser Asn Leu Glu Lys Leu Tyr Ser Tyr Lys 1100
1105 1110 Pro Cys Asp Lys Ile
Ser Ser Gly Ser Asn Ile Ser Lys Lys Ser 1115 1120
1125 Ile Met Val Gln Ser Pro Glu Lys Ala Tyr
Ser Ser Ser Gln Pro 1130 1135 1140
Val Ile Ser Ala Gln Glu Gln Glu Thr Gln Ile Val Leu Tyr Gly
1145 1150 1155 Lys Leu
Val Glu Ala Arg Gln Lys His Ala Asn Lys Met Asp Val 1160
1165 1170 Pro Pro Ala Ile Leu Ala Thr
Asn Lys Ile Leu Val Asp Met Ala 1175 1180
1185 Lys Met Arg Pro Thr Thr Val Glu Asn Val Lys Arg
Ile Asp Gly 1190 1195 1200
Val Ser Glu Gly Lys Ala Ala Met Leu Ala Pro Leu Leu Glu Val 1205
1210 1215 Ile Lys His Phe Cys
Gln Thr Asn Ser Val Gln Thr Asp Leu Phe 1220 1225
1230 Ser Ser Thr Lys Pro Gln Glu Glu Gln Lys
Thr Ser Leu Val Ala 1235 1240 1245
Lys Asn Lys Ile Cys Thr Leu Ser Gln Ser Met Ala Ile Thr Tyr
1250 1255 1260 Ser Leu
Phe Gln Glu Lys Lys Met Pro Leu Lys Ser Ile Ala Glu 1265
1270 1275 Ser Arg Ile Leu Pro Leu Met
Thr Ile Gly Met His Leu Tyr Gln 1280 1285
1290 Ala Val Lys Ala Gly Cys Pro Leu Asp Leu Glu Arg
Ala Gly Leu 1295 1300 1305
Thr Pro Glu Val Gln Lys Ile Ile Ala Asp Val Ile Arg Asn Pro 1310
1315 1320 Pro Val Asn Ser Asp
Met Ser Lys Ile Ser Leu Ile Arg Met Leu 1325 1330
1335 Val Pro Glu Asn Ile Asp Thr Tyr Leu Ile
His Met Ala Ile Glu 1340 1345 1350
Ile Leu Lys His Gly Pro Asp Ser Gly Leu Gln Pro Ser Cys Asp
1355 1360 1365 Val Asn
Lys Arg Arg Cys Phe Pro Gly Ser Glu Glu Ile Cys Ser 1370
1375 1380 Ser Ser Lys Arg Ser Lys Glu
Glu Val Gly Ile Asn Thr Glu Thr 1385 1390
1395 Ser Ser Ala Glu Arg Lys Arg Arg Leu Pro Val Trp
Phe Ala Lys 1400 1405 1410
Gly Ser Asp Thr Ser Lys Lys Leu Met Asp Lys Thr Lys Arg Gly 1415
1420 1425 Gly Leu Phe Ser
1430
13 4296 DNA Homo sapiens 13
atgagtgaaa aaaaattgga aacaactgca
cagcagcgga aatgtcctga atggatgaat 60gtgcagaata aaagatgtgc tgtagaagaa
agaaaggcat gtgttcggaa gagtgttttt 120gaagatgacc tccccttctt agaattcact
ggatccattg tgtatagtta cgatgctagt 180gattgctctt tcctgtcaga agatattagc
atgagtctat cagatgggga tgtggtggga 240tttgacatgg agtggccacc attatacaat
agagggaaac ttggcaaagt tgcactaatt 300cagttgtgtg tttctgagag caaatgttac
ttgttccacg tttcttccat gtcagttttt 360ccccagggat taaaaatgtt gcttgaaaat
aaagcagtta aaaaggcagg tgtaggaatt 420gaaggagatc agtggaaact tctacgtgac
tttgatatca aattgaagaa ttttgtggag 480ttgacagatg ttgccaataa aaagctgaaa
tgcacagaga cctggagcct taacagtctg 540gttaaacacc tcttaggtaa acagctcctg
aaagacaagt ctatccgctg tagcaattgg 600agtaaatttc ctctcactga ggaccagaaa
ctgtatgcag ccactgatgc ttatgctggt 660tttattattt accgaaattt agagattttg
gatgatactg tgcaaaggtt tgctataaat 720aaagaggaag aaatcctact tagcgacatg
aacaaacagt tgacttcaat ctctgaggaa 780gtgatggatc tggctaagca tcttcctcat
gctttcagta aattggaaaa cccacggagg 840gtttctatct tactaaagga tatttcagaa
aatctatatt cactgaggag gatgataatt 900gggtctacta acattgagac tgaactgagg
cccagcaata atttaaactt attatccttt 960gaagattcaa ctactggggg agtacaacag
aaacaaatta gagaacatga agttttaatt 1020cacgttgaag atgaaacatg ggacccaaca
cttgatcatt tagctaaaca tgatggagaa 1080gatgtacttg gaaataaagt ggaacgaaaa
gaagatggat ttgaagatgg agtagaagac 1140aacaaattga aagagaatat ggaaagagct
tgtttgatgt cgttagatat tacagaacat 1200gaactccaaa ttttggaaca gcagtctcag
gaagaatatc ttagtgatat tgcttataaa 1260tctactgagc atttatctcc caatgataat
gaaaacgata cgtcctatgt aattgagagt 1320gatgaagatt tagaaatgga gatgcttaag
catttatctc ccaatgataa tgaaaacgat 1380acgtcctatg taattgagag tgatgaagat
ttagaaatgg agatgcttaa gtctttagaa 1440aacctcaata gtggcacggt agaaccaact
cattctaaat gcttaaaaat ggaaagaaat 1500ctgggtcttc ctactaaaga agaagaagaa
gatgatgaaa atgaagctaa tgaaggggaa 1560gaagatgatg ataaggactt tttgtggcca
gcacccaatg aagagcaagt tacttgcctc 1620aagatgtact ttggccattc cagttttaaa
ccagttcagt ggaaagtgat tcattcagta 1680ttagaagaaa gaagagataa tgttgctgtc
atggcaactg gatatggaaa gagtttgtgc 1740ttccagtatc cacctgttta tgtaggcaag
attggccttg ttatctctcc ccttatttct 1800ctgatggaag accaagtgct acagcttaaa
atgtccaaca tcccagcttg cttccttgga 1860tcagcacagt cagaaaatgt tctaacagat
attaaattag gtaaataccg gattgtatac 1920gtaactccag aatactgttc aggtaacatg
ggcctgctcc agcaacttga ggctgatatt 1980ggtatcacgc tcattgctgt ggatgaggct
cactgtattt ctgagtgggg gcatgatttt 2040agggattcat tcaggaagtt gggctcccta
aagacagcac tgccaatggt tccaatcgtt 2100gcacttactg ctattgcaag ttcttcaatc
cgggaagaca ttgtacgttg cttaaatctg 2160agaaatcctc agatcacctg tactggtttt
gatcgaccaa acctgtattt agaagttagg 2220cgaaaaacag ggaatatcct tcaggatctg
cagccatttc ttgtcaaaac aagttcccac 2280tgggaatttg aaggtccaac aatcatctac
tgtccttcta gaaaaatgac acaacaagtt 2340acaggtgaac ttaggaaact gaatctatcc
tgtggaacat accatgcggg catgagtttt 2400agcacaagga aagacattca tcataggttt
gtaagagatg aaattcagtg tgtcatagct 2460accatagctt ttggaatggg cattaataaa
gctgacattc gccaagtcat tcattacggt 2520gctcctaagg acatggaatc atattatcag
gagattggta gagctggtcg tgatggactt 2580caaagttctt gtcacgtcct ctgggctcct
gcagacatta acttaaatag gcaccttctt 2640actgagatac gtaatgagaa gtttcgatta
tacaaattaa agatgatggc aaagatggaa 2700aaatatcttc attctagcag atgtaggaga
caaatcatct tgtctcattt tgaggacaaa 2760caagtacaaa aagcctcctt gggaattatg
ggaactgaaa aatgctgtga taattgcagg 2820tccagattgg atcattgcta ttccatggat
gactcagagg atacatcctg ggactttggt 2880ccacaagcat ttaagctttt gtctgctgtg
gacatcttag gcgaaaaatt tggaattggg 2940cttccaattt tatttctccg aggatctaat
tctcagcgtc ttgccgatca atatcgcagg 3000cacagtttat ttggcactgg caaggatcaa
acagagagtt ggtggaaggc tttttcccgt 3060cagctgatca ctgagggatt cttggtagaa
gtttctcggt ataacaaatt tatgaagatt 3120tgcgccctta cgaaaaaggg tagaaattgg
cttcataaag ctaatacaga atctcagagc 3180ctcatccttc aagctaatga agaattgtgt
ccaaagaagt tgcttctgcc tagttcgaaa 3240actgtatctt cgggcaccaa agagcattgt
tataatcaag taccagttga attaagtaca 3300gagaagaagt ctaacttgga gaagttatat
tcttataaac catgtgataa gatttcttct 3360gggagtaaca tttctaaaaa aagtatcatg
gtacagtcac cagaaaaagc ttacagttcc 3420tcacagcctg ttatttcggc acaagagcag
gagactcaga ttgtgttata tggcaaattg 3480gtagaagcta ggcagaaaca tgccaataaa
atggatgttc ccccagctat tctggcaaca 3540aacaagatac tggtggatat ggccaaaatg
agaccaacta cggttgaaaa cgtaaaaagg 3600attgatggtg tttctgaagg caaagctgcc
atgttggccc ctctgttgga agtcatcaaa 3660catttctgcc aaacaaatag tgttcagaca
gacctctttt caagtacaaa acctcaagaa 3720gaacagaaga cgagtctggt agcaaaaaat
aaaatatgca cactttcaca gtctatggcc 3780atcacatact ctttattcca agaaaagaag
atgcctttga agagcatagc tgagagcagg 3840attctgcctc tcatgacaat tggcatgcac
ttataccaag cggtgaaagc tggctgcccc 3900cttgatttgg agcgagcagg cctgactcca
gaggttcaga agattattgc tgatgttatc 3960cgaaaccctc ccgtcaactc agatatgagt
aaaattagcc taatcagaat gttagttcct 4020gaaaacattg acacgtacct tatccacatg
gcaattgaga tccttaaaca tggtcctgac 4080agcggacttc aaccttcatg tgatgtcaac
aaaaggagat gttttcccgg ttctgaagag 4140atctgttcaa gttctaagag aagcaaggaa
gaagtaggca tcaatactga gacttcatct 4200gcagagagaa agagacgatt acctgtgtgg
tttgccaaag gaagtgatac cagcaagaaa 4260ttaatggaca aaacgaaaag gggaggtctt
tttagt 4296
14 1432 PRT Homo sapiens 14
Met Ser
Glu Lys Lys Leu Glu Thr Thr Ala Gln Gln Arg Lys Cys Pro 1 5
10 15 Glu Trp Met Asn Val Gln Asn
Lys Arg Cys Ala Val Glu Glu Arg Lys 20 25
30 Ala Cys Val Arg Lys Ser Val Phe Glu Asp Asp Leu
Pro Phe Leu Glu 35 40 45
Phe Thr Gly Ser Ile Val Tyr Ser Tyr Asp Ala Ser Asp Cys Ser Phe
50 55 60 Leu Ser Glu
Asp Ile Ser Met Ser Leu Ser Asp Gly Asp Val Val Gly 65
70 75 80 Phe Asp Met Glu Trp Pro Pro
Leu Tyr Asn Arg Gly Lys Leu Gly Lys 85
90 95 Val Ala Leu Ile Gln Leu Cys Val Ser Glu Ser
Lys Cys Tyr Leu Phe 100 105
110 His Val Ser Ser Met Ser Val Phe Pro Gln Gly Leu Lys Met Leu
Leu 115 120 125 Glu
Asn Lys Ala Val Lys Lys Ala Gly Val Gly Ile Glu Gly Asp Gln 130
135 140 Trp Lys Leu Leu Arg Asp
Phe Asp Ile Lys Leu Lys Asn Phe Val Glu 145 150
155 160 Leu Thr Asp Val Ala Asn Lys Lys Leu Lys Cys
Thr Glu Thr Trp Ser 165 170
175 Leu Asn Ser Leu Val Lys His Leu Leu Gly Lys Gln Leu Leu Lys Asp
180 185 190 Lys Ser
Ile Arg Cys Ser Asn Trp Ser Lys Phe Pro Leu Thr Glu Asp 195
200 205 Gln Lys Leu Tyr Ala Ala Thr
Asp Ala Tyr Ala Gly Phe Ile Ile Tyr 210 215
220 Arg Asn Leu Glu Ile Leu Asp Asp Thr Val Gln Arg
Phe Ala Ile Asn 225 230 235
240 Lys Glu Glu Glu Ile Leu Leu Ser Asp Met Asn Lys Gln Leu Thr Ser
245 250 255 Ile Ser Glu
Glu Val Met Asp Leu Ala Lys His Leu Pro His Ala Phe 260
265 270 Ser Lys Leu Glu Asn Pro Arg Arg
Val Ser Ile Leu Leu Lys Asp Ile 275 280
285 Ser Glu Asn Leu Tyr Ser Leu Arg Arg Met Ile Ile Gly
Ser Thr Asn 290 295 300
Ile Glu Thr Glu Leu Arg Pro Ser Asn Asn Leu Asn Leu Leu Ser Phe 305
310 315 320 Glu Asp Ser Thr
Thr Gly Gly Val Gln Gln Lys Gln Ile Arg Glu His 325
330 335 Glu Val Leu Ile His Val Glu Asp Glu
Thr Trp Asp Pro Thr Leu Asp 340 345
350 His Leu Ala Lys His Asp Gly Glu Asp Val Leu Gly Asn Lys
Val Glu 355 360 365
Arg Lys Glu Asp Gly Phe Glu Asp Gly Val Glu Asp Asn Lys Leu Lys 370
375 380 Glu Asn Met Glu Arg
Ala Cys Leu Met Ser Leu Asp Ile Thr Glu His 385 390
395 400 Glu Leu Gln Ile Leu Glu Gln Gln Ser Gln
Glu Glu Tyr Leu Ser Asp 405 410
415 Ile Ala Tyr Lys Ser Thr Glu His Leu Ser Pro Asn Asp Asn Glu
Asn 420 425 430 Asp
Thr Ser Tyr Val Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met 435
440 445 Leu Lys His Leu Ser Pro
Asn Asp Asn Glu Asn Asp Thr Ser Tyr Val 450 455
460 Ile Glu Ser Asp Glu Asp Leu Glu Met Glu Met
Leu Lys Ser Leu Glu 465 470 475
480 Asn Leu Asn Ser Gly Thr Val Glu Pro Thr His Ser Lys Cys Leu Lys
485 490 495 Met Glu
Arg Asn Leu Gly Leu Pro Thr Lys Glu Glu Glu Glu Asp Asp 500
505 510 Glu Asn Glu Ala Asn Glu Gly
Glu Glu Asp Asp Asp Lys Asp Phe Leu 515 520
525 Trp Pro Ala Pro Asn Glu Glu Gln Val Thr Cys Leu
Lys Met Tyr Phe 530 535 540
Gly His Ser Ser Phe Lys Pro Val Gln Trp Lys Val Ile His Ser Val 545
550 555 560 Leu Glu Glu
Arg Arg Asp Asn Val Ala Val Met Ala Thr Gly Tyr Gly 565
570 575 Lys Ser Leu Cys Phe Gln Tyr Pro
Pro Val Tyr Val Gly Lys Ile Gly 580 585
590 Leu Val Ile Ser Pro Leu Ile Ser Leu Met Glu Asp Gln
Val Leu Gln 595 600 605
Leu Lys Met Ser Asn Ile Pro Ala Cys Phe Leu Gly Ser Ala Gln Ser 610
615 620 Glu Asn Val Leu
Thr Asp Ile Lys Leu Gly Lys Tyr Arg Ile Val Tyr 625 630
635 640 Val Thr Pro Glu Tyr Cys Ser Gly Asn
Met Gly Leu Leu Gln Gln Leu 645 650
655 Glu Ala Asp Ile Gly Ile Thr Leu Ile Ala Val Asp Glu Ala
His Cys 660 665 670
Ile Ser Glu Trp Gly His Asp Phe Arg Asp Ser Phe Arg Lys Leu Gly
675 680 685 Ser Leu Lys Thr
Ala Leu Pro Met Val Pro Ile Val Ala Leu Thr Ala 690
695 700 Ile Ala Ser Ser Ser Ile Arg Glu
Asp Ile Val Arg Cys Leu Asn Leu 705 710
715 720 Arg Asn Pro Gln Ile Thr Cys Thr Gly Phe Asp Arg
Pro Asn Leu Tyr 725 730
735 Leu Glu Val Arg Arg Lys Thr Gly Asn Ile Leu Gln Asp Leu Gln Pro
740 745 750 Phe Leu Val
Lys Thr Ser Ser His Trp Glu Phe Glu Gly Pro Thr Ile 755
760 765 Ile Tyr Cys Pro Ser Arg Lys Met
Thr Gln Gln Val Thr Gly Glu Leu 770 775
780 Arg Lys Leu Asn Leu Ser Cys Gly Thr Tyr His Ala Gly
Met Ser Phe 785 790 795
800 Ser Thr Arg Lys Asp Ile His His Arg Phe Val Arg Asp Glu Ile Gln
805 810 815 Cys Val Ile Ala
Thr Ile Ala Phe Gly Met Gly Ile Asn Lys Ala Asp 820
825 830 Ile Arg Gln Val Ile His Tyr Gly Ala
Pro Lys Asp Met Glu Ser Tyr 835 840
845 Tyr Gln Glu Ile Gly Arg Ala Gly Arg Asp Gly Leu Gln Ser
Ser Cys 850 855 860
His Val Leu Trp Ala Pro Ala Asp Ile Asn Leu Asn Arg His Leu Leu 865
870 875 880 Thr Glu Ile Arg Asn
Glu Lys Phe Arg Leu Tyr Lys Leu Lys Met Met 885
890 895 Ala Lys Met Glu Lys Tyr Leu His Ser Ser
Arg Cys Arg Arg Gln Ile 900 905
910 Ile Leu Ser His Phe Glu Asp Lys Gln Val Gln Lys Ala Ser Leu
Gly 915 920 925 Ile
Met Gly Thr Glu Lys Cys Cys Asp Asn Cys Arg Ser Arg Leu Asp 930
935 940 His Cys Tyr Ser Met Asp
Asp Ser Glu Asp Thr Ser Trp Asp Phe Gly 945 950
955 960 Pro Gln Ala Phe Lys Leu Leu Ser Ala Val Asp
Ile Leu Gly Glu Lys 965 970
975 Phe Gly Ile Gly Leu Pro Ile Leu Phe Leu Arg Gly Ser Asn Ser Gln
980 985 990 Arg Leu
Ala Asp Gln Tyr Arg Arg His Ser Leu Phe Gly Thr Gly Lys 995
1000 1005 Asp Gln Thr Glu Ser
Trp Trp Lys Ala Phe Ser Arg Gln Leu Ile 1010 1015
1020 Thr Glu Gly Phe Leu Val Glu Val Ser Arg
Tyr Asn Lys Phe Met 1025 1030 1035
Lys Ile Cys Ala Leu Thr Lys Lys Gly Arg Asn Trp Leu His Lys
1040 1045 1050 Ala Asn
Thr Glu Ser Gln Ser Leu Ile Leu Gln Ala Asn Glu Glu 1055
1060 1065 Leu Cys Pro Lys Lys Leu Leu
Leu Pro Ser Ser Lys Thr Val Ser 1070 1075
1080 Ser Gly Thr Lys Glu His Cys Tyr Asn Gln Val Pro
Val Glu Leu 1085 1090 1095
Ser Thr Glu Lys Lys Ser Asn Leu Glu Lys Leu Tyr Ser Tyr Lys 1100
1105 1110 Pro Cys Asp Lys Ile
Ser Ser Gly Ser Asn Ile Ser Lys Lys Ser 1115 1120
1125 Ile Met Val Gln Ser Pro Glu Lys Ala Tyr
Ser Ser Ser Gln Pro 1130 1135 1140
Val Ile Ser Ala Gln Glu Gln Glu Thr Gln Ile Val Leu Tyr Gly
1145 1150 1155 Lys Leu
Val Glu Ala Arg Gln Lys His Ala Asn Lys Met Asp Val 1160
1165 1170 Pro Pro Ala Ile Leu Ala Thr
Asn Lys Ile Leu Val Asp Met Ala 1175 1180
1185 Lys Met Arg Pro Thr Thr Val Glu Asn Val Lys Arg
Ile Asp Gly 1190 1195 1200
Val Ser Glu Gly Lys Ala Ala Met Leu Ala Pro Leu Leu Glu Val 1205
1210 1215 Ile Lys His Phe Cys
Gln Thr Asn Ser Val Gln Thr Asp Leu Phe 1220 1225
1230 Ser Ser Thr Lys Pro Gln Glu Glu Gln Lys
Thr Ser Leu Val Ala 1235 1240 1245
Lys Asn Lys Ile Cys Thr Leu Ser Gln Ser Met Ala Ile Thr Tyr
1250 1255 1260 Ser Leu
Phe Gln Glu Lys Lys Met Pro Leu Lys Ser Ile Ala Glu 1265
1270 1275 Ser Arg Ile Leu Pro Leu Met
Thr Ile Gly Met His Leu Tyr Gln 1280 1285
1290 Ala Val Lys Ala Gly Cys Pro Leu Asp Leu Glu Arg
Ala Gly Leu 1295 1300 1305
Thr Pro Glu Val Gln Lys Ile Ile Ala Asp Val Ile Arg Asn Pro 1310
1315 1320 Pro Val Asn Ser Asp
Met Ser Lys Ile Ser Leu Ile Arg Met Leu 1325 1330
1335 Val Pro Glu Asn Ile Asp Thr Tyr Leu Ile
His Met Ala Ile Glu 1340 1345 1350
Ile Leu Lys His Gly Pro Asp Ser Gly Leu Gln Pro Ser Cys Asp
1355 1360 1365 Val Asn
Lys Arg Arg Cys Phe Pro Gly Ser Glu Glu Ile Cys Ser 1370
1375 1380 Ser Ser Lys Arg Ser Lys Glu
Glu Val Gly Ile Asn Thr Glu Thr 1385 1390
1395 Ser Ser Ala Glu Arg Lys Arg Arg Leu Pro Val Trp
Phe Ala Lys 1400 1405 1410
Gly Ser Asp Thr Ser Lys Lys Leu Met Asp Lys Thr Lys Arg Gly 1415
1420 1425 Gly Leu Phe Ser
1430
15 3396 DNA Homo sapiens 15
atgccgcgcg ctccccgctg ccgagccgtg
cgctccctgc tgcgcagcca ctaccgcgag 60gtgctgccgc tggccacgtt cgtgcggcgc
ctggggcccc agggctggcg gctggtgcag 120cgcggggacc cggcggcttt ccgcgcgctg
gtggcccagt gcctggtgtg cgtgccctgg 180gacgcacggc cgccccccgc cgccccctcc
ttccgccagg tgtcctgcct gaaggagctg 240gtggcccgag tgctgcagag gctgtgcgag
cgcggcgcga agaacgtgct ggccttcggc 300ttcgcgctgc tggacggggc ccgcgggggc
ccccccgagg ccttcaccac cagcgtgcgc 360agctacctgc ccaacacggt gaccgacgca
ctgcggggga gcggggcgtg ggggctgctg 420ctgcgccgcg tgggcgacga cgtgctggtt
cacctgctgg cacgctgcgc gctctttgtg 480ctggtggctc ccagctgcgc ctaccaggtg
tgcgggccgc cgctgtacca gctcggcgct 540gccactcagg cccggccccc gccacacgct
agtggacccc gaaggcgtct gcgatgcgaa 600cgggcctgga accatagcgt cagggaggcc
ggggtccccc tgggcctgcc agccccgggt 660gcgaggaggc gcgggggcag tgccagccga
agtctgccgt tgcccaagag gcccaggcgt 720ggcgctgccc ctgagccgga gcggacgccc
gttgggcagg ggtcctgggc ccacccgggc 780aggacgcgtg gaccgagtga ccgtggtttc
tgtgtggtgt cacctgccag acccgccgaa 840gaagccacct ctttggaggg tgcgctctct
ggcacgcgcc actcccaccc atccgtgggc 900cgccagcacc acgcgggccc cccatccaca
tcgcggccac cacgtccctg ggacacgcct 960tgtcccccgg tgtacgccga gaccaagcac
ttcctctact cctcaggcga caaggagcag 1020ctgcggccct ccttcctact cagctctctg
aggcccagcc tgactggcgc tcggaggctc 1080gtggagacca tctttctggg ttccaggccc
tggatgccag ggactccccg caggttgccc 1140cgcctgcccc agcgctactg gcaaatgcgg
cccctgtttc tggagctgct tgggaaccac 1200gcgcagtgcc cctacggggt gctcctcaag
acgcactgcc cgctgcgagc tgcggtcacc 1260ccagcagccg gtgtctgtgc ccgggagaag
ccccagggct ctgtggcggc ccccgaggag 1320gaggacacag acccccgtcg cctggtgcag
ctgctccgcc agcacagcag cccctggcag 1380gtgtacggct tcgtgcgggc ctgcctgcgc
cggctggtgc ccccaggcct ctggggctcc 1440aggcacaacg aacgccgctt cctcaggaac
accaagaagt tcatctccct ggggaagcat 1500gccaagctct cgctgcagga gctgacgtgg
aagatgagcg tgcgggactg cgcttggctg 1560cgcaggagcc caggggttgg ctgtgttccg
gccgcagagc accgtctgcg tgaggagatc 1620ctggccaagt tcctgcactg gctgatgagt
gtgtacgtcg tcgagctgct caggtctttc 1680ttttatgtca cggagaccac gtttcaaaag
aacaggctct ttttctaccg gaagagtgtc 1740tggagcaagt tgcaaagcat tggaatcaga
cagcacttga agagggtgca gctgcgggag 1800ctgtcggaag cagaggtcag gcagcatcgg
gaagccaggc ccgccctgct gacgtccaga 1860ctccgcttca tccccaagcc tgacgggctg
cggccgattg tgaacatgga ctacgtcgtg 1920ggagccagaa cgttccgcag agaaaagagg
gccgagcgtc tcacctcgag ggtgaaggca 1980ctgttcagcg tgctcaacta cgagcgggcg
cggcgccccg gcctcctggg cgcctctgtg 2040ctgggcctgg acgatatcca cagggcctgg
cgcaccttcg tgctgcgtgt gcgggcccag 2100gacccgccgc ctgagctgta ctttgtcaag
gtggatgtga cgggcgcgta cgacaccatc 2160ccccaggaca ggctcacgga ggtcatcgcc
agcatcatca aaccccagaa cacgtactgc 2220gtgcgtcggt atgccgtggt ccagaaggcc
gcccatgggc acgtccgcaa ggccttcaag 2280agccacgtct ctaccttgac agacctccag
ccgtacatgc gacagttcgt ggctcacctg 2340caggagacca gcccgctgag ggatgccgtc
gtcatcgagc agagctcctc cctgaatgag 2400gccagcagtg gcctcttcga cgtcttccta
cgcttcatgt gccaccacgc cgtgcgcatc 2460aggggcaagt cctacgtcca gtgccagggg
atcccgcagg gctccatcct ctccacgctg 2520ctctgcagcc tgtgctacgg cgacatggag
aacaagctgt ttgcggggat tcggcgggac 2580gggctgctcc tgcgtttggt ggatgatttc
ttgttggtga cacctcacct cacccacgcg 2640aaaaccttcc tcaggaccct ggtccgaggt
gtccctgagt atggctgcgt ggtgaacttg 2700cggaagacag tggtgaactt ccctgtagaa
gacgaggccc tgggtggcac ggcttttgtt 2760cagatgccgg cccacggcct attcccctgg
tgcggcctgc tgctggatac ccggaccctg 2820gaggtgcaga gcgactactc cagctatgcc
cggacctcca tcagagccag tctcaccttc 2880aaccgcggct tcaaggctgg gaggaacatg
cgtcgcaaac tctttggggt cttgcggctg 2940aagtgtcaca gcctgtttct ggatttgcag
gtgaacagcc tccagacggt gtgcaccaac 3000atctacaaga tcctcctgct gcaggcgtac
aggtttcacg catgtgtgct gcagctccca 3060tttcatcagc aagtttggaa gaaccccaca
tttttcctgc gcgtcatctc tgacacggcc 3120tccctctgct actccatcct gaaagccaag
aacgcaggga tgtcgctggg ggccaagggc 3180gccgccggcc ctctgccctc cgaggccgtg
cagtggctgt gccaccaagc attcctgctc 3240aagctgactc gacaccgtgt cacctacgtg
ccactcctgg ggtcactcag gacagcccag 3300acgcagctga gtcggaagct cccggggacg
acgctgactg ccctggaggc cgcagccaac 3360ccggcactgc cctcagactt caagaccatc
ctggac 3396
16 3396 DNA Homo sapiens 16
atgccgcgcg
ctccccgctg ccgagccgtg cgctccctgc tgcgcagcca ctaccgcgag 60gtgctgccgc
tggccacgtt cgtgcggcgc ctggggcccc agggctggcg gctggtgcag 120cgcggggacc
cggcggcttt ccgcgcgctg gtggcccagt gcctggtgtg cgtgccctgg 180gacgcacggc
cgccccccgc cgccccctcc ttccgccagg tgtcctgcct gaaggagctg 240gtggcccgag
tgctgcagag gctgtgcgag cgcggcgcga agaacgtgct ggccttcggc 300ttcgcgctgc
tggacggggc ccgcgggggc ccccccgagg ccttcaccac cagcgtgcgc 360agctacctgc
ccaacacggt gaccgacgca ctgcggggga gcggggcgtg ggggctgctg 420ctgcgccgcg
tgggcgacga cgtgctggtt cacctgctgg cacgctgcgc gctctttgtg 480ctggtggctc
ccagctgcgc ctaccaggtg tgcgggccgc cgctgtacca gctcggcgct 540gccactcagg
cccggccccc gccacacgct agtggacccc gaaggcgtct gggatgcgaa 600cgggcctgga
accatagcgt cagggaggcc ggggtccccc tgggcctgcc agccccgggt 660gcgaggaggc
gcgggggcag tgccagccga agtctgccgt tgcccaagag gcccaggcgt 720ggcgctgccc
ctgagccgga gcggacgccc gttgggcagg ggtcctgggc ccacccgggc 780aggacgcgtg
gaccgagtga ccgtggtttc tgtgtggtgt cacctgccag acccgccgaa 840gaagccacct
ctttggaggg tgcgctctct ggcacgcgcc actcccaccc atccgtgggc 900cgccagcacc
acgcgggccc cccatccaca tcgcggccac cacgtccctg ggacacgcct 960tgtcccccgg
tgtacgccga gaccaagcac ttcctctact cctcaggcga caaggagcag 1020ctgcggccct
ccttcctact cagctctctg aggcccagcc tgactggcgc tcggaggctc 1080gtggagacca
tctttctggg ttccaggccc tggatgccag ggactccccg caggttgccc 1140cgcctgcccc
agcgctactg gcaaatgcgg cccctgtttc tggagctgct tgggaaccac 1200gcgcagtgcc
cctacggggt gctcctcaag acgcactgcc cgctgcgagc tgcggtcacc 1260ccagcagccg
gtgtctgtgc ccgggagaag ccccagggct ctgtggcggc ccccgaggag 1320gaggacacag
acccccgtcg cctggtgcag ctgctccgcc agcacagcag cccctggcag 1380gtgtacggct
tcgtgcgggc ctgcctgcgc cggctggtgc ccccaggcct ctggggctcc 1440aggcacaacg
aacgccgctt cctcaggaac accaagaagt tcatctccct ggggaagcat 1500gccaagctct
cgctgcagga gctgacgtgg aagatgagcg tgcgggactg cgcttggctg 1560cgcaggagcc
caggggttgg ctgtgttccg gccgcagagc accgtctgcg tgaggagatc 1620ctggccaagt
tcctgcactg gctgatgagt gtgtacgtcg tcgagctgct caggtctttc 1680ttttatgtca
cggagaccac gtttcaaaag aacaggctct ttttctaccg gaagagtgtc 1740tggagcaagt
tgcaaagcat tggaatcaga cagcacttga agagggtgca gctgcgggag 1800ctgtcggaag
cagaggtcag gcagcatcgg gaagccaggc ccgccctgct gacgtccaga 1860ctccgcttca
tccccaagcc tgacgggctg cggccgattg tgaacatgga ctacgtcgtg 1920ggagccagaa
cgttccgcag agaaaagagg gccgagcgtc tcacctcgag ggtgaaggca 1980ctgttcagcg
tgctcaacta cgagcgggcg cggcgccccg gcctcctggg cgcctctgtg 2040ctgggcctgg
acgatatcca cagggcctgg cgcaccttcg tgctgcgtgt gcgggcccag 2100gacccgccgc
ctgagctgta ctttgtcaag gtggatgtga cgggcgcgta cgacaccatc 2160ccccaggaca
ggctcacgga ggtcatcgcc agcatcatca aaccccagaa cacgtactgc 2220gtgcgtcggt
atgccgtggt ccagaaggcc gcccatgggc acgtccgcaa ggccttcaag 2280agccacgtct
ctaccttgac agacctccag ccgtacatgc gacagttcgt ggctcacctg 2340caggagacca
gcccgctgag ggatgccgtc gtcatcgagc agagctcctc cctgaatgag 2400gccagcagtg
gcctcttcga cgtcttccta cgcttcatgt gccaccacgc cgtgcgcatc 2460aggggcaagt
cctacgtcca gtgccagggg atcccgcagg gctccatcct ctccacgctg 2520ctctgcagcc
tgtgctacgg cgacatggag aacaagctgt ttgcggggat tcggcgggac 2580gggctgctcc
tgcgtttggt ggatgatttc ttgttggtga cacctcacct cacccacgcg 2640aaaaccttcc
tcaggaccct ggtccgaggt gtccctgagt atggctgcgt ggtgaacttg 2700cggaagacag
tggtgaactt ccctgtagaa gacgaggccc tgggtggcac ggcttttgtt 2760cagatgccgg
cccacggcct attcccctgg tgcggcctgc tgctggatac ccggaccctg 2820gaggtgcaga
gcgactactc cagctatgcc cggacctcca tcagagccag tctcaccttc 2880aaccgcggct
tcaaggctgg gaggaacatg cgtcgcaaac tctttggggt cttgcggctg 2940aagtgtcaca
gcctgtttct ggatttgcag gtgaacagcc tccagacggt gtgcaccaac 3000atctacaaga
tcctcctgct gcaggcgtac aggtttcacg catgtgtgct gcagctccca 3060tttcatcagc
aagtttggaa gaaccccaca tttttcctgc gcgtcatctc tgacacggcc 3120tccctctgct
actccatcct gaaagccaag aacgcaggga tgtcgctggg ggccaagggc 3180gccgccggcc
ctctgccctc cgaggccgtg cagtggctgt gccaccaagc attcctgctc 3240aagctgactc
gacaccgtgt cacctacgtg ccactcctgg ggtcactcag gacagcccag 3300acgcagctga
gtcggaagct cccggggacg acgctgactg ccctggaggc cgcagccaac 3360ccggcactgc
cctcagactt caagaccatc ctggac
3396
17 4018 DNA Homo sapiens 17
caggcagcgc tgcgtcctgc tgcgcacgtg ggaagccctg
gccccggcca cccccgcgat 60gccgcgcgct ccccgctgcc gagccgtgcg ctccctgctg
cgcagccact accgcgaggt 120gctgccgctg gccacgttcg tgcggcgcct ggggccccag
ggctggcggc tggtgcagcg 180cggggacccg gcggctttcc gcgcgctggt ggcccagtgc
ctggtgtgcg tgccctggga 240cgcacggccg ccccccgccg ccccctcctt ccgccaggtg
tcctgcctga aggagctggt 300ggcccgagtg ctgcagaggc tgtgcgagcg cggcgcgaag
aacgtgctgg ccttcggctt 360cgcgctgctg gacggggccc gcgggggccc ccccgaggcc
ttcaccacca gcgtgcgcag 420ctacctgccc aacacggtga ccgacgcact gcgggggagc
ggggcgtggg ggctgctgct 480gcgccgcgtg ggcgacgacg tgctggttca cctgctggca
cgctgcgcgc tctttgtgct 540ggtggctccc agctgcgcct accaggtgtg cgggccgccg
ctgtaccagc tcggcgctgc 600cactcaggcc cggcccccgc cacacgctag tggaccccga
aggcgtctgg gatgcgaacg 660ggcctggaac catagcgtca gggaggccgg ggtccccctg
ggcctgccag ccccgggtgc 720gaggaggcgc gggggcagtg ccagccgaag tctgccgttg
cccaagaggc ccaggcgtgg 780cgctgcccct gagccggagc ggacgcccgt tgggcagggg
tcctgggccc acccgggcag 840gacgcgtgga ccgagtgacc gtggtttctg tgtggtgtca
cctgccagac ccgccgaaga 900agccacctct ttggagggtg cgctctctgg cacgcgccac
tcccacccat ccgtgggccg 960ccagcaccac gcgggccccc catccacatc gcggccacca
cgtccctggg acacgccttg 1020tcccccggtg tacgccgaga ccaagcactt cctctactcc
tcaggcgaca aggagcagct 1080gcggccctcc ttcctactca gctctctgag gcccagcctg
actggcgctc ggaggctcgt 1140ggagaccatc tttctgggtt ccaggccctg gatgccaggg
actccccgca ggttgccccg 1200cctgccccag cgctactggc aaatgcggcc cctgtttctg
gagctgcttg ggaaccacgc 1260gcagtgcccc tacggggtgc tcctcaagac gcactgcccg
ctgcgagctg cggtcacccc 1320agcagccggt gtctgtgccc gggagaagcc ccagggctct
gtggcggccc ccgaggagga 1380ggacacagac ccccgtcgcc tggtgcagct gctccgccag
cacagcagcc cctggcaggt 1440gtacggcttc gtgcgggcct gcctgcgccg gctggtgccc
ccaggcctct ggggctccag 1500gcacaacgaa cgccgcttcc tcaggaacac caagaagttc
atctccctgg ggaagcatgc 1560caagctctcg ctgcaggagc tgacgtggaa gatgagcgtg
cgggactgcg cttggctgcg 1620caggagccca ggggttggct gtgttccggc cgcagagcac
cgtctgcgtg aggagatcct 1680ggccaagttc ctgcactggc tgatgagtgt gtacgtcgtc
gagctgctca ggtctttctt 1740ttatgtcacg gagaccacgt ttcaaaagaa caggctcttt
ttctaccgga agagtgtctg 1800gagcaagttg caaagcattg gaatcagaca gcacttgaag
agggtgcagc tgcgggagct 1860gtcggaagca gaggtcaggc agcatcggga agccaggccc
gccctgctga cgtccagact 1920ccgcttcatc cccaagcctg acgggctgcg gccgattgtg
aacatggact acgtcgtggg 1980agccagaacg ttccgcagag aaaagagggc cgagcgtctc
acctcgaggg tgaaggcact 2040gttcagcgtg ctcaactacg agcgggcgcg gcgccccggc
ctcctgggcg cctctgtgct 2100gggcctggac gatatccaca gggcctggcg caccttcgtg
ctgcgtgtgc gggcccagga 2160cccgccgcct gagctgtact ttgtcaaggt ggatgtgacg
ggcgcgtacg acaccatccc 2220ccaggacagg ctcacggagg tcatcgccag catcatcaaa
ccccagaaca cgtactgcgt 2280gcgtcggtat gccgtggtcc agaaggccgc ccatgggcac
gtccgcaagg ccttcaagag 2340ccacgtctct accttgacag acctccagcc gtacatgcga
cagttcgtgg ctcacctgca 2400ggagaccagc ccgctgaggg atgccgtcgt catcgagcag
agctcctccc tgaatgaggc 2460cagcagtggc ctcttcgacg tcttcctacg cttcatgtgc
caccacgccg tgcgcatcag 2520gggcaagtcc tacgtccagt gccaggggat cccgcagggc
tccatcctct ccacgctgct 2580ctgcagcctg tgctacggcg acatggagaa caagctgttt
gcggggattc ggcgggacgg 2640gctgctcctg cgtttggtgg atgatttctt gttggtgaca
cctcacctca cccacgcgaa 2700aaccttcctc aggaccctgg tccgaggtgt ccctgagtat
ggctgcgtgg tgaacttgcg 2760gaagacagtg gtgaacttcc ctgtagaaga cgaggccctg
ggtggcacgg cttttgttca 2820gatgccggcc cacggcctat tcccctggtg cggcctgctg
ctggataccc ggaccctgga 2880ggtgcagagc gactactcca gctatgcccg gacctccatc
agagccagtc tcaccttcaa 2940ccgcggcttc aaggctggga ggaacatgcg tcgcaaactc
tttggggtct tgcggctgaa 3000gtgtcacagc ctgtttctgg atttgcaggt gaacagcctc
cagacggtgt gcaccaacat 3060ctacaagatc ctcctgctgc aggcgtacag gtttcacgca
tgtgtgctgc agctcccatt 3120tcatcagcaa gtttggaaga accccacatt tttcctgcgc
gtcatctctg acacggcctc 3180cctctgctac tccatcctga aagccaagaa cgcagggatg
tcgctggggg ccaagggcgc 3240cgccggccct ctgccctccg aggccgtgca gtggctgtgc
caccaagcat tcctgctcaa 3300gctgactcga caccgtgtca cctacgtgcc actcctgggg
tcactcagga cagcccagac 3360gcagctgagt cggaagctcc cggggacgac gctgactgcc
ctggaggccg cagccaaccc 3420ggcactgccc tcagacttca agaccatcct ggactgatgg
ccacccgccc acagccaggc 3480cgagagcaga caccagcagc cctgtcacgc cgggctctac
gtcccaggga gggaggggcg 3540gcccacaccc aggcccgcac cgctgggagt ctgaggcctg
agtgagtgtt tggccgaggc 3600ctgcatgtcc ggctgaaggc tgagtgtccg gctgaggcct
gagcgagtgt ccagccaagg 3660gctgagtgtc cagcacacct gccgtcttca cttccccaca
ggctggcgct cggctccacc 3720ccagggccag cttttcctca ccaggagccc ggcttccact
ccccacatag gaatagtcca 3780tccccagatt cgccattgtt cacccctcgc cctgccctcc
tttgccttcc acccccacca 3840tccaggtgga gaccctgaga aggaccctgg gagctctggg
aatttggagt gaccaaaggt 3900gtgccctgta cacaggcgag gaccctgcac ctggatgggg
gtccctgtgg gtcaaattgg 3960ggggaggtgc tgtgggagta aaatactgaa tatatgagtt
tttcagtttt gaaaaaaa 4018
18 1132 PRT Homo sapiens 18
Met Pro Arg Ala Pro
Arg Cys Arg Ala Val Arg Ser Leu Leu Arg Ser 1 5
10 15 His Tyr Arg Glu Val Leu Pro Leu Ala Thr
Phe Val Arg Arg Leu Gly 20 25
30 Pro Gln Gly Trp Arg Leu Val Gln Arg Gly Asp Pro Ala Ala Phe
Arg 35 40 45 Ala
Leu Val Ala Gln Cys Leu Val Cys Val Pro Trp Asp Ala Arg Pro 50
55 60 Pro Pro Ala Ala Pro Ser
Phe Arg Gln Val Ser Cys Leu Lys Glu Leu 65 70
75 80 Val Ala Arg Val Leu Gln Arg Leu Cys Glu Arg
Gly Ala Lys Asn Val 85 90
95 Leu Ala Phe Gly Phe Ala Leu Leu Asp Gly Ala Arg Gly Gly Pro Pro
100 105 110 Glu Ala
Phe Thr Thr Ser Val Arg Ser Tyr Leu Pro Asn Thr Val Thr 115
120 125 Asp Ala Leu Arg Gly Ser Gly
Ala Trp Gly Leu Leu Leu Arg Arg Val 130 135
140 Gly Asp Asp Val Leu Val His Leu Leu Ala Arg Cys
Ala Leu Phe Val 145 150 155
160 Leu Val Ala Pro Ser Cys Ala Tyr Gln Val Cys Gly Pro Pro Leu Tyr
165 170 175 Gln Leu Gly
Ala Ala Thr Gln Ala Arg Pro Pro Pro His Ala Ser Gly 180
185 190 Pro Arg Arg Arg Leu Arg Cys Glu
Arg Ala Trp Asn His Ser Val Arg 195 200
205 Glu Ala Gly Val Pro Leu Gly Leu Pro Ala Pro Gly Ala
Arg Arg Arg 210 215 220
Gly Gly Ser Ala Ser Arg Ser Leu Pro Leu Pro Lys Arg Pro Arg Arg 225
230 235 240 Gly Ala Ala Pro
Glu Pro Glu Arg Thr Pro Val Gly Gln Gly Ser Trp 245
250 255 Ala His Pro Gly Arg Thr Arg Gly Pro
Ser Asp Arg Gly Phe Cys Val 260 265
270 Val Ser Pro Ala Arg Pro Ala Glu Glu Ala Thr Ser Leu Glu
Gly Ala 275 280 285
Leu Ser Gly Thr Arg His Ser His Pro Ser Val Gly Arg Gln His His 290
295 300 Ala Gly Pro Pro Ser
Thr Ser Arg Pro Pro Arg Pro Trp Asp Thr Pro 305 310
315 320 Cys Pro Pro Val Tyr Ala Glu Thr Lys His
Phe Leu Tyr Ser Ser Gly 325 330
335 Asp Lys Glu Gln Leu Arg Pro Ser Phe Leu Leu Ser Ser Leu Arg
Pro 340 345 350 Ser
Leu Thr Gly Ala Arg Arg Leu Val Glu Thr Ile Phe Leu Gly Ser 355
360 365 Arg Pro Trp Met Pro Gly
Thr Pro Arg Arg Leu Pro Arg Leu Pro Gln 370 375
380 Arg Tyr Trp Gln Met Arg Pro Leu Phe Leu Glu
Leu Leu Gly Asn His 385 390 395
400 Ala Gln Cys Pro Tyr Gly Val Leu Leu Lys Thr His Cys Pro Leu Arg
405 410 415 Ala Ala
Val Thr Pro Ala Ala Gly Val Cys Ala Arg Glu Lys Pro Gln 420
425 430 Gly Ser Val Ala Ala Pro Glu
Glu Glu Asp Thr Asp Pro Arg Arg Leu 435 440
445 Val Gln Leu Leu Arg Gln His Ser Ser Pro Trp Gln
Val Tyr Gly Phe 450 455 460
Val Arg Ala Cys Leu Arg Arg Leu Val Pro Pro Gly Leu Trp Gly Ser 465
470 475 480 Arg His Asn
Glu Arg Arg Phe Leu Arg Asn Thr Lys Lys Phe Ile Ser 485
490 495 Leu Gly Lys His Ala Lys Leu Ser
Leu Gln Glu Leu Thr Trp Lys Met 500 505
510 Ser Val Arg Asp Cys Ala Trp Leu Arg Arg Ser Pro Gly
Val Gly Cys 515 520 525
Val Pro Ala Ala Glu His Arg Leu Arg Glu Glu Ile Leu Ala Lys Phe 530
535 540 Leu His Trp Leu
Met Ser Val Tyr Val Val Glu Leu Leu Arg Ser Phe 545 550
555 560 Phe Tyr Val Thr Glu Thr Thr Phe Gln
Lys Asn Arg Leu Phe Phe Tyr 565 570
575 Arg Lys Ser Val Trp Ser Lys Leu Gln Ser Ile Gly Ile Arg
Gln His 580 585 590
Leu Lys Arg Val Gln Leu Arg Glu Leu Ser Glu Ala Glu Val Arg Gln
595 600 605 His Arg Glu Ala
Arg Pro Ala Leu Leu Thr Ser Arg Leu Arg Phe Ile 610
615 620 Pro Lys Pro Asp Gly Leu Arg Pro
Ile Val Asn Met Asp Tyr Val Val 625 630
635 640 Gly Ala Arg Thr Phe Arg Arg Glu Lys Arg Ala Glu
Arg Leu Thr Ser 645 650
655 Arg Val Lys Ala Leu Phe Ser Val Leu Asn Tyr Glu Arg Ala Arg Arg
660 665 670 Pro Gly Leu
Leu Gly Ala Ser Val Leu Gly Leu Asp Asp Ile His Arg 675
680 685 Ala Trp Arg Thr Phe Val Leu Arg
Val Arg Ala Gln Asp Pro Pro Pro 690 695
700 Glu Leu Tyr Phe Val Lys Val Asp Val Thr Gly Ala Tyr
Asp Thr Ile 705 710 715
720 Pro Gln Asp Arg Leu Thr Glu Val Ile Ala Ser Ile Ile Lys Pro Gln
725 730 735 Asn Thr Tyr Cys
Val Arg Arg Tyr Ala Val Val Gln Lys Ala Ala His 740
745 750 Gly His Val Arg Lys Ala Phe Lys Ser
His Val Ser Thr Leu Thr Asp 755 760
765 Leu Gln Pro Tyr Met Arg Gln Phe Val Ala His Leu Gln Glu
Thr Ser 770 775 780
Pro Leu Arg Asp Ala Val Val Ile Glu Gln Ser Ser Ser Leu Asn Glu 785
790 795 800 Ala Ser Ser Gly Leu
Phe Asp Val Phe Leu Arg Phe Met Cys His His 805
810 815 Ala Val Arg Ile Arg Gly Lys Ser Tyr Val
Gln Cys Gln Gly Ile Pro 820 825
830 Gln Gly Ser Ile Leu Ser Thr Leu Leu Cys Ser Leu Cys Tyr Gly
Asp 835 840 845 Met
Glu Asn Lys Leu Phe Ala Gly Ile Arg Arg Asp Gly Leu Leu Leu 850
855 860 Arg Leu Val Asp Asp Phe
Leu Leu Val Thr Pro His Leu Thr His Ala 865 870
875 880 Lys Thr Phe Leu Arg Thr Leu Val Arg Gly Val
Pro Glu Tyr Gly Cys 885 890
895 Val Val Asn Leu Arg Lys Thr Val Val Asn Phe Pro Val Glu Asp Glu
900 905 910 Ala Leu
Gly Gly Thr Ala Phe Val Gln Met Pro Ala His Gly Leu Phe 915
920 925 Pro Trp Cys Gly Leu Leu Leu
Asp Thr Arg Thr Leu Glu Val Gln Ser 930 935
940 Asp Tyr Ser Ser Tyr Ala Arg Thr Ser Ile Arg Ala
Ser Leu Thr Phe 945 950 955
960 Asn Arg Gly Phe Lys Ala Gly Arg Asn Met Arg Arg Lys Leu Phe Gly
965 970 975 Val Leu Arg
Leu Lys Cys His Ser Leu Phe Leu Asp Leu Gln Val Asn 980
985 990 Ser Leu Gln Thr Val Cys Thr Asn
Ile Tyr Lys Ile Leu Leu Leu Gln 995 1000
1005 Ala Tyr Arg Phe His Ala Cys Val Leu Gln Leu
Pro Phe His Gln 1010 1015 1020
Gln Val Trp Lys Asn Pro Thr Phe Phe Leu Arg Val Ile Ser Asp
1025 1030 1035 Thr Ala Ser
Leu Cys Tyr Ser Ile Leu Lys Ala Lys Asn Ala Gly 1040
1045 1050 Met Ser Leu Gly Ala Lys Gly Ala
Ala Gly Pro Leu Pro Ser Glu 1055 1060
1065 Ala Val Gln Trp Leu Cys His Gln Ala Phe Leu Leu Lys
Leu Thr 1070 1075 1080
Arg His Arg Val Thr Tyr Val Pro Leu Leu Gly Ser Leu Arg Thr 1085
1090 1095 Ala Gln Thr Gln Leu
Ser Arg Lys Leu Pro Gly Thr Thr Leu Thr 1100 1105
1110 Ala Leu Glu Ala Ala Ala Asn Pro Ala Leu
Pro Ser Asp Phe Lys 1115 1120 1125
Thr Ile Leu Asp 1130
19 1132 PRT Homo sapiens
19 Met
Pro Arg Ala Pro Arg Cys Arg Ala Val Arg Ser Leu Leu Arg Ser 1
5 10 15 His Tyr Arg Glu Val Leu
Pro Leu Ala Thr Phe Val Arg Arg Leu Gly 20
25 30 Pro Gln Gly Trp Arg Leu Val Gln Arg Gly
Asp Pro Ala Ala Phe Arg 35 40
45 Ala Leu Val Ala Gln Cys Leu Val Cys Val Pro Trp Asp Ala
Arg Pro 50 55 60
Pro Pro Ala Ala Pro Ser Phe Arg Gln Val Ser Cys Leu Lys Glu Leu 65
70 75 80 Val Ala Arg Val Leu
Gln Arg Leu Cys Glu Arg Gly Ala Lys Asn Val 85
90 95 Leu Ala Phe Gly Phe Ala Leu Leu Asp Gly
Ala Arg Gly Gly Pro Pro 100 105
110 Glu Ala Phe Thr Thr Ser Val Arg Ser Tyr Leu Pro Asn Thr Val
Thr 115 120 125 Asp
Ala Leu Arg Gly Ser Gly Ala Trp Gly Leu Leu Leu Arg Arg Val 130
135 140 Gly Asp Asp Val Leu Val
His Leu Leu Ala Arg Cys Ala Leu Phe Val 145 150
155 160 Leu Val Ala Pro Ser Cys Ala Tyr Gln Val Cys
Gly Pro Pro Leu Tyr 165 170
175 Gln Leu Gly Ala Ala Thr Gln Ala Arg Pro Pro Pro His Ala Ser Gly
180 185 190 Pro Arg
Arg Arg Leu Gly Cys Glu Arg Ala Trp Asn His Ser Val Arg 195
200 205 Glu Ala Gly Val Pro Leu Gly
Leu Pro Ala Pro Gly Ala Arg Arg Arg 210 215
220 Gly Gly Ser Ala Ser Arg Ser Leu Pro Leu Pro Lys
Arg Pro Arg Arg 225 230 235
240 Gly Ala Ala Pro Glu Pro Glu Arg Thr Pro Val Gly Gln Gly Ser Trp
245 250 255 Ala His Pro
Gly Arg Thr Arg Gly Pro Ser Asp Arg Gly Phe Cys Val 260
265 270 Val Ser Pro Ala Arg Pro Ala Glu
Glu Ala Thr Ser Leu Glu Gly Ala 275 280
285 Leu Ser Gly Thr Arg His Ser His Pro Ser Val Gly Arg
Gln His His 290 295 300
Ala Gly Pro Pro Ser Thr Ser Arg Pro Pro Arg Pro Trp Asp Thr Pro 305
310 315 320 Cys Pro Pro Val
Tyr Ala Glu Thr Lys His Phe Leu Tyr Ser Ser Gly 325
330 335 Asp Lys Glu Gln Leu Arg Pro Ser Phe
Leu Leu Ser Ser Leu Arg Pro 340 345
350 Ser Leu Thr Gly Ala Arg Arg Leu Val Glu Thr Ile Phe Leu
Gly Ser 355 360 365
Arg Pro Trp Met Pro Gly Thr Pro Arg Arg Leu Pro Arg Leu Pro Gln 370
375 380 Arg Tyr Trp Gln Met
Arg Pro Leu Phe Leu Glu Leu Leu Gly Asn His 385 390
395 400 Ala Gln Cys Pro Tyr Gly Val Leu Leu Lys
Thr His Cys Pro Leu Arg 405 410
415 Ala Ala Val Thr Pro Ala Ala Gly Val Cys Ala Arg Glu Lys Pro
Gln 420 425 430 Gly
Ser Val Ala Ala Pro Glu Glu Glu Asp Thr Asp Pro Arg Arg Leu 435
440 445 Val Gln Leu Leu Arg Gln
His Ser Ser Pro Trp Gln Val Tyr Gly Phe 450 455
460 Val Arg Ala Cys Leu Arg Arg Leu Val Pro Pro
Gly Leu Trp Gly Ser 465 470 475
480 Arg His Asn Glu Arg Arg Phe Leu Arg Asn Thr Lys Lys Phe Ile Ser
485 490 495 Leu Gly
Lys His Ala Lys Leu Ser Leu Gln Glu Leu Thr Trp Lys Met 500
505 510 Ser Val Arg Asp Cys Ala Trp
Leu Arg Arg Ser Pro Gly Val Gly Cys 515 520
525 Val Pro Ala Ala Glu His Arg Leu Arg Glu Glu Ile
Leu Ala Lys Phe 530 535 540
Leu His Trp Leu Met Ser Val Tyr Val Val Glu Leu Leu Arg Ser Phe 545
550 555 560 Phe Tyr Val
Thr Glu Thr Thr Phe Gln Lys Asn Arg Leu Phe Phe Tyr 565
570 575 Arg Lys Ser Val Trp Ser Lys Leu
Gln Ser Ile Gly Ile Arg Gln His 580 585
590 Leu Lys Arg Val Gln Leu Arg Glu Leu Ser Glu Ala Glu
Val Arg Gln 595 600 605
His Arg Glu Ala Arg Pro Ala Leu Leu Thr Ser Arg Leu Arg Phe Ile 610
615 620 Pro Lys Pro Asp
Gly Leu Arg Pro Ile Val Asn Met Asp Tyr Val Val 625 630
635 640 Gly Ala Arg Thr Phe Arg Arg Glu Lys
Arg Ala Glu Arg Leu Thr Ser 645 650
655 Arg Val Lys Ala Leu Phe Ser Val Leu Asn Tyr Glu Arg Ala
Arg Arg 660 665 670
Pro Gly Leu Leu Gly Ala Ser Val Leu Gly Leu Asp Asp Ile His Arg
675 680 685 Ala Trp Arg Thr
Phe Val Leu Arg Val Arg Ala Gln Asp Pro Pro Pro 690
695 700 Glu Leu Tyr Phe Val Lys Val Asp
Val Thr Gly Ala Tyr Asp Thr Ile 705 710
715 720 Pro Gln Asp Arg Leu Thr Glu Val Ile Ala Ser Ile
Ile Lys Pro Gln 725 730
735 Asn Thr Tyr Cys Val Arg Arg Tyr Ala Val Val Gln Lys Ala Ala His
740 745 750 Gly His Val
Arg Lys Ala Phe Lys Ser His Val Ser Thr Leu Thr Asp 755
760 765 Leu Gln Pro Tyr Met Arg Gln Phe
Val Ala His Leu Gln Glu Thr Ser 770 775
780 Pro Leu Arg Asp Ala Val Val Ile Glu Gln Ser Ser Ser
Leu Asn Glu 785 790 795
800 Ala Ser Ser Gly Leu Phe Asp Val Phe Leu Arg Phe Met Cys His His
805 810 815 Ala Val Arg Ile
Arg Gly Lys Ser Tyr Val Gln Cys Gln Gly Ile Pro 820
825 830 Gln Gly Ser Ile Leu Ser Thr Leu Leu
Cys Ser Leu Cys Tyr Gly Asp 835 840
845 Met Glu Asn Lys Leu Phe Ala Gly Ile Arg Arg Asp Gly Leu
Leu Leu 850 855 860
Arg Leu Val Asp Asp Phe Leu Leu Val Thr Pro His Leu Thr His Ala 865
870 875 880 Lys Thr Phe Leu Arg
Thr Leu Val Arg Gly Val Pro Glu Tyr Gly Cys 885
890 895 Val Val Asn Leu Arg Lys Thr Val Val Asn
Phe Pro Val Glu Asp Glu 900 905
910 Ala Leu Gly Gly Thr Ala Phe Val Gln Met Pro Ala His Gly Leu
Phe 915 920 925 Pro
Trp Cys Gly Leu Leu Leu Asp Thr Arg Thr Leu Glu Val Gln Ser 930
935 940 Asp Tyr Ser Ser Tyr Ala
Arg Thr Ser Ile Arg Ala Ser Leu Thr Phe 945 950
955 960 Asn Arg Gly Phe Lys Ala Gly Arg Asn Met Arg
Arg Lys Leu Phe Gly 965 970
975 Val Leu Arg Leu Lys Cys His Ser Leu Phe Leu Asp Leu Gln Val Asn
980 985 990 Ser Leu
Gln Thr Val Cys Thr Asn Ile Tyr Lys Ile Leu Leu Leu Gln 995
1000 1005 Ala Tyr Arg Phe His
Ala Cys Val Leu Gln Leu Pro Phe His Gln 1010 1015
1020 Gln Val Trp Lys Asn Pro Thr Phe Phe Leu
Arg Val Ile Ser Asp 1025 1030 1035
Thr Ala Ser Leu Cys Tyr Ser Ile Leu Lys Ala Lys Asn Ala Gly
1040 1045 1050 Met Ser
Leu Gly Ala Lys Gly Ala Ala Gly Pro Leu Pro Ser Glu 1055
1060 1065 Ala Val Gln Trp Leu Cys His
Gln Ala Phe Leu Leu Lys Leu Thr 1070 1075
1080 Arg His Arg Val Thr Tyr Val Pro Leu Leu Gly Ser
Leu Arg Thr 1085 1090 1095
Ala Gln Thr Gln Leu Ser Arg Lys Leu Pro Gly Thr Thr Leu Thr 1100
1105 1110 Ala Leu Glu Ala Ala
Ala Asn Pro Ala Leu Pro Ser Asp Phe Lys 1115 1120
1125 Thr Ile Leu Asp 1130
20 2190 DNA Homo sapiens
20 atgcagctgt ttgagcagcc ctgtcctggg gaggaccccc
ggccaggagg ccagatcggt 60gaggtggagc tgtcctccta cacgccccca gccggggtcc
caggaaagcc tgcagccccc 120cacttccttc cagtgctgtg ctctgtgtca ccatcaggct
ccagggtccc gcacgacctc 180ctcgggggct ccgggggctt cacgctggag gacgccctct
tcgggctcct ctttggagct 240gatgccaccc tcctgcagtc acctgtggtc ctctgtggtc
tccctgatgg ccagctctgc 300tgtgtgatcc tgaaggccct ggtcacctcc aggtcagccc
ctggtgaccc aaatgccctt 360gtcaagatcc tccatcacct ggaggagccc gtcatcttca
taggggcctt gaagacagag 420ccacaggctg cagaagctgc agagaatttt ctgcctgacg
aggatgtgca ctgtgactgc 480ctggtggcct ttggtcacca cggccggatg ctggccatca
aggccagctg ggatgagtcc 540gggaagctgg tgcccgagct gcgggagtac tgcctcccag
gccctgtgct ctgcgctgcc 600tgtggcgggg gtggccgcgt gtaccacagc accccttctg
acctctgtgt ggtggatctg 660tctcggggaa gcaccccgct gggccctgag cagcccgaag
aaggcccggg aggcctgccc 720cccatgctgt gcccagccag cctgaacatc tgcagtgtcg
tctcgctgtc cgcgtctccc 780aggacgcatg aaggtggcac caagctcctg gccctgtccg
ccaaaggccg cctgatgacc 840tgcagcctgg acctggactc tgagatgcct ggcccagcca
ggatgaccac agagagtgca 900ggtcagaaaa taaaggagct gctgtctgga attggcaaca
tctctgagag agtgtctttt 960ctaaagaagg cggttgacca gcggaacaag gcactgacaa
gcctcaacga ggccatgaac 1020gtgagctgtg cactgctgtc aagcggcacg ggccccagac
ccatctcctg caccaccagc 1080accacctgga gccgcctgca gacacaggat gtgctcatgg
ccacctgcgt gctagagaac 1140agcagcagct tcagcctgga ccaggggtgg accctgtgca
tccaggtgct caccagctcc 1200tgtgctctcg acctggactc ggcctgctcc gccatcacct
acaccatccc cgtggaccag 1260ctcggccccg gtgctcggcg ggaggtgacg ctacccctgg
gccctggtga gaacggcggg 1320ctcgacctgc ccgtgaccgt gtcctgcacg ctgttctaca
gtctcaggga ggtggtgggc 1380ggggcccttg cccccttaga ctctgaggac ccctttctgg
atgagtgccc ctccgacgtc 1440ctgcccgagc aagagggtgt ttgcctgccc ctgagcaggc
acacagtgga catgctgcag 1500tgtctgcgct tccctggcct ggccccgcca cacacacggg
ccccctcccc actcggcccc 1560acccgagacc ctgtggccac ttttctggaa acttgtcggg
agcctggcag ccagccagca 1620ggacccgcct ccctgcgggc cgagtacctg cccccatctg
tggcttccat caaggtgtcg 1680gcggagctgc tcagagctgc cttgaaggac ggccactcag
gcgtgcccct gtgctgtgcc 1740accctgcagt ggctccttgc tgagaatgct gctgtggacg
tcgtgagggc ccgagcacta 1800tcttccatcc agggagtggc ccctgatggc gccaacgttc
acctcatcgt ccgagaggtg 1860gccatgaccg acctgtgccc agcagggccc atccaggccg
tggagattca agtggaaagc 1920tcctctctgg ccgacatttg cagggcgcac catgccgttg
tcgggcgcat gcagacgatg 1980gtgacagagc aggccgccca gggctccagc gctcctgatc
tccgtgtgca gtacctccgc 2040cagatccacg ccaaccacga gacactgctg cgggaggtgc
agaccctgcg cgaccggctc 2100tgcacggagg atgaggccag ctcctgtgcc accgcccaga
ggctgctaca ggtgtaccgg 2160cagctgcgcc accccagcct catcctgctg
2190
21 2190 DNA Homo sapiens
21 atgcagctgt ttgagcagcc
ctgtcctggg gaggaccccc ggccaggagg ccagatcggt 60gaggtggagc tgtcctccta
cacgccccca gccggggtcc caggaaagcc tgcagccccc 120cacttccttc cagtgctgtg
ctctgtgtca ccatcaggct ccagggtccc gcacgacctc 180ctcgggggct ccgggggctt
cacgctggag gacgccctct tcgggctcct ctttggagct 240gatgccaccc tcctgcagtc
acctgtggtc ctctgtggtc tccctgatgg ccagctctgc 300tgtgtgatcc tgaaggccct
ggtcacctcc aggtcagccc ctggtgaccc aaatgccctt 360gtcaagatcc tccatcacct
ggaggagccc gtcatcttca taggggcctt gaagacagag 420ccacaggctg cagaagctgc
agagaatttt ctgcctgacg aggatgtgca ctgtgactgc 480ctggtggcct ttggtcacca
cggccggatg ctggccatca aggccagctg ggatgagtcc 540gggaagctgg tgcccgagct
gcgggagtac tgcctcccag gccctgtgct ctgcgctgcc 600tgtggcgggg gtggccgcgt
gtaccacagc accccttctg acctctgtgt ggtggatctg 660tctcggggaa gcaccccgct
gggccctgag cagcccgaag aaggcccggg aggcctgccc 720cccatgctgt gcccagccag
cctgaacatc tgcagtgtcg tctcgctgtc cgcgtctccc 780aggacgcatg aaggtggcac
caagctcctg gccctgtccg ccaaaggccg cctgatgacc 840tgcagcctgg acctggactc
tgagatgcct ggcccagcca ggatgaccac agagagtgca 900ggtcagaaaa taaaggagct
gctgtctgga attggcaaca tctctgagag agtgtctttt 960ctaaagaagg cggttgacca
gcggaacaag gcactgacaa gcctcaacga ggccatgaac 1020gtgagctgtg cactgctgtc
aagcggcacg ggccccagac ccatctcctg caccaccagc 1080accacctgga gccgcctgca
gacacaggat gtgctcatgg ccacctgcgt gctagagaac 1140agcagcagct tcagcctgga
ccaggggtgg accctgtgca tccaggtgct caccagctcc 1200tgtgctctcg acctggactc
ggcctgctcc gccatcacct acaccatccc cgtggaccag 1260ctcggccccg gtgctcggcg
ggaggtgacg ctacccctgg gccctggtga gaacggcggg 1320ctcgacctgc ccgtgaccgt
gtcctgcacg ctgttctaca gtctcaggga ggtggtgggc 1380ggggcccttg ccccctcaga
ctctgaggac ccctttctgg atgagtgccc ctccgacgtc 1440ctgcccgagc aagagggtgt
ttgcctgccc ctgagcaggc acacagtgga catgctgcag 1500tgtctgcgct tccctggcct
ggccccgcca cacacacggg ccccctcccc actcggcccc 1560acccgagacc ctgtggccac
ttttctggaa acttgtcggg agcctggcag ccagccagca 1620ggacccgcct ccctgcgggc
cgagtacctg cccccatctg tggcttccat caaggtgtcg 1680gcggagctgc tcagagctgc
cttgaaggac ggccactcag gcgtgcccct gtgctgtgcc 1740accctgcagt ggctccttgc
tgagaatgct gctgtggacg tcgtgagggc ccgagcacta 1800tcttccatcc agggagtggc
ccctgatggc gccaacgttc acctcatcgt ccgagaggtg 1860gccatgaccg acctgtgccc
agcagggccc atccaggccg tggagattca agtggaaagc 1920tcctctctgg ccgacatttg
cagggcgcac catgccgttg tcgggcgcat gcagacgatg 1980gtgacagagc aggccgccca
gggctccagc gctcctgatc tccgtgtgca gtacctccgc 2040cagatccacg ccaaccacga
gacactgctg cgggaggtgc agaccctgcg cgaccggctc 2100tgcacggagg atgaggccag
ctcctgtgcc accgcccaga ggctgctaca ggtgtaccgg 2160cagctgcgcc accccagcct
catcctgctg 2190
22 3163 DNA Homo sapiens
22 ggacagtgtg ctggtcaccc tggtgcaggg ccctgcccga tggaagatgc agctgtttga
60gcagccctgt cctggggagg acccccggcc aggaggccag atcggtgagg tggagctgtc
120ctcctacacg cccccagccg gggtcccagg aaagcctgca gccccccact tccttccagt
180gctgtgctct gtgtcaccat caggctccag ggtcccgcac gacctcctcg ggggctccgg
240gggcttcacg ctggaggacg ccctcttcgg gctcctcttt ggagctgatg ccaccctcct
300gcagtcacct gtggtcctct gtggtctccc tgatggccag ctctgctgtg tgatcctgaa
360ggccctggtc acctccaggt cagcccctgg tgacccaaat gcccttgtca agatcctcca
420tcacctggag gagcccgtca tcttcatagg ggccttgaag acagagccac aggctgcaga
480agctgcagag aattttctgc ctgacgagga tgtgcactgt gactgcctgg tggcctttgg
540tcaccacggc cggatgctgg ccatcaaggc cagctgggat gagtccggga agctggtgcc
600cgagctgcgg gagtactgcc tcccaggccc tgtgctctgc gctgcctgtg gcgggggtgg
660ccgcgtgtac cacagcaccc cttctgacct ctgtgtggtg gatctgtctc ggggaagcac
720cccgctgggc cctgagcagc ccgaagaagg cccgggaggc ctgcccccca tgctgtgccc
780agccagcctg aacatctgca gtgtcgtctc gctgtccgcg tctcccagga cgcatgaagg
840tggcaccaag ctcctggccc tgtccgccaa aggccgcctg atgacctgca gcctggacct
900ggactctgag atgcctggcc cagccaggat gaccacagag agtgcaggtc agaaaataaa
960ggagctgctg tctggaattg gcaacatctc tgagagagtg tcttttctaa agaaggcggt
1020tgaccagcgg aacaaggcac tgacaagcct caacgaggcc atgaacgtga gctgtgcact
1080gctgtcaagc ggcacgggcc ccagacccat ctcctgcacc accagcacca cctggagccg
1140cctgcagaca caggatgtgc tcatggccac ctgcgtgcta gagaacagca gcagcttcag
1200cctggaccag gggtggaccc tgtgcatcca ggtgctcacc agctcctgtg ctctcgacct
1260ggactcggcc tgctccgcca tcacctacac catccccgtg gaccagctcg gccccggtgc
1320tcggcgggag gtgacgctac ccctgggccc tggtgagaac ggcgggctcg acctgcccgt
1380gaccgtgtcc tgcacgctgt tctacagtct cagggaggtg gtgggcgggg cccttgcccc
1440ctcagactct gaggacccct ttctggatga gtgcccctcc gacgtcctgc ccgagcaaga
1500gggtgtttgc ctgcccctga gcaggcacac agtggacatg ctgcagtgtc tgcgcttccc
1560tggcctggcc ccgccacaca cacgggcccc ctccccactc ggccccaccc gagaccctgt
1620ggccactttt ctggaaactt gtcgggagcc tggcagccag ccagcaggac ccgcctccct
1680gcgggccgag tacctgcccc catctgtggc ttccatcaag gtgtcggcgg agctgctcag
1740agctgccttg aaggacggcc actcaggcgt gcccctgtgc tgtgccaccc tgcagtggct
1800ccttgctgag aatgctgctg tggacgtcgt gagggcccga gcactatctt ccatccaggg
1860agtggcccct gatggcgcca acgttcacct catcgtccga gaggtggcca tgaccgacct
1920gtgcccagca gggcccatcc aggccgtgga gattcaagtg gaaagctcct ctctggccga
1980catttgcagg gcgcaccatg ccgttgtcgg gcgcatgcag acgatggtga cagagcaggc
2040cgcccagggc tccagcgctc ctgatctccg tgtgcagtac ctccgccaga tccacgccaa
2100ccacgagaca ctgctgcggg aggtgcagac cctgcgcgac cggctctgca cggaggatga
2160ggccagctcc tgtgccaccg cccagaggct gctacaggtg taccggcagc tgcgccaccc
2220cagcctcatc ctgctgtgac caggcgggcc tgcccctggg ctctggccac gcttccagcc
2280tctgtcacag cccccccagg cctcatgggt tagagggaaa ccgagctggc ctggccagag
2340ccgtcaggga aggtaggacc tggccacgta ggagcagaac gctcatgaaa gtgcttggag
2400gccgtggagc acaaagcaga ttctgattgg gagcaaccga ggcgggctct gaacctggcc
2460ggtccagctt cgcgtcctct gctggtgtct ctccttctct gaccgcggcc gcagcccctg
2520cactcgcctt cctcactgct gggcagcctt cccaccaccg cagcagcccc tgaggccagg
2580aggcagtgca gggcattctg gacccggagg gccagagaaa caggatttct ggggtttgga
2640cttggggtga gtttgtaact gttgctgcca caccaccagg agcaccggct gcccctctgg
2700gtggcactac caggtgcccc acggtaccct tgtcacactg ttcacacctg cccggctgcc
2760cactctggga ccccgaggta ggagggtgct ccctgagacc aaagcacaaa acagcatgca
2820gggagctcct gcaagtgccc gtggtctcgt gccacaccaa ggaagggcca gcgggtggcc
2880tgtggccgga atgctcaaca actaggtgcc tccggccggg gcagtaccca gcactgtgca
2940ctattttcag ggccactcag ggtggcgctg tggcccgggg gggggccctg agccccagcc
3000cccagcctcc tccctcagcc tgggctacgg cccacctcct ggtgctggtg ttttcatctg
3060gggagggtgc tcgcgccgct cccgctgcag gcactgtccg cgatgagtgc gggtaggagc
3120cgtgaggtgc ttctctgctg tgacaaacga ccctgtctgt ccg
3163
23 730 PRT Homo sapiens
23 Met Gln Leu Phe Glu Gln Pro Cys Pro Gly Glu
Asp Pro Arg Pro Gly 1 5 10
15 Gly Gln Ile Gly Glu Val Glu Leu Ser Ser Tyr Thr Pro Pro Ala Gly
20 25 30 Val Pro
Gly Lys Pro Ala Ala Pro His Phe Leu Pro Val Leu Cys Ser 35
40 45 Val Ser Pro Ser Gly Ser Arg
Val Pro His Asp Leu Leu Gly Gly Ser 50 55
60 Gly Gly Phe Thr Leu Glu Asp Ala Leu Phe Gly Leu
Leu Phe Gly Ala 65 70 75
80 Asp Ala Thr Leu Leu Gln Ser Pro Val Val Leu Cys Gly Leu Pro Asp
85 90 95 Gly Gln Leu
Cys Cys Val Ile Leu Lys Ala Leu Val Thr Ser Arg Ser 100
105 110 Ala Pro Gly Asp Pro Asn Ala Leu
Val Lys Ile Leu His His Leu Glu 115 120
125 Glu Pro Val Ile Phe Ile Gly Ala Leu Lys Thr Glu Pro
Gln Ala Ala 130 135 140
Glu Ala Ala Glu Asn Phe Leu Pro Asp Glu Asp Val His Cys Asp Cys 145
150 155 160 Leu Val Ala Phe
Gly His His Gly Arg Met Leu Ala Ile Lys Ala Ser 165
170 175 Trp Asp Glu Ser Gly Lys Leu Val Pro
Glu Leu Arg Glu Tyr Cys Leu 180 185
190 Pro Gly Pro Val Leu Cys Ala Ala Cys Gly Gly Gly Gly Arg
Val Tyr 195 200 205
His Ser Thr Pro Ser Asp Leu Cys Val Val Asp Leu Ser Arg Gly Ser 210
215 220 Thr Pro Leu Gly Pro
Glu Gln Pro Glu Glu Gly Pro Gly Gly Leu Pro 225 230
235 240 Pro Met Leu Cys Pro Ala Ser Leu Asn Ile
Cys Ser Val Val Ser Leu 245 250
255 Ser Ala Ser Pro Arg Thr His Glu Gly Gly Thr Lys Leu Leu Ala
Leu 260 265 270 Ser
Ala Lys Gly Arg Leu Met Thr Cys Ser Leu Asp Leu Asp Ser Glu 275
280 285 Met Pro Gly Pro Ala Arg
Met Thr Thr Glu Ser Ala Gly Gln Lys Ile 290 295
300 Lys Glu Leu Leu Ser Gly Ile Gly Asn Ile Ser
Glu Arg Val Ser Phe 305 310 315
320 Leu Lys Lys Ala Val Asp Gln Arg Asn Lys Ala Leu Thr Ser Leu Asn
325 330 335 Glu Ala
Met Asn Val Ser Cys Ala Leu Leu Ser Ser Gly Thr Gly Pro 340
345 350 Arg Pro Ile Ser Cys Thr Thr
Ser Thr Thr Trp Ser Arg Leu Gln Thr 355 360
365 Gln Asp Val Leu Met Ala Thr Cys Val Leu Glu Asn
Ser Ser Ser Phe 370 375 380
Ser Leu Asp Gln Gly Trp Thr Leu Cys Ile Gln Val Leu Thr Ser Ser 385
390 395 400 Cys Ala Leu
Asp Leu Asp Ser Ala Cys Ser Ala Ile Thr Tyr Thr Ile 405
410 415 Pro Val Asp Gln Leu Gly Pro Gly
Ala Arg Arg Glu Val Thr Leu Pro 420 425
430 Leu Gly Pro Gly Glu Asn Gly Gly Leu Asp Leu Pro Val
Thr Val Ser 435 440 445
Cys Thr Leu Phe Tyr Ser Leu Arg Glu Val Val Gly Gly Ala Leu Ala 450
455 460 Pro Leu Asp Ser
Glu Asp Pro Phe Leu Asp Glu Cys Pro Ser Asp Val 465 470
475 480 Leu Pro Glu Gln Glu Gly Val Cys Leu
Pro Leu Ser Arg His Thr Val 485 490
495 Asp Met Leu Gln Cys Leu Arg Phe Pro Gly Leu Ala Pro Pro
His Thr 500 505 510
Arg Ala Pro Ser Pro Leu Gly Pro Thr Arg Asp Pro Val Ala Thr Phe
515 520 525 Leu Glu Thr Cys
Arg Glu Pro Gly Ser Gln Pro Ala Gly Pro Ala Ser 530
535 540 Leu Arg Ala Glu Tyr Leu Pro Pro
Ser Val Ala Ser Ile Lys Val Ser 545 550
555 560 Ala Glu Leu Leu Arg Ala Ala Leu Lys Asp Gly His
Ser Gly Val Pro 565 570
575 Leu Cys Cys Ala Thr Leu Gln Trp Leu Leu Ala Glu Asn Ala Ala Val
580 585 590 Asp Val Val
Arg Ala Arg Ala Leu Ser Ser Ile Gln Gly Val Ala Pro 595
600 605 Asp Gly Ala Asn Val His Leu Ile
Val Arg Glu Val Ala Met Thr Asp 610 615
620 Leu Cys Pro Ala Gly Pro Ile Gln Ala Val Glu Ile Gln
Val Glu Ser 625 630 635
640 Ser Ser Leu Ala Asp Ile Cys Arg Ala His His Ala Val Val Gly Arg
645 650 655 Met Gln Thr Met
Val Thr Glu Gln Ala Ala Gln Gly Ser Ser Ala Pro 660
665 670 Asp Leu Arg Val Gln Tyr Leu Arg Gln
Ile His Ala Asn His Glu Thr 675 680
685 Leu Leu Arg Glu Val Gln Thr Leu Arg Asp Arg Leu Cys Thr
Glu Asp 690 695 700
Glu Ala Ser Ser Cys Ala Thr Ala Gln Arg Leu Leu Gln Val Tyr Arg 705
710 715 720 Gln Leu Arg His Pro
Ser Leu Ile Leu Leu 725 730
24 730 PRT Homo sapiens
24 Met Gln Leu Phe Glu Gln Pro Cys Pro Gly Glu Asp Pro Arg Pro Gly
1 5 10 15 Gly Gln
Ile Gly Glu Val Glu Leu Ser Ser Tyr Thr Pro Pro Ala Gly 20
25 30 Val Pro Gly Lys Pro Ala Ala
Pro His Phe Leu Pro Val Leu Cys Ser 35 40
45 Val Ser Pro Ser Gly Ser Arg Val Pro His Asp Leu
Leu Gly Gly Ser 50 55 60
Gly Gly Phe Thr Leu Glu Asp Ala Leu Phe Gly Leu Leu Phe Gly Ala 65
70 75 80 Asp Ala Thr
Leu Leu Gln Ser Pro Val Val Leu Cys Gly Leu Pro Asp 85
90 95 Gly Gln Leu Cys Cys Val Ile Leu
Lys Ala Leu Val Thr Ser Arg Ser 100 105
110 Ala Pro Gly Asp Pro Asn Ala Leu Val Lys Ile Leu His
His Leu Glu 115 120 125
Glu Pro Val Ile Phe Ile Gly Ala Leu Lys Thr Glu Pro Gln Ala Ala 130
135 140 Glu Ala Ala Glu
Asn Phe Leu Pro Asp Glu Asp Val His Cys Asp Cys 145 150
155 160 Leu Val Ala Phe Gly His His Gly Arg
Met Leu Ala Ile Lys Ala Ser 165 170
175 Trp Asp Glu Ser Gly Lys Leu Val Pro Glu Leu Arg Glu Tyr
Cys Leu 180 185 190
Pro Gly Pro Val Leu Cys Ala Ala Cys Gly Gly Gly Gly Arg Val Tyr
195 200 205 His Ser Thr Pro
Ser Asp Leu Cys Val Val Asp Leu Ser Arg Gly Ser 210
215 220 Thr Pro Leu Gly Pro Glu Gln Pro
Glu Glu Gly Pro Gly Gly Leu Pro 225 230
235 240 Pro Met Leu Cys Pro Ala Ser Leu Asn Ile Cys Ser
Val Val Ser Leu 245 250
255 Ser Ala Ser Pro Arg Thr His Glu Gly Gly Thr Lys Leu Leu Ala Leu
260 265 270 Ser Ala Lys
Gly Arg Leu Met Thr Cys Ser Leu Asp Leu Asp Ser Glu 275
280 285 Met Pro Gly Pro Ala Arg Met Thr
Thr Glu Ser Ala Gly Gln Lys Ile 290 295
300 Lys Glu Leu Leu Ser Gly Ile Gly Asn Ile Ser Glu Arg
Val Ser Phe 305 310 315
320 Leu Lys Lys Ala Val Asp Gln Arg Asn Lys Ala Leu Thr Ser Leu Asn
325 330 335 Glu Ala Met Asn
Val Ser Cys Ala Leu Leu Ser Ser Gly Thr Gly Pro 340
345 350 Arg Pro Ile Ser Cys Thr Thr Ser Thr
Thr Trp Ser Arg Leu Gln Thr 355 360
365 Gln Asp Val Leu Met Ala Thr Cys Val Leu Glu Asn Ser Ser
Ser Phe 370 375 380
Ser Leu Asp Gln Gly Trp Thr Leu Cys Ile Gln Val Leu Thr Ser Ser 385
390 395 400 Cys Ala Leu Asp Leu
Asp Ser Ala Cys Ser Ala Ile Thr Tyr Thr Ile 405
410 415 Pro Val Asp Gln Leu Gly Pro Gly Ala Arg
Arg Glu Val Thr Leu Pro 420 425
430 Leu Gly Pro Gly Glu Asn Gly Gly Leu Asp Leu Pro Val Thr Val
Ser 435 440 445 Cys
Thr Leu Phe Tyr Ser Leu Arg Glu Val Val Gly Gly Ala Leu Ala 450
455 460 Pro Ser Asp Ser Glu Asp
Pro Phe Leu Asp Glu Cys Pro Ser Asp Val 465 470
475 480 Leu Pro Glu Gln Glu Gly Val Cys Leu Pro Leu
Ser Arg His Thr Val 485 490
495 Asp Met Leu Gln Cys Leu Arg Phe Pro Gly Leu Ala Pro Pro His Thr
500 505 510 Arg Ala
Pro Ser Pro Leu Gly Pro Thr Arg Asp Pro Val Ala Thr Phe 515
520 525 Leu Glu Thr Cys Arg Glu Pro
Gly Ser Gln Pro Ala Gly Pro Ala Ser 530 535
540 Leu Arg Ala Glu Tyr Leu Pro Pro Ser Val Ala Ser
Ile Lys Val Ser 545 550 555
560 Ala Glu Leu Leu Arg Ala Ala Leu Lys Asp Gly His Ser Gly Val Pro
565 570 575 Leu Cys Cys
Ala Thr Leu Gln Trp Leu Leu Ala Glu Asn Ala Ala Val 580
585 590 Asp Val Val Arg Ala Arg Ala Leu
Ser Ser Ile Gln Gly Val Ala Pro 595 600
605 Asp Gly Ala Asn Val His Leu Ile Val Arg Glu Val Ala
Met Thr Asp 610 615 620
Leu Cys Pro Ala Gly Pro Ile Gln Ala Val Glu Ile Gln Val Glu Ser 625
630 635 640 Ser Ser Leu Ala
Asp Ile Cys Arg Ala His His Ala Val Val Gly Arg 645
650 655 Met Gln Thr Met Val Thr Glu Gln Ala
Ala Gln Gly Ser Ser Ala Pro 660 665
670 Asp Leu Arg Val Gln Tyr Leu Arg Gln Ile His Ala Asn His
Glu Thr 675 680 685
Leu Leu Arg Glu Val Gln Thr Leu Arg Asp Arg Leu Cys Thr Glu Asp 690
695 700 Glu Ala Ser Ser Cys
Ala Thr Ala Gln Arg Leu Leu Gln Val Tyr Arg 705 710
715 720 Gln Leu Arg His Pro Ser Leu Ile Leu Leu
725 730