Patent application title: NUCLEOTIDE SEQUENCES MEDIATING MALE FERTILITY AND METHOD OF USING SAME
Inventors:
Marc Albertsen (Grimes, IA, US)
Timothy Fox (Des Moines, IA, US)
Gary A. Huffman (Des Moines, IA, US)
Mary Trimnell (West Des Moines, IA, US)
IPC8 Class: AC12N1582FI
USPC Class: 800286
Class name: Method of introducing a polynucleotide molecule into or rearrangement of genetic material within a plant or plant part the polynucleotide encodes an inhibitory rna molecule the rna is antisense
Publication date: 2014-11-13
Patent application number: 20140338071
Abstract:
Nucleotide sequences mediating male fertility in plants are described, with DNA molecule and amino acid sequences set forth. Promoter sequences and their essential regions are also identified. The nucleotide sequences are useful in mediating male fertility in plants.
Claims:
1. A male-fertile plant which is homozygous recessive for ms26, wherein said plant comprises an expression cassette comprising an isolated polynucleotide operably linked to a heterologous promoter, wherein the polynucleotide is selected from the group consisting of: a. A polynucleotide of SEQ ID NO: 17; b. A polynucleotide at least 95% identical to the full length of SEQ ID NO: 17 and which encodes a functional Ms26 protein; and c. A polynucleotide encoding a functional MS26 protein having a polypeptide sequence at least 86% identical to the full length of SEQ ID NO: 18.
2. The plant of claim 1, wherein said plant is maize, rice, or sorghum.
3. The plant of claim 1, wherein the polynucleotide encodes a functional MS26 protein having a polypeptide sequence at least 95% identical to the full length of SEQ ID NO: 18.
4. The plant of claim 1, wherein the heterologous promoter drives male tissue preferred expression of the operably-linked polynucleotide.
5. A method of restoring male fertility of a male-sterile plant, the method comprising a. introducing into a male-sterile plant homozygous recessive for ms26 an expression cassette comprising an isolated polynucleotide operably linked to a heterologous promoter, wherein the polynucleotide is selected from the group consisting of: i. A polynucleotide of SEQ ID NO: 17; ii. A polynucleotide at least 95% identical to the full length of SEQ ID NO: 17, wherein said polynucleotide encodes a functional Ms26 protein; and iii. A polynucleotide encoding a functional MS26 protein, wherein said protein comprises a polypeptide sequence at least 86% identical to the full length of SEQ ID NO: 18; and b. growing the plant under conditions which result in expression of the operably-linked polynucleotide and restoration of male fertility.
6. The method of claim 5, wherein the method restores male fertility to a maize, rice, or sorghum plant.
7. The method of claim 5, wherein the polynucleotide encodes a functional MS26 protein having a polypeptide sequence at least 95% identical to the full length of SEQ ID NO: 18.
8. The method of claim 5, wherein the heterologous promoter drives male tissue preferred expression of the operably-linked polynucleotide.
9. A method of controlling male fertility of a plant, comprising silencing expression of a gene native to the plant, wherein the gene comprises a polynucleotide selected from the group consisting of: i. A polynucleotide of SEQ ID NO: 17; ii. A polynucleotide at least 95% identical to the full length of SEQ ID NO: 17, wherein said polynucleotide encodes a functional Ms26 protein; and iii. A polynucleotide encoding a functional MS26 protein, wherein said protein comprises a polypeptide sequence at least 86% identical to the full length of SEQ ID NO: 18; wherein the plant is male-sterile as a result of the gene silencing.
10. The method of claim 9 wherein the gene silencing is accomplished by transposon targeting or antisense suppression.
11. The method of claim 9 wherein the plant is maize, rice, or sorghum.
12. The method of claim 9 wherein the polynucleotide encodes a functional MS26 protein having a polypeptide sequence at least 95% identical to the full length of SEQ ID NO: 18.
13. The method of claim 9, further comprising the steps of: a. introducing into the male-sterile plant an expression cassette comprising an isolated polynucleotide operably linked to a heterologous promoter, wherein the operably-linked polynucleotide is selected from the group consisting of: i. A polynucleotide of SEQ ID NO: 17; ii. A polynucleotide at least 95% identical to the full length of SEQ ID NO: 17, wherein said polynucleotide encodes a functional Ms26 protein; and iii. A polynucleotide encoding a functional MS26 protein, wherein said protein comprises a polypeptide sequence at least 86% identical to the full length of SEQ ID NO: 18; and b. growing the plant under conditions which result in expression of the operably-linked polynucleotide and restoration of male fertility of the plant.
14. The method of claim 13 wherein the plant is maize, rice, or sorghum.
15. The method of claim 13 wherein the operably-linked polynucleotide encodes a functional MS26 protein having a polypeptide sequence at least 95% identical to the full length of SEQ ID NO: 18.
Description:
CROSS-REFERENCE
[0001] This application is a continuation of previously filed and co-pending U.S. patent application Ser. No. 12/400,168 filed Mar. 9, 2009 which is a divisional of U.S. patent application Ser. No. 11/166,609 filed Jun. 24, 2005, now U.S. Pat. No. 7,517,975 issued on Apr. 14, 2009 which is a continuation-in-part of U.S. patent application Ser. No. 10/412,000 filed Apr. 11, 2003, now U.S. Pat. No. 7,151,205 issued on Dec. 19, 2006, which is a continuation of previously filed U.S. patent application Ser. No. 09/670,153, filed Sep. 26, 2000, now abandoned, all of which are incorporated by reference in their entirety.
BACKGROUND OF THE INVENTION
[0002] Development of hybrid plant breeding has made possible considerable advances in quality and quantity of crops produced. Increased yield and combination of desirable characteristics, such as resistance to disease and insects, heat and drought tolerance, along with variations in plant composition are all possible because of hybridization procedures. These procedures frequently rely heavily on providing for a male parent contributing pollen to a female parent to produce the resulting hybrid.
[0003] Field crops are bred through techniques that take advantage of the plant's method of pollination. A plant is self-pollinating if pollen from one flower is transferred to the same or another flower of the same plant. A plant is cross-pollinated if the pollen comes from a flower on a different plant.
[0004] In Brassica, the plant is normally self-sterile and can only be cross-pollinated. In self-pollinating species, such as soybeans and cotton, the male and female organs are anatomically juxtaposed. During natural pollination, the male reproductive organs of a given flower pollinate the female reproductive organs of the same flower.
[0005] Maize plants (Zea mays L.) present a unique situation in that they can be bred by both self-pollination and cross-pollination techniques. Maize has male flowers, located on the tassel, and female flowers, located on the ear, on the same plant. It can self-pollinate or cross-pollinate. Natural pollination occurs in maize when wind blows pollen from the tassels to the silks that protrude from the tops of the incipient ears.
[0006] A reliable method of controlling fertility in plants would offer the opportunity for improved plant breeding. This is especially true for development of maize hybrids, which relies upon some sort of male sterility system and where a female sterility system would reduce production costs.
[0007] The development of maize hybrids requires the development of homozygous inbred lines, the crossing of these lines, and the evaluation of the crosses. Pedigree breeding and recurrent selection are two of the breeding methods used to develop inbred lines from populations. Breeding programs combine desirable traits from two or more inbred lines or various broad-based sources into breeding pools from which new inbred lines are developed by selfing and selection of desired phenotypes. A hybrid maize variety is the cross of two such inbred lines, each of which may have one or more desirable characteristics lacked by the other or which complement the other. The new inbreds are crossed with other inbred lines and the hybrids from these crosses are evaluated to determine which have commercial potential. The hybrid progeny of the first generation is designated F1. In the development of hybrids only the F1 hybrid plants are sought. The F1 hybrid is more vigorous than its inbred parents. This hybrid vigor, or heterosis, can be manifested in many ways, including increased vegetative growth and increased yield.
[0008] Hybrid maize seed can be produced by a male sterility system incorporating manual detasseling. To produce hybrid seed, the male tassel is removed from the growing female inbred parent, which can be planted in various alternating row patterns with the male inbred parent. Consequently, provided that there is sufficient isolation from sources of foreign maize pollen, the ears of the female inbred will be fertilized only with pollen from the male inbred. The resulting seed is therefore hybrid (F1) and will form hybrid plants.
[0009] Environmental variation in plant development can result in plants tasseling after manual detasseling of the female parent is completed. Or, a detasseler might not completely remove the tassel of a female inbred plant. In any event, the result is that the female plant will successfully shed pollen and some female plants will be self-pollinated. This will result in seed of the female inbred being harvested along with the hybrid seed which is normally produced. Female inbred seed is not as productive as F1 seed. In addition, the presence of female inbred seed can represent a germplasm security risk for the company producing the hybrid.
[0010] Alternatively, the female inbred can be detasseled by machine. Mechanical detasseling is approximately as reliable as hand detasseling, but is faster and less costly. However, most detasseling machines produce more damage to the plants than hand detasseling. Thus, no form of detasseling is presently entirely satisfactory, and a need continues to exist for alternatives that further reduce production costs and eliminate self-pollination of the female parent in the production of hybrid seed.
[0011] A reliable system of genetic male sterility would provide advantages. The laborious detasseling process can be avoided in some genotypes by using cytoplasmic male-sterile (CMS) inbreds. In the absence of a fertility restorer gene, plants of a CMS inbred are male sterile as a result of factors resulting from the cytoplasmic, as opposed to the nuclear, genome. Thus, this characteristic is inherited exclusively through the female parent in maize plants, since only the female provides cytoplasm to the fertilized seed. CMS plants are fertilized with pollen from another inbred that is not male-sterile. Pollen from the second inbred may or may not contribute genes that make the hybrid plants male-fertile. Usually seed from detasseled normal maize and CMS-produced seed of the same hybrid must be blended to ensure that adequate pollen loads are available for fertilization when the hybrid plants are grown and to ensure cytoplasmic diversity.
[0012] There can be other drawbacks to CMS. One is a historically observed association of a specific variant of CMS with susceptibility to certain crop diseases. This problem has discouraged widespread use of that CMS variant in producing hybrid maize and has had a negative impact on the use of CMS in maize in general.
[0013] One type of genetic sterility is disclosed in U.S. Pat. Nos. 4,654,465 and 4,727,219 to Brar, et al. However, this form of genetic male sterility requires maintenance of multiple mutant genes at separate locations within the genome and requires a complex marker system to track the genes and make use of the system convenient. Patterson also described a genic system of chromosomal translocations that can be effective but is complicated. (See, U.S. Pat. Nos. 3,861,709 and 3,710,511).
[0014] Many other attempts have been made to overcome these drawbacks. For example, Fabijanski, et al., developed several methods of causing male sterility in plants (see, EPO 89/3010153.8 Publication Number 329,308 and PCT Application Number PCT/CA90/00037, published as WO 1990/08828). One method includes delivering into the plant a gene encoding a cytotoxic substance associated with a male tissue specific promoter. Another involves an antisense system in which a gene critical to fertility is identified and an antisense to the gene is inserted in the plant. Mariani, et al., also show several cytotoxic antisense systems. See, EP 89/401,194. Still other systems use "repressor" genes which inhibit the expression of another gene critical to male sterility. See, PCT/GB90/00102, published as WO 1990/08829.
[0015] A still further improvement of this system is one described at U.S. Pat. No. 5,478,369 (incorporated herein by reference) in which a method of imparting controllable male sterility is achieved by silencing a gene native to the plant that is critical for male fertility and replacing the native DNA with the gene critical to male fertility linked to an inducible promoter controlling expression of the gene. The plant is thus constitutively sterile, becoming fertile only when the promoter is induced and its attached male fertility gene is expressed.
[0016] As noted, an essential aspect of much of the work underway with male sterility systems is the identification of genes impacting male fertility.
[0017] Such a gene can be used in a variety of systems to control male fertility, including those described herein. Previously, a male fertility gene was identified in Arabidopsis thaliana and used to produce a male sterile plant. Aarts, et al., (1993) Nature 363:715-717. U.S. Pat. No. 5,478,369 discloses one such gene impacting male fertility. In the present invention the inventors provide novel DNA molecules, and the amino acid sequences they encode, that are critical to male fertility in plants. These can be used in any of the systems where control of fertility is useful, including those described above.
[0018] Thus, one object of the invention is to provide a nucleic acid sequence, the expression of which is critical to male fertility in plants.
[0019] Another object of the invention is to provide a DNA molecule encoding an amino acid sequence, the expression of which is critical to male fertility in plants.
[0020] Yet another object of the invention is to provide a promoter of such a nucleotide sequence and its essential sequences.
[0021] A further object of the invention is to provide a method of using such DNA molecules to mediate male fertility in plants.
[0022] Further objects of the invention will become apparent in the description and claims that follow.
SUMMARY OF THE INVENTION
[0023] This invention relates to nucleic acid sequences, and, specifically, DNA molecules and the amino acid sequences encoded by the DNA molecules, which are critical to male fertility. A promoter of the DNA is identified, as well as its essential sequences. It also relates to use of such DNA molecules to mediate fertility in plants.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] FIG. 1 is a locus map of the male fertility gene Ms26.
[0025] FIG. 2A is a Southern blot of the ms26-m2::Mu8 family hybridized with a Mu8 probe.
[0026] FIG. 2B is a Southern blot of the ms26-m2::Mu8 family hybridized with a PstI fragment isolated from the ms26 clone.
[0027] FIG. 3 is a Northern Blot analysis gel hybridized with a PstI fragment isolated from the Ms26 gene.
[0028] FIG. 4A-4D is the sequence of Ms26 (the cDNA is SEQ ID NO: 1; the protein is SEQ ID NO: 2 and 34).
[0029] FIG. 5A-5D is a comparison of the genomic Ms26 sequence (Residues 1051-3326 of SEQ ID NO: 7) with the cDNA of Ms26 (SEQ ID NO: 1).
[0030] FIG. 6A is a Northern analysis gel showing expression in various plant tissues, and FIG. 6B is a gel showing expression at stages of microsporogenesis.
[0031] FIG. 7 is the full-length promoter of Ms26 (SEQ ID NO: 5).
[0032] FIG. 8 is a bar graph showing luciferase activity after deletions of select regions of the Ms26 promoter.
[0033] FIG. 9 shows essential regions of the Ms26 promoter (SEQ ID NO: 6).
[0034] FIG. 10 is a bar graph showing luciferase activity after substitution by restriction site linker scanning of select small (9-10 bp) regions of the Ms26 essential promoter fragment.
[0035] FIGS. 11A and 11B show a comparison of the nucleotide sequence (SEQ ID NO: 3) from the Ms26 orthologue from a sorghum panicle with Ms26 maize cDNA (Residues 201-750 of SEQ ID NO: 1), and of the sorghum protein sequence (SEQ ID NO: 4) with Ms26 maize protein (Residues 87-244 of SEQ ID NO: 2).
[0036] FIG. 12 is a representation of the mapping of the male sterility gene ms26.
[0037] FIG. 13 shows a sequence comparison of the region of excision of the ms26-ref allele (SEQ ID NO: 8) with wild-type Ms26 (SEQ ID NO: 9).
[0038] FIG. 14A shows a translated protein sequence alignment between regions of the CYP704B1, a P450 gene (SEQ ID NO: 12) and Ms26 (SEQ ID NO: 13); FIG. 14B shows the phylogenetic tree analysis of select P450 genes.
[0039] FIG. 15 demonstrates the heme binding domain frame shift, showing the translated sequence alignment of regions of the Ms26 cDNA (SEQ ID NO: 14 and 28-29), the genomic regions of exon 5 in fertile plants (SEQ ID NO: 15 and 30-31) and sterile plants (SEQ ID NO: 16 and 32-33).
[0040] FIG. 16 shows alignment of the Ms26 promoter of corn (Residues 650-1089 of SEQ ID NO: 5), sorghum (SEQ ID NO: 19) and rice (SEQ ID NO: 20).
[0041] FIG. 17 shows alignment of the maize Ms26 protein (SEQ ID NO: 2); rice Ms26 protein (SEQ ID NO: 18) and sorghum Ms26 protein (SEQ ID NO: 22) along with a consensus sequence (SEQ ID NO: 35).
DISCLOSURE OF THE INVENTION
[0042] All references referred to are incorporated herein by reference.
[0043] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Unless mentioned otherwise, the techniques employed or contemplated herein are standard methodologies well known to one of ordinary skill in the art. The materials, methods and examples are illustrative only and not limiting.
[0044] Genetic male sterility results from a mutation, suppression or other impact to one of the genes critical to a specific step in microsporogenesis, the term applied to the entire process of pollen formation. These genes can be collectively referred to as male fertility genes (or, alternatively, male sterility genes). There are many steps in the overall pathway where gene function impacts fertility. This appears to be supported by the frequency of genetic male sterility in maize. New alleles of male sterility mutants are uncovered in materials that range from elite inbreds to unadapted populations. To date, published genetic male sterility research has been mostly descriptive. Some efforts have been made to establish the mechanism of sterility in maize, but few have been satisfactory. This should not be surprising given the number of genes that have been identified as being responsible for male sterility. One mechanism is unlikely to apply to all mutations.
[0045] U.S. Pat. No. 5,478,369 describes a method by which a male sterility gene on maize chromosome 9 was tagged and cloned. A male sterility gene on chromosome 9, ms2, had been described previously but has never been cloned and sequenced; it is not allelic to the gene referred to in the '369 patent. See, Albertsen and Phillips, (1981) Canadian Journal of Genetics & Cytology 23:195-208. The only fertility gene cloned before that had been the Arabidopsis gene described at Aarts, et al., supra.
[0046] Thus the invention includes using the sequences shown herein to impact male fertility in a plant, that is, to control male fertility by manipulation of the genome using the genes of the invention. By way of example, without limitation, any of the methods described supra can be used with the sequence of the invention, such as introducing a mutant sequence into a plant to cause sterility, causing mutation to the native sequence, introducing an antisense of the sequence into the plant, linking it with other sequences to control its expression, or any one of a myriad of processes available to one skilled in the art to impact male fertility in a plant.
[0047] The Ms26 gene described herein is located on maize chromosome 1 and its dominant allele is critical to male fertility. The locus map is represented at FIG. 1. It can be used in the systems described above, and other systems impacting male fertility.
[0048] The maize family cosegregating for sterility was named ms*-SBMu200 and was found to have an approximately 5.5 kb EcoRI fragment that hybridized with a Mu8 probe (FIG. 2A). A genomic clone containing a Mu8 transposon was isolated from the family. A probe made from DNA bordering the transposon was found to hybridize to the same approximately 5.5 kb EcoRI fragment (FIG. 2B). This probe was used to isolate cDNA clones from a tassel cDNA library. The cDNA is 1906 bp, and the Mu insertion occurred in exon 1 of the gene. This probe was also used to map the mutation in an RFLP mapping population. The mutant mapped to the short arm of chromosome 1, near Ms26. Allelism crosses between ms26-ref and ms*-SBMu200 showed that these were allelic, indicating that the mutations occurred in the same gene. The ms*-SBMu200 allele was renamed ms26-m2::Mu8. Two additional alleles for the Ms26 gene were cloned: one containing a Mutator element in the second exon, named ms26-m3::Mu*, and one, from the ms26-ref allele, containing an unknown transposon in the fifth exon. SEQ ID NO: 7 (discussed further below) represents the genomic nucleotide sequence. Expression patterns, as determined by Northern analysis, show tassel specificity with peak expression at about the quartet to quartet release stages of microsporogenesis.
[0049] Further, it will be evident to one skilled in the art that variations, mutations, and derivations, including fragments smaller than the entire sequence set forth, may be used that retain the male sterility controlling properties of the gene. One of ordinary skill in the art can readily assess a variant or fragment by introducing it into plants homozygous for a stable male sterile allele of Ms26 and then observing the plant's male tissue development.
[0050] The sequences of the invention may be isolated from any plant, including, but not limited to, corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), millet (Panicum spp.), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), oats (Avena sativa), barley (Hordeum vulgare), vegetables, ornamentals, and conifers. Preferably, plants include corn, soybean, sunflower, safflower, canola, wheat, barley, rye, alfalfa, rice, cotton and sorghum.
[0051] Sequences from other plants may be isolated according to well-known techniques based on their sequence homology to the homologous coding region of the coding sequences set forth herein. In these techniques, all or part of the known coding sequence is used as a probe which selectively hybridizes to other sequences present in a population of cloned genomic DNA fragments (i.e., genomic libraries) from a chosen organism. Methods are readily available in the art for the hybridization of nucleic acid sequences. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, Part I, Chapter 2 "Overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York (1993); and Current Protocols in Molecular Biology, Chapter 2, Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
[0052] Thus the invention also includes those nucleotide sequences which selectively hybridize to the Ms26 nucleotide sequences under stringent conditions. In referring to a sequence that "selectively hybridizes" with Ms26, the term includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to the specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid.
[0053] The terms "stringent conditions" or "stringent hybridization conditions" include reference to conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are target-sequence-dependent and will differ depending on the structure of the polynucleotide. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to a probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, probes of this type are about 250 to about 1000 nucleotides in length.
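The "detectably greater degree" criterion used in paragraphs [0052] and [0053] reduces to a simple ratio test. The Python sketch below is illustrative only: it assumes a single scalar hybridization signal (for example, a hypothetical densitometry reading in arbitrary units) for the target and for non-target nucleic acid, and uses the 2-fold-over-background example as the default threshold. The function name and the signal values are hypothetical.

```python
def selectively_hybridizes(target_signal: float, background_signal: float,
                           min_fold: float = 2.0) -> bool:
    """Return True if the hybridization signal for the target exceeds the
    non-target (background) signal by at least `min_fold` (2-fold by default)."""
    if background_signal <= 0:
        raise ValueError("background signal must be positive")
    return target_signal / background_signal >= min_fold


# Hypothetical readings in arbitrary units.
print(selectively_hybridizes(850.0, 300.0))  # about 2.8-fold
print(selectively_hybridizes(450.0, 300.0))  # 1.5-fold
```

The threshold is a parameter because the text gives 2-fold only as an example ("e.g."), not as a fixed requirement.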
[0054] In addition to Tijssen (1993), supra, and Ausubel, et al., (1995), supra, guidance on the hybridization of nucleic acids is found in Sambrook, et al., (1989) Molecular Cloning: A Laboratory Manual (2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.).
[0055] In general, sequences that correspond to the nucleotide sequences of the present invention and hybridize to the nucleotide sequence disclosed herein will be at least 50% homologous, 70% homologous, or even 85% homologous or more with the disclosed sequence. That is, probe and target may share at least about 50%, about 70%, or even about 85% sequence similarity.
[0056] Specificity is typically a function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. Generally, stringent wash temperature conditions are selected to be about 5° C. to about 2° C. lower than the melting point (Tm) for the specific sequence at a defined ionic strength and pH. The melting, or denaturation, of DNA occurs over a narrow temperature range and represents the disruption of the double helix into its complementary single strands. The process is described by the temperature at the midpoint of the transition, Tm, which is also called the melting temperature. Formulas are available in the art for the determination of melting temperatures.
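As one example of the formulas referred to above, the Meinkoth and Wahl approximation for DNA-DNA hybrids estimates Tm from the molarity of monovalent cations, the GC content, the formamide concentration, and the hybrid length. The Python sketch below applies that approximation and then takes the wash window of about 5° C. to about 2° C. below Tm described in paragraph [0056]. The choice of formula, the function names, and the example probe (500 bp, 50% GC, washed in 0.1×SSC taken as about 0.0195 M Na+, no formamide) are assumptions for illustration; the approximation is intended for longer hybrids and yields only an estimate.

```python
import math
from typing import Tuple


def tm_dna_dna(na_molar: float, gc_percent: float, formamide_percent: float,
               length_bp: int) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid (Meinkoth-Wahl form):
        Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    where M is the molarity of monovalent cations and L is the hybrid
    length in base pairs."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("cation concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


def stringent_wash_window(tm: float) -> Tuple[float, float]:
    """Wash temperatures about 5 to about 2 deg C below Tm."""
    return tm - 5.0, tm - 2.0


# Hypothetical probe: 500 bp, 50% GC, 0.1x SSC (~0.0195 M Na+), no formamide.
tm = tm_dna_dna(na_molar=0.0195, gc_percent=50.0, formamide_percent=0.0,
                length_bp=500)
low, high = stringent_wash_window(tm)
print(f"Tm ~ {tm:.1f} C; stringent wash ~ {low:.1f} to {high:.1f} C")
```

Lowering the cation concentration or raising the formamide concentration lowers the estimated Tm, which is why stringency is usually increased by diluting SSC in the final washes.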
[0057] Preferred hybridization conditions for the nucleotide sequence of the invention include hybridization at 42° C. in 50% (w/v) formamide, 6×SSC, 0.5% (w/v) SDS, 100 µg/ml salmon sperm DNA. Exemplary low stringency washing conditions include a wash at 42° C. in a solution of 2×SSC, 0.5% (w/v) SDS for 30 minutes and repeating. Exemplary moderate stringency conditions include a wash in 2×SSC, 0.5% (w/v) SDS at 50° C. for 30 minutes and repeating. Exemplary high stringency conditions include a wash in 0.1×SSC, 0.1% (w/v) SDS, at 65° C. for 30 minutes to one hour and repeating. Sequences that correspond to the promoter of the present invention may be obtained using all the above conditions. For purposes of defining the invention, the high stringency conditions are used.
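Tm formulas are generally expressed in terms of monovalent cation molarity, whereas the exemplary wash conditions in paragraph [0057] are given in SSC units. The Python sketch below records those three washes as data and converts each SSC strength to an approximate Na+ molarity, assuming the standard SSC composition of 0.15 M NaCl plus 0.015 M trisodium citrate per 1× (so each 1× contributes about 0.195 M Na+). The class and field names are hypothetical.

```python
from dataclasses import dataclass

# Assumed standard composition: 1x SSC = 0.15 M NaCl + 0.015 M trisodium
# citrate, so each 1x contributes 0.15 + 3 * 0.015 = 0.195 M Na+.
NA_PER_X_SSC = 0.15 + 3 * 0.015


@dataclass(frozen=True)
class Wash:
    name: str
    ssc_x: float        # SSC strength, e.g. 2.0 for 2x SSC
    sds_percent: float  # % (w/v) SDS
    temp_c: float       # wash temperature in deg C
    minutes: int        # duration of each wash

    @property
    def na_molar(self) -> float:
        """Approximate Na+ molarity contributed by the SSC."""
        return self.ssc_x * NA_PER_X_SSC


WASHES = [
    Wash("low", 2.0, 0.5, 42.0, 30),
    Wash("moderate", 2.0, 0.5, 50.0, 30),
    Wash("high", 0.1, 0.1, 65.0, 30),  # text allows 30 minutes to one hour
]

for w in WASHES:
    print(f"{w.name:>8}: {w.ssc_x}x SSC (~{w.na_molar:.4f} M Na+), "
          f"{w.sds_percent}% SDS, {w.temp_c} C, {w.minutes} min, repeated")
```

Keeping the washes as data makes the stringency ordering visible: the high-stringency wash has both the highest temperature and roughly twenty-fold less Na+ than the low- and moderate-stringency washes.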
[0058] The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) "reference sequence", (b) "comparison window", (c) "sequence identity", and (d) "percentage of sequence identity."
[0059] (a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
[0060] (b) As used herein, "comparison window" makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50 or 100 nucleotides in length, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence a gap penalty is typically introduced and is subtracted from the number of matches.
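Percent identity within a comparison window can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (with "-" marking a gap) and keeps gap columns in the window length, so gaps reduce the score rather than being ignored. The function name and the toy 10-column alignment are hypothetical; a real comparison would use a window of at least 20 nucleotides, as described above.

```python
from typing import Optional


def percent_identity(aligned_ref: str, aligned_query: str,
                     start: int = 0, end: Optional[int] = None) -> float:
    """Percent identity over the comparison window [start, end) of a
    pairwise alignment. Both strings are rows of the same alignment and
    '-' marks a gap. A column counts as a match only when both rows carry
    the same residue; gap columns stay in the denominator."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned rows must have equal length")
    end = len(aligned_ref) if end is None else end
    window = range(start, end)
    if len(window) == 0:
        raise ValueError("empty comparison window")
    matches = sum(
        1 for i in window
        if aligned_ref[i] != "-" and aligned_ref[i] == aligned_query[i]
    )
    return 100.0 * matches / len(window)


# Toy alignment: one mismatch and one gap in a 10-column window.
print(percent_identity("ACGTACGTAC", "ACGTTCG-AC"))
```

Here 8 of 10 columns match, giving 80%; restricting the window to the first four columns gives 100%, which shows why the window must be specified alongside any identity value.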
[0061] Methods of aligning sequences for comparison are well-known in the art. Thus, the determination of percent sequence identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller, (1988) CABIOS 4:11-17; the local alignment algorithm of Smith, et al., (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch, (1970) J. Mol. Biol. 48:443-453; the search-for-local-alignment method of Pearson and Lipman, (1988) Proc. Natl. Acad. Sci. 85:2444-2448; and the algorithm of Karlin and Altschul, (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul, (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877.
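To illustrate the local alignment approach cited above, the Python sketch below implements the basic Smith-Waterman recurrence with a linear gap penalty and a traceback from the best-scoring cell. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not the parameters of any program named in this document.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). Cell scores are floored at
    zero, so an alignment may start and end anywhere in either sequence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero-score cell is reached.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))
```

The zero floor is what distinguishes this local method from the global Needleman-Wunsch method: poorly matching flanks are simply left out of the alignment instead of being penalized.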
[0062] Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA and TFASTA in the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Road, San Diego, Calif., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins, et al., (1988) Gene 73:237-244; Higgins, et al., (1989) CABIOS 5:151-153; Corpet, et al., (1988) Nucleic Acids Res. 16:10881-90; Huang, et al., (1992) CABIOS 8:155-65; and Pearson, et al., (1994) Meth. Mol. Biol. 24:307-331. The ALIGN program is based on the algorithm of Myers and Miller, (1988) supra. A PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4 can be used with the ALIGN program when comparing amino acid sequences. The BLAST programs of Altschul, et al., (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul, (1990) supra. BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the invention. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul, et al., (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See, Altschul, et al., (1997) supra. 
When utilizing BLAST, Gapped BLAST and PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. See, http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.
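To make the role of the wordlength parameter concrete, the following Python sketch shows only the exact-match word-seeding step that BLAST-style searches begin from. It is a simplified illustration under our own assumptions, not BLAST itself: real BLAST also uses neighborhood words for proteins, ungapped and gapped extension, and statistical scoring. The function name `exact_word_hits` is hypothetical.

```python
def exact_word_hits(query: str, subject: str, word_length: int = 12) -> list[tuple[int, int]]:
    """Return (query_offset, subject_offset) pairs at which a word of
    `word_length` residues occurs identically in both sequences.
    Exact-match seeding only; no extension or scoring is performed."""
    # Index every word of the subject by its start offset.
    index: dict[str, list[int]] = {}
    for s in range(len(subject) - word_length + 1):
        index.setdefault(subject[s:s + word_length], []).append(s)
    # Look up each word of the query in the subject index.
    hits = []
    for q in range(len(query) - word_length + 1):
        for s in index.get(query[q:q + word_length], []):
            hits.append((q, s))
    return hits
```

For example, `exact_word_hits("ACGTACGTACGTAA", "TTACGTACGTACGTAATT")` returns `[(0, 2), (1, 3), (2, 4)]`: three overlapping 12-nt seeds on the same diagonal, which a full search would then try to extend into a longer alignment. A longer word length gives fewer, more specific seeds; a shorter one is more sensitive but slower.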
[0063] Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 using the following parameters: % identity and % similarity for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3 and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using GAP Weight of 8 and Length Weight of 2 and the BLOSUM62 scoring matrix; or any equivalent program thereof. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.
[0064] GAP uses the algorithm of Needleman and Wunsch, (1970) J. Mol. Biol. 48:443-453, to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. Each gap that GAP inserts must be offset by a gain in matches equal to the gap creation penalty. If a gap extension penalty greater than zero is chosen, each inserted gap must, in addition, be offset by a gain equal to the length of the gap times the gap extension penalty. Default gap creation penalty values and gap extension penalty values in Version 10 of the GCG Wisconsin Genetics Software Package for protein sequences are 8 and 2, respectively. For nucleotide sequences the default gap creation penalty is 50, while the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 200. Thus, for example, the gap creation and gap extension penalties can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 or greater.
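The gap-penalty rule described above can be illustrated with a minimal Python sketch of Needleman-Wunsch global alignment in which a gap of length k costs creation + extension × k (a Gotoh-style three-matrix recursion). It is not GCG's GAP implementation and returns only the optimal score. The match and mismatch values (10 and 0) are assumptions chosen so that penalties read in units of matched bases, and, unlike some programs, this sketch also penalizes terminal gaps.

```python
NEG_INF = float("-inf")

def global_alignment_score(a: str, b: str, match: int = 10, mismatch: int = 0,
                           gap_open: int = 50, gap_extend: int = 3):
    """Optimal global alignment score where a gap of length k costs
    gap_open + gap_extend * k (assumed scoring; terminal gaps are penalized)."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a new gap costs open + extend; lengthening an open gap costs extend.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With the nucleotide defaults of 50 and 3, aligning `ACGTACGT` with `ACGACGT` yields 7 matches (70) minus one 1-base gap (53), a score of 17. In the terms of the paragraph above, the gap "pays for itself" only because it recovers enough matches to cover its creation and extension cost.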
[0065] GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The scoring matrix used in Version 10 of the GCG Wisconsin Genetics Software Package is BLOSUM62 (see, Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915).
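As a rough illustration of two of these figures of merit, the Python sketch below computes Ratio and Percent Identity from an already gapped pairwise alignment (with `-` marking gaps). Two points are assumptions rather than documented GAP behavior: the Quality value is simply supplied by the caller, and the identity denominator is taken to be the aligned columns left after ignoring symbols across from gaps. Percent Similarity is omitted because it requires a full scoring matrix. The function name is hypothetical.

```python
def gap_style_figures(quality: float, aligned_a: str, aligned_b: str) -> dict[str, float]:
    """Ratio and percent identity for a gapped pairwise alignment ('-' = gap).
    Columns where either symbol faces a gap are ignored for identity
    (assumed denominator: the remaining aligned columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not paired:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(1 for x, y in paired if x == y)
    # "Shorter segment": the shorter of the two sequences with gaps removed.
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return {
        "quality": quality,
        "ratio": quality / shorter,
        "percent_identity": 100.0 * identical / len(paired),
    }
```

For `ACGT-ACGT` against `ACGTTACCT` with an illustrative quality of 17, the gap column is skipped, 7 of the remaining 8 columns match (87.5% identity), and the ratio is 17 / 8 = 2.125.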
[0066] (c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).
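The partial-credit idea described above (identical = 1, non-conservative = 0, conservative somewhere in between) can be sketched in a few lines of Python. The amino acid groupings and the default credit of 0.5 are hypothetical choices for illustration only. They are not the table used by PC/GENE or any other program, and the sketch assumes two ungapped sequences of equal length.

```python
# Illustrative groups of chemically similar residues (hypothetical, not PC/GENE's table).
CONSERVATIVE_GROUPS = [set("ILVM"), set("DE"), set("KRH"), set("ST"), set("FYW"), set("NQ")]

def pair_score(x: str, y: str, conservative_credit: float = 0.5) -> float:
    """1 for identity, partial credit for a conservative substitution, 0 otherwise."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return conservative_credit
    return 0.0

def adjusted_percent_identity(seq_a: str, seq_b: str, conservative_credit: float = 0.5) -> float:
    """Percent identity over two ungapped, equal-length sequences, adjusted
    upward for conservative substitutions."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(pair_score(x, y, conservative_credit) for x, y in zip(seq_a, seq_b))
    return 100.0 * total / len(seq_a)
```

Comparing `ACDK` with `ACEK`, the strict identity is 75%, but because D to E is treated here as conservative the adjusted value is 87.5%. Replacing D with W instead gives no partial credit and leaves the value at 75%.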
[0067] (d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
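The calculation defined above can be written out directly in Python as a short sketch. Consistent with the definition, gap positions in the compared sequence count toward the comparison window but can never be matches. The function and parameter names are illustrative.

```python
def percent_identity_over_window(query_aligned: str, reference_aligned: str) -> float:
    """(matched positions / total positions in the comparison window) * 100.
    Gap positions ('-') are part of the window but never count as matches."""
    if len(query_aligned) != len(reference_aligned) or not query_aligned:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(1 for x, y in zip(query_aligned, reference_aligned)
                  if x == y and x != "-")
    return 100.0 * matched / len(query_aligned)
```

For a nine-position window in which the query carries one gap and 7 positions match (`ACGT-ACGT` against the reference `ACGTTACCT`), the result is 7 / 9 × 100, about 77.8%. Because the gap position stays in the denominator, this figure is lower than a value computed only over gap-free columns.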
[0068] The use of the term "polynucleotide" is not intended to limit the present invention to polynucleotides comprising DNA. Those of ordinary skill in the art will recognize that polynucleotides can comprise ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides. Such deoxyribonucleotides and ribonucleotides include both naturally occurring molecules and synthetic analogues. The polynucleotides of the invention also encompass all forms of sequences including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures, and the like.
[0069] Identity to the sequence of the present invention would mean a polynucleotide sequence having at least 65% sequence identity, more preferably at least 70% sequence identity, more preferably at least 75% sequence identity, more preferably at least 80% identity, more preferably at least 85% sequence identity, more preferably at least 90% sequence identity and most preferably at least 95% sequence identity.
[0070] Promoter regions can be readily identified by one skilled in the art. The putative start codon containing the ATG motif is identified and upstream from the start codon is the presumptive promoter. By "promoter" is intended a regulatory region of DNA usually comprising a TATA box capable of directing RNA polymerase II to initiate RNA synthesis at the appropriate transcription initiation site for a particular coding sequence. A promoter can additionally comprise other recognition sequences generally positioned upstream or 5' to the TATA box, referred to as upstream promoter elements, which influence the transcription initiation rate. It is recognized that having identified the nucleotide sequences for the promoter region disclosed herein, it is within the state of the art to isolate and identify further regulatory elements in the region upstream of the TATA box from the particular promoter region identified herein. Thus the promoter region disclosed herein is generally further defined by comprising upstream regulatory elements such as those responsible for tissue and temporal expression of the coding sequence, enhancers and the like. In the same manner, the promoter elements which enable expression in the desired tissue such as male tissue can be identified, isolated and used with other core promoters to confirm male tissue-preferred expression. By core promoter is meant the minimal sequence required to initiate transcription, such as the sequence called the TATA box which is common to promoters in genes encoding proteins. Thus the upstream promoter of Ms26 can optionally be used in conjunction with its own or core promoters from other sources. The promoter may be native or non-native to the cell in which it is found.
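The first-pass procedure described above (find the putative ATG start codon, take the sequence upstream of it as the presumptive promoter, then look for a TATA box) can be sketched in Python. Two simplifications are assumptions: the first ATG in the sequence is taken as the start codon, which in practice requires knowledge of the coding sequence, and the TATA box is approximated by the simplified consensus TATAWAW (W = A or T), reporting the match closest to the ATG. The function name is hypothetical.

```python
import re

# Simplified TATA-like consensus TATAWAW (W = A or T); an assumption for illustration.
TATA_LIKE = re.compile(r"TATA[AT]A[AT]")

def presumptive_promoter(seq: str):
    """Return (atg_index, upstream_region, tata_index_or_None) using the first ATG,
    or None if no ATG is present."""
    seq = seq.upper()
    atg = seq.find("ATG")
    if atg == -1:
        return None
    upstream = seq[:atg]                     # presumptive promoter region
    matches = list(TATA_LIKE.finditer(upstream))
    tata = matches[-1].start() if matches else None  # motif closest to the ATG
    return atg, upstream, tata
```

For the toy sequence `GGGGTATAAATCCCCCCCCCCATGGCC`, the sketch reports the ATG at index 21, the 21-nt upstream region, and a TATA-like motif starting at index 4.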
[0071] The isolated promoter sequence of the present invention can be modified to provide for a range of expression levels of the heterologous nucleotide sequence. Less than the entire promoter region can be utilized and the ability to drive anther-preferred expression retained. However, it is recognized that expression levels of mRNA can be decreased with deletions of portions of the promoter sequence. Thus, the promoter can be modified to be a weak or strong promoter. Generally, by "weak promoter" is intended a promoter that drives expression of a coding sequence at a low level. By "low level" is intended levels of about 1/10,000 transcripts to about 1/100,000 transcripts to about 1/500,000 transcripts. Conversely, a strong promoter drives expression of a coding sequence at a high level, or at about 1/10 transcripts to about 1/100 transcripts to about 1/1,000 transcripts. Generally, at least about 30 nucleotides of an isolated promoter sequence will be used to drive expression of a nucleotide sequence. It is recognized that to increase transcription levels, enhancers can be utilized in combination with the promoter regions of the invention. Enhancers are nucleotide sequences that act to increase the expression of a promoter region. Enhancers are known in the art and include the SV40 enhancer region, the 35S enhancer element, and the like.
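The transcript-abundance ranges given above for weak and strong promoters can be checked with simple arithmetic. The Python sketch below uses exact fractions to avoid rounding at the boundaries. Treating the "about" ranges as hard cut-offs (1/500,000 to 1/10,000 for weak, 1/1,000 to 1/10 for strong) is an assumption, and levels between the two ranges are reported as unclassified.

```python
from fractions import Fraction

def classify_expression(target_transcripts: int, total_transcripts: int) -> str:
    """Classify a promoter's expression level from transcript abundance, using the
    weak/strong ranges above as hard limits (an assumption; the text says 'about')."""
    level = Fraction(target_transcripts, total_transcripts)
    if Fraction(1, 1000) <= level <= Fraction(1, 10):
        return "strong"
    if Fraction(1, 500000) <= level <= Fraction(1, 10000):
        return "weak"
    return "outside the stated ranges"
```

For example, 5 target transcripts per 100,000 (1/20,000) falls in the weak range, 2 per 1,000 (1/500) falls in the strong range, and 1 per 5,000 lies between the two ranges.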
[0072] The promoter of the present invention can be isolated from the 5' region of its native coding region or 5' untranslated region (5'UTR). Likewise, the terminator can be isolated from the 3' region flanking its respective stop codon. The term "isolated" refers to material such as a nucleic acid or protein which is substantially or essentially free from components which normally accompany or interact with the material as found in its naturally occurring environment or, if the material is in its natural environment, the material has been altered by deliberate human intervention to a composition and/or placed at a locus in a cell other than the locus native to the material. Methods for isolation of promoter regions are well known in the art.
[0073] "Functional variants" of the regulatory sequences are also encompassed by the compositions of the present invention. Functional variants include, for example, the native regulatory sequences of the invention having one or more nucleotide substitutions, deletions or insertions. Functional variants of the invention may be created by site-directed mutagenesis, induced mutation or may occur as allelic variants (polymorphisms).
[0074] As used herein, a "functional fragment" is a regulatory sequence variant formed by one or more deletions from a larger regulatory element. For example, the 5' portion of a promoter up to the TATA box near the transcription start site can be deleted without abolishing promoter activity, as described by Opsahl-Sorteberg, et al., (2004) Gene 341:49-58. Such variants should retain promoter activity, particularly the ability to drive expression in male tissues. Activity can be measured by Northern blot analysis, reporter activity measurements when using transcriptional fusions, and the like. See, for example, Sambrook, et al., (1989) Molecular Cloning: A Laboratory Manual (2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.), herein incorporated by reference.
[0075] Functional fragments can be obtained by using restriction enzymes to cleave the naturally occurring regulatory element nucleotide sequences disclosed herein, by synthesizing a nucleotide sequence from the naturally occurring DNA sequence, or through the use of PCR technology. See particularly, Mullis, et al., (1987) Methods Enzymol. 155:335-350 and Erlich, ed. (1989) PCR Technology (Stockton Press, New York).
[0076] Sequences which hybridize to the regulatory sequences of the present invention are within the scope of the invention. Sequences that correspond to the promoter sequences of the present invention and hybridize to the promoter sequences disclosed herein will be at least 50% homologous, 70% homologous and even 85% homologous or more with the disclosed sequence.
[0077] Smaller fragments may yet contain the regulatory properties of the promoter so identified, and deletion analysis is one method of identifying essential regions. Deletion analysis can occur from both the 5' and 3' ends of the regulatory region. Fragments can be obtained by site-directed mutagenesis, mutagenesis using the polymerase chain reaction and the like. (See, Directed Mutagenesis: A Practical Approach IRL Press (1991)). The 3' deletions can delineate the essential region and identify the 3' end so that this region may then be operably linked to a core promoter of choice. Once the essential region is identified, transcription of an exogenous gene may be controlled by the essential region plus a core promoter. By core promoter is meant the sequence called the TATA box which is common to promoters in genes encoding proteins. Thus the upstream promoter of Ms26 can optionally be used in conjunction with its own or core promoters from other sources. The promoter may be native or non-native to the cell in which it is found.
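A deletion analysis of the kind described above starts from a nested series of progressively truncated fragments. The Python sketch below generates such a series in silico from either end of a promoter sequence written 5' to 3'. The fixed step size is a hypothetical design choice: real deletion series are set by available restriction sites or primer positions, and each fragment would still have to be cloned and assayed for activity.

```python
def nested_deletions(promoter: str, step: int, from_5prime: bool = True) -> list[str]:
    """Return progressively shorter fragments of `promoter` (written 5'->3'),
    removing `step` more nucleotides from the chosen end at each stage."""
    if step <= 0:
        raise ValueError("step must be positive")
    fragments = []
    for removed in range(step, len(promoter), step):
        if from_5prime:
            fragments.append(promoter[removed:])                   # trim the 5' end
        else:
            fragments.append(promoter[:len(promoter) - removed])   # trim the 3' end
    return fragments
```

For the toy sequence `ACGTACGTAC` with a 3-nt step, the 5' series is `TACGTAC`, `GTAC`, `C` and the 3' series is `ACGTACG`, `ACGT`, `A`. Comparing the activity of such fragments brackets the essential region from each side.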
[0078] The core promoter can be any one of known core promoters such as the Cauliflower Mosaic Virus 35S or 19S promoter (U.S. Pat. No. 5,352,605), the ubiquitin promoter (U.S. Pat. No. 5,510,474), the IN2 core promoter (U.S. Pat. No. 5,364,780) or a Figwort Mosaic Virus promoter (Gruber, et al., "Vectors for Plant Transformation" Methods in Plant Molecular Biology and Biotechnology CRC Press pp. 89-119 (1993)).
[0079] The regulatory region of Ms26 has been identified as including the 1005 bp region upstream of the putative TATA box. See, FIG. 7. Further, using the procedures outlined above, it has been determined that an essential region of the promoter lies within the 180 bp upstream of the TATA box (-180) and, specifically, that the -176 to -44 region is particularly essential.
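Coordinates such as -176 to -44 are relative to the TATA box, so extracting the region from a sequence file is a matter of index arithmetic. The Python sketch below assumes, as a hypothetical convention, that -1 is the nucleotide immediately 5' of the first base of the TATA box and that both endpoints are inclusive. The actual coordinate convention used for FIG. 7 should be checked before applying it.

```python
def region_relative_to_tata(seq: str, tata_start: int, rel_from: int, rel_to: int) -> str:
    """Inclusive region from rel_from to rel_to (negative = upstream), where -1 is the
    nucleotide immediately 5' of the TATA box starting at index `tata_start`
    (hypothetical convention)."""
    if rel_from > rel_to:
        raise ValueError("rel_from must not lie downstream of rel_to")
    start = tata_start + rel_from
    end = tata_start + rel_to + 1   # +1 because Python slices exclude the end index
    if start < 0:
        raise ValueError("region extends past the start of the sequence")
    return seq[start:end]
```

Under this convention the -176 to -44 region spans 176 - 44 + 1 = 133 nucleotides; with the TATA box starting at index 200, it corresponds to the Python slice `seq[24:157]`.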
[0080] Promoter sequences from other plants may be isolated according to well-known techniques based on their sequence homology to the promoter sequence set forth herein. In these techniques, all or part of the known promoter sequence is used as a probe which selectively hybridizes to other sequences present in a population of cloned genomic DNA fragments (i.e., genomic libraries) from a chosen organism. Methods are readily available in the art for the hybridization of nucleic acid sequences.
[0081] The entire promoter sequence or portions thereof can be used as a probe capable of specifically hybridizing to corresponding promoter sequences. To achieve specific hybridization under a variety of conditions, such probes include sequences that are unique and are preferably at least about 10 nucleotides in length, and most preferably at least about 20 nucleotides in length. Such probes can be used to amplify corresponding promoter sequences from a chosen organism by the well-known process of polymerase chain reaction (PCR). This technique can be used to isolate additional promoter sequences from a desired organism or as a diagnostic assay to determine the presence of the promoter sequence in an organism. Examples include hybridization screening of plated DNA libraries (either plaques or colonies; see e.g., Innis, et al., eds., (1990) PCR Protocols, A Guide to Methods and Applications, Academic Press).
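The requirement that a probe be unique, and preferably at least about 20 nucleotides long, can be screened computationally before any hybridization is attempted. The Python sketch below counts (possibly overlapping) occurrences of a candidate probe on both strands of a reference sequence and accepts it only if there is exactly one. It is a hypothetical exact-match check: real hybridization also tolerates mismatches, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def count_occurrences(text: str, pattern: str) -> int:
    """Count exact, possibly overlapping, occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count

def is_unique_probe(probe: str, genome: str, min_length: int = 20) -> bool:
    """True if the probe meets the length threshold and occurs exactly once
    on either strand of `genome` (exact matches only)."""
    if len(probe) < min_length:
        return False
    hits = count_occurrences(genome, probe)
    rc = reverse_complement(probe)
    if rc != probe:                      # avoid double-counting palindromic probes
        hits += count_occurrences(genome, rc)
    return hits == 1
```

A 20-nt probe that appears once in a toy sequence passes the check, the same probe fails once the sequence is duplicated, and a 10-nt probe is rejected by the length threshold.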
[0082] Further, a promoter of the present invention can be linked with nucleotide sequences other than the Ms26 gene to express other heterologous nucleotide sequences. The nucleotide sequence for the promoter of the invention, as well as fragments and variants thereof, can be provided in expression cassettes along with heterologous nucleotide sequences for expression in the plant of interest, more particularly in the male tissue of the plant. Such an expression cassette is provided with a plurality of restriction sites for insertion of the nucleotide sequence to be under the transcriptional regulation of the promoter. These expression cassettes are useful in the genetic manipulation of any plant to achieve a desired phenotypic response. Examples of other nucleotide sequences which can be used as the exogenous gene of the expression vector with the Ms26 promoter include complementary nucleotidic units such as antisense molecules (callase antisense RNA, barnase antisense RNA, chalcone synthase antisense RNA and Ms45 antisense RNA), ribozymes and external guide sequences, an aptamer or single stranded nucleotides. The exogenous nucleotide sequence can also encode auxins, rolB, cytotoxins, diphtheria toxin, DAM methylase or avidin, or may be selected from a prokaryotic regulatory system. By way of example, Mariani, et al., (1990) Nature 347:737, have shown that expression in the tapetum of either Aspergillus oryzae RNase-T1 or an RNase of Bacillus amyloliquefaciens, designated "barnase," induced destruction of the tapetal cells, resulting in male infertility. Quaas, et al., (1988) Eur. J. Biochem. 173:617, describe the chemical synthesis of the RNase-T1, while the nucleotide sequence of the barnase gene is disclosed in Hartley, (1988) J. Molec. Biol. 202:913. The rolB gene of Agrobacterium rhizogenes codes for an enzyme that interferes with auxin metabolism by catalyzing the release of free indoles from indoxyl-β-glucosides. Estruch, et al., (1991) EMBO J. 11:3125 and Spena, et al., (1992) Theor. Appl. Genet. 84:520, have shown that the anther-specific expression of the rolB gene in tobacco resulted in plants having shriveled anthers in which pollen production was severely decreased; the rolB gene is therefore an example of a gene that is useful for the control of pollen production. Slightom, et al., (1985) J. Biol. Chem. 261:108, disclose the nucleotide sequence of the rolB gene. DNA molecules encoding the diphtheria toxin gene can be obtained from the American Type Culture Collection (Rockville, Md.), ATCC Number 39359 or ATCC Number 67011; see Fabijanski, et al., EP Application Number 90902754.2 for examples and methods of use. The DAM methylase gene is used to cause sterility in the methods discussed at U.S. Pat. No. 5,689,049 and PCT/US95/15229 by Cigan and Albertsen, "Reversible Nuclear Genetic System for Male Sterility in Transgenic Plants". See also the discussion of the use of the avidin gene to cause sterility at U.S. Pat. No. 5,962,769 "Induction of Male Sterility in Plants by Expression of High Levels of Avidin" by Albertsen, et al.
[0083] The invention includes vectors with the Ms26 gene. A vector is prepared comprising Ms26, a promoter that will drive expression of the gene in the plant, and a terminator region. As noted, the promoter in the construct may be the native promoter or a substituted promoter which will provide expression in the plant. Selection of the promoter will depend upon the intended use of the gene. The promoter in the construct may be an inducible promoter, so that expression of the sense or antisense molecule in the construct can be controlled by exposure to the inducer.
[0084] Other components of the vector may be included, also depending upon intended use of the gene. Examples include selectable markers, targeting or regulatory sequences, stabilizing or leader sequences, etc. General descriptions and examples of plant expression vectors and reporter genes can be found in Gruber, et al., "Vectors for Plant Transformation" in Methods in Plant Molecular Biology and Biotechnology, Glick, et al., eds; CRC Press pp. 89-119 (1993). The selection of an appropriate expression vector will depend upon the host and the method of introducing the expression vector into the host. The expression cassette will also include, at the 3' terminus of the heterologous nucleotide sequence of interest, a transcriptional and translational termination region functional in plants. The termination region can be native with the promoter nucleotide sequence of the present invention, can be native with the DNA sequence of interest, or can be derived from another source. Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also, Guerineau, et al., (1991) Mol. Gen. Genet. 262:141-144; Proudfoot, (1991) Cell 64:671-674; Sanfacon, et al., (1991) Genes Dev. 5:141-149; Mogen, et al., (1990) Plant Cell 2:1261-1272; Munroe, et al., (1990) Gene 91:151-158; Ballas, et al., (1989) Nucleic Acids Res. 17:7891-7903; Joshi, et al., (1987) Nucleic Acid Res. 15:9627-9639.
[0085] The expression cassettes can additionally contain 5' leader sequences. Such leader sequences can act to enhance translation. Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5' noncoding region), Elroy-Stein, et al., (1989) Proc. Nat. Acad. Sci. USA 86:6126-6130; potyvirus leaders, for example, TEV leader (Tobacco Etch Virus), Allison, et al.; MDMV leader (Maize Dwarf Mosaic Virus), Virology 154:9-20 (1986); human immunoglobulin heavy-chain binding protein (BiP), Macejak, et al., (1991) Nature 353:90-94; untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4), Jobling, et al., (1987) Nature 325:622-625; Tobacco mosaic virus leader (TMV), Gallie, et al., (1989) Molecular Biology of RNA, pages 237-256; and maize chlorotic mottle virus leader (MCMV) Lommel, et al., (1991) Virology 81:382-385. See also, Della-Cioppa, et al., (1987) Plant Physiology 84:965-968. The cassette can also contain sequences that enhance translation and/or mRNA stability such as introns.
[0086] In those instances where it is desirable to have the expressed product of the heterologous nucleotide sequence directed to a particular organelle, particularly the plastid, amyloplast or to the endoplasmic reticulum or secreted at the cell's surface or extracellularly, the expression cassette can further comprise a coding sequence for a transit peptide. Such transit peptides are well known in the art and include, but are not limited to, the transit peptide for the acyl carrier protein, the small subunit of RUBISCO, plant EPSP synthase, and the like. One skilled in the art will readily appreciate the many options available in expressing a product to a particular organelle. For example, the barley alpha amylase sequence is often used to direct expression to the endoplasmic reticulum (Rogers, (1985) J. Biol. Chem. 260:3731-3738). Use of transit peptides is well known (e.g., see, U.S. Pat. Nos. 5,717,084 and 5,728,925).
[0087] In preparing the expression cassette, the various DNA fragments can be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers can be employed to join the DNA fragments or other manipulations can be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction digests, annealing and resubstitutions, such as transitions and transversions, can be involved.
[0088] As noted herein, the present invention provides vectors capable of expressing genes of interest under the control of the promoter. In general, the vectors should be functional in plant cells. At times, it may be preferable to have vectors that are functional in E. coli (e.g., production of protein for raising antibodies, DNA sequence analysis, construction of inserts, obtaining quantities of nucleic acids). Vectors and procedures for cloning and expression in E. coli are discussed in Sambrook, et al., (supra).
[0089] The transformation vector comprising the promoter sequence of the present invention operably linked to a heterologous nucleotide sequence in an expression cassette, can also contain at least one additional nucleotide sequence for a gene to be cotransformed into the organism. Alternatively, the additional sequence(s) can be provided on another transformation vector.
[0090] Reporter genes can be included in the transformation vectors. Examples of suitable reporter genes known in the art can be found in, for example, Jefferson, et al., (1991) in Plant Molecular Biology Manual, ed. Gelvin, et al., (Kluwer Academic Publishers), pp. 1-33; DeWet, et al., (1987) Mol. Cell. Biol. 7:725-737; Goff, et al., (1990) EMBO J. 9:2517-2522; Kain, et al., (1995) BioTechniques 19:650-655 and Chiu, et al., (1996) Current Biology 6:325-330.
[0091] Selectable marker genes for selection of transformed cells or tissues can be included in the transformation vectors. These can include genes that confer antibiotic resistance or resistance to herbicides. Examples of suitable selectable marker genes include, but are not limited to, genes encoding resistance to chloramphenicol, Herrera Estrella, et al., (1983) EMBO J. 2:987-992; methotrexate, Herrera Estrella, et al., (1983) Nature 303:209-213; Meijer, et al., (1991) Plant Mol. Biol. 16:807-820; hygromycin, Waldron, et al., (1985) Plant Mol. Biol. 5:103-108; Zhijian, et al., (1995) Plant Science 108:219-227; streptomycin, Jones, et al., (1987) Mol. Gen. Genet. 210:86-91; spectinomycin, Bretagne-Sagnard, et al., (1996) Transgenic Res. 5:131-137; bleomycin, Hille, et al., (1990) Plant Mol. Biol. 7:171-176; sulfonamide, Guerineau, et al., (1990) Plant Mol. Biol. 15:127-136; bromoxynil, Stalker, et al., (1988) Science 242:419-423; glyphosate, Shaw, et al., (1986) Science 233:478-481; phosphinothricin, DeBlock, et al., (1987) EMBO J. 6:2513-2518.
[0092] The method of transformation/transfection is not critical to the instant invention; various methods of transformation or transfection are currently available. As newer methods become available to transform crops or other host cells, they may be directly applied. Accordingly, a wide variety of methods have been developed to insert a DNA sequence into the genome of a host cell to obtain transcription, or transcription and translation, of the sequence to effect phenotypic changes in the organism. Thus, any method which provides for efficient transformation/transfection may be employed.
[0093] Methods for introducing expression vectors into plant tissue available to one skilled in the art are varied and will depend on the plant selected. Procedures for transforming a wide variety of plant species are well known and described throughout the literature. See, for example, Miki, et al., "Procedures for Introducing Foreign DNA into Plants" in Methods in Plant Molecular Biotechnology, supra; Klein, et al., (1992) Bio/Technology 10:268 and Weising, et al., (1988) Ann. Rev. Genet. 22:421-477. For example, the DNA construct may be introduced into the genomic DNA of the plant cell using techniques such as microprojectile-mediated delivery, Klein, et al., (1987) Nature 327:70-73; electroporation, Fromm, et al., (1985) Proc. Natl. Acad. Sci. 82:5824; polyethylene glycol (PEG) precipitation, Paszkowski, et al., (1984) EMBO J. 3:2717-2722; direct gene transfer, WO 1985/01856 and EP Number 0 275 069; in vitro protoplast transformation, U.S. Pat. No. 4,684,611 and microinjection of plant cell protoplasts or embryogenic callus, Crossway, (1985) Mol. Gen. Genetics 202:179-185. Co-cultivation of plant tissue with Agrobacterium tumefaciens is another option, where the DNA constructs are placed into a binary vector system. See e.g., U.S. Pat. No. 5,591,616; Ishida, et al., (1996) Nature Biotechnology 14:745-750. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct into the plant cell DNA when the cell is infected by the bacteria. See, for example, Horsch, et al., (1984) Science 223:496-498 and Fraley, et al., (1983) Proc. Natl. Acad. Sci. 80:4803.
[0094] Standard methods for transformation of canola are described at Moloney, et al., (1989) Plant Cell Reports 8:238-242. Corn transformation is described by Fromm, et al., (1990) Bio/Technology 8:833 and Gordon-Kamm, et al., supra. Agrobacterium is primarily used in dicots, but certain monocots such as maize can be transformed by Agrobacterium. See, supra and U.S. Pat. No. 5,550,318. Rice transformation is described by Hiei, et al., (1994) The Plant Journal 6(2):271-282, Christou, et al., (1992) Trends in Biotechnology 10:239 and Lee, et al., (1991) Proc. Nat'l Acad. Sci. USA 88:6389. Wheat can be transformed by techniques similar to those used for transforming corn or rice. Sorghum transformation is described at Casas, et al., supra, and barley transformation by Wan, et al., (1994) Plant Physiol. 104:37. Soybean transformation is described in a number of publications, including U.S. Pat. No. 5,015,580.
[0095] Further detailed description is provided below by way of instruction and illustration and is not intended to limit the scope of the invention.
EXAMPLE 1
Identification and Cosegregation of ms26-m2::Mu8
[0096] Families of plants from a Mutator (Mu) population were identified that segregated for plants that were mostly male sterile, with no or only a few extruded abnormal anthers, none of which contained pollen. Male sterility is expected to result when a Mu element has randomly integrated into a gene responsible for some step in microsporogenesis, disrupting its expression. Plants from a segregating F2 family in which the male-sterile mutation was designated ms26*-SBMu200 were grown and classified for male fertility/sterility based on the above criteria. Leaf samples were taken, and DNA was subsequently isolated from approximately 20 plants per phenotypic class (male fertile vs. male sterile).
[0097] Southern analysis was performed to confirm association of Mu with sterility. Southern analysis is a technique well known to those skilled in the art. This common procedure involves isolating the plant DNA, cutting it with restriction endonucleases, fractionating the cut DNA by molecular weight on an agarose gel and transferring it to nylon membranes to fix the separated DNA. These membranes are subsequently hybridized with a probe fragment that was radioactively labeled with 32P-dCTP, and washed in an SDS solution. Southern, (1975) J. Mol. Biol. 98:503-517. Plants from a segregating F2 ms26*-SBMu200 family were grown and classified for male fertility/sterility. Leaf samples were collected and DNA was isolated from approximately 20 plants per phenotypic classification. DNA (~7 µg) from 5 fertile and 12 sterile plants was digested with EcoRI and electrophoresed through a 0.75% agarose gel. The digested DNA was transferred to nylon membrane via Southern transfer. The membrane was hybridized with an internal fragment from the Mu8 transposon. Autoradiography of the membrane revealed cosegregation of a 5.5 kb EcoRI fragment with the sterility phenotype, as shown in FIG. 1. This EcoRI band also segregated among the fertile plants, suggesting that the fertile plants carrying it were heterozygous, with one wild-type allele.
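The band sizes in a Southern blot reflect the lengths of the restriction fragments that carry the probed sequence. Given a known sequence, those lengths can be predicted with a short Python sketch like the one below. It finds every EcoRI site (G^AATTC) on the top strand of a linear molecule and reports fragment lengths measured at the top-strand cut positions, ignoring the 4-nt 5' overhangs. The function name and default arguments are illustrative.

```python
def linear_digest_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`,
    `cut_offset` bases into the site (EcoRI cuts G^AATTC, so the offset is 1).
    Lengths are measured at the top-strand cut positions."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For the toy sequence `AAAAGAATTCTTTTTTTTGAATTCCC` (26 nt), the two sites give fragments of 5, 14 and 7 nt. A sequence with no site is returned as a single fragment of its full length.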
EXAMPLE 2
Library Construction, Screening, and Mapping
[0098] The process of genomic library screenings is commonly known among those skilled in the art and is described at Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor Lab Press, Plainview, N.Y. (1989). Libraries were created as follows.
[0099] DNA from a sterile plant was digested with EcoRI and run on a preparative gel. DNA fragments between 5.0 and 6.0 kb in size were excised from the gel, electroeluted and ethanol precipitated. This DNA was ligated into the Lambda Zap vector (Stratagene®) using the manufacturer's protocol. The ligated DNA was packaged into phage particles using Gigapack Gold (Stratagene®). Approximately 500,000 PFU were plated and lifted onto nitrocellulose membranes. Membranes were hybridized with the Mu8 probe. A pure clone was obtained after 3 rounds of screening. The insert was excised from the phage as a plasmid and designated SBMu200-3.1. A PstI border fragment from this clone was isolated and used to reprobe the original EcoRI cosegregation blot, as shown in FIG. 2B. The 5.5 kb EcoRI fragment is homozygous in all the sterile plants, which confirms that the correct Mu fragment was isolated. Three of the fertile plants are heterozygous for the 5.5 kb EcoRI band and a 4.3 kb EcoRI band. Two of the fertile plants are homozygous for the 4.3 kb EcoRI band, presumably the wild-type allele.
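The reasoning in this paragraph (calling each plant's genotype from its band pattern and checking that the calls fit a recessive mutation) can be made explicit with a short Python sketch. The band sizes come from the results above. The genotype labels ("mut", "+") and function names are hypothetical, and the check assumes that every band pattern is scored without error.

```python
# Band sizes (kb) from the reprobed EcoRI cosegregation blot described above.
MUTANT_BAND_KB = 5.5      # Mu8-containing fragment that cosegregates with sterility
WILD_TYPE_BAND_KB = 4.3   # fragment presumed to represent the wild-type allele

def call_genotype(bands_kb: set[float]) -> str:
    """Call a genotype from the set of hybridizing EcoRI band sizes."""
    has_mutant = MUTANT_BAND_KB in bands_kb
    has_wild_type = WILD_TYPE_BAND_KB in bands_kb
    if has_mutant and has_wild_type:
        return "mut/+"
    if has_mutant:
        return "mut/mut"
    if has_wild_type:
        return "+/+"
    return "no call"

def fits_recessive_model(plants: list[tuple[str, set[float]]]) -> bool:
    """True if every sterile plant is homozygous for the mutant band and every
    fertile plant carries at least one wild-type band."""
    for phenotype, bands in plants:
        genotype = call_genotype(bands)
        if phenotype == "sterile" and genotype != "mut/mut":
            return False
        if phenotype == "fertile" and genotype not in ("mut/+", "+/+"):
            return False
    return True

# Band patterns as reported: 12 sterile, 3 heterozygous fertile, 2 homozygous fertile.
observed = ([("sterile", {5.5})] * 12
            + [("fertile", {5.5, 4.3})] * 3
            + [("fertile", {4.3})] * 2)
```

With the reported patterns, `fits_recessive_model(observed)` is `True`. A single sterile plant that showed both bands would be enough to make it `False`.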
[0100] The PstI probe was used to map the ms*-SBMu200 mutation in an RFLP mapping population. The mutant mapped to the short arm of chromosome 1, near the male-sterile locus Ms26 (Loukides, et al., (1995) Amer. J. Bot. 82:1017-1023). To test whether ms*-SBMu200 was an allele of ms26-ref, ms*-SBMu200 and ms26-ref were crossed with each other, using a known heterozygote as the pollen donor. The testcross progeny segregated male-sterile and wild-type plants in a 1:1 ratio, indicating allelism between ms*-SBMu200 and ms26-ref. The ms*-SBMu200 allele was designated ms26-m2::Mu8. The map location is shown in FIG. 12.
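The application does not give the testcross counts. As an illustration of how a 1:1 segregation claim is commonly checked, the sketch applies a Pearson chi-square goodness-of-fit test to hypothetical counts of 48 sterile and 52 fertile progeny; 3.841 is the standard chi-square critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square_1_to_1(sterile, fertile):
    """Pearson chi-square statistic for an expected 1:1 ratio (1 df)."""
    expected = (sterile + fertile) / 2
    return sum((obs - expected) ** 2 / expected for obs in (sterile, fertile))

CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

chi2 = chi_square_1_to_1(48, 52)  # hypothetical testcross counts
print(f"chi2 = {chi2:.2f}; consistent with 1:1: {chi2 < CRITICAL_1DF_05}")
```

A statistic below the critical value means the observed counts do not depart significantly from 1:1; it does not by itself prove allelism, which rests on the cross design described above.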
EXAMPLE 3
Identification and Cloning of Additional ms26 Alleles
[0101] Three additional Mu insertion mutations in Ms26 were identified using a polymerase chain reaction (PCR) primer for Mu and a gene-specific primer for Ms26. Sequence analysis of the PCR products showed that all three Mu insertions occurred in the second exon (FIG. 1). F2 seeds from one of these families were grown and examined for male fertility/sterility. Southern blot analysis of this family confirmed cosegregation of the Mu insertion in Ms26 with the male-sterile phenotype.
[0102] The ms26 allele described in Loukides, et al., (1995) Amer. J. Bot. 82:1017-1023 and designated ms26-ref was also investigated. To analyze the mutation in ms26-ref, Ms26 genomic sequences were cloned from ms26-ref sterile and fertile plants. Ms26 was cloned as a ~4.2 kb EcoRI fragment, and ms26-ref was cloned from the sterile plant as a ~6 kb HindIII fragment and an overlapping ~2.3 kb EcoRI fragment. Sequence analysis revealed a new 1,430 bp segment in the last exon of the ms26-ref allele, as shown in FIG. 1. An 8 bp host site duplication (GCCGGAGC) flanks the inserted element, and the element contains a 15 bp terminal inverted repeat (TIR) (TAGGGGTGAAAACGG). The transposon sequence is shown in SEQ ID NO: 10. The complete ms26-ref genomic sequence is shown in SEQ ID NO: 11. A variant of the ms26-ref allele was also found. Sequence analysis showed that this allele, designated ms26'-0406, had lost the 1,430 bp segment found in the last exon of the ms26-ref allele but retained an 8 bp footprint at the insertion site. Plants homozygous for the ms26'-0406 allele were male sterile. A comparison of the excision allele ms26'-0406 (SEQ ID NO: 8) with the corresponding region of the wild-type Ms26 gene (SEQ ID NO: 9) is shown in FIG. 13.
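The element structure described here (a 15 bp TIR at each end, inside an 8 bp host site duplication) can be recognized by looking for the TIR and its reverse complement and then comparing the flanks. The sketch uses the TIR and duplication sequences reported above, but the surrounding flanks and the element interior are hypothetical, and the "excision" step is idealized: it simply keeps both copies of the duplication, whereas real excision footprints are often imperfect.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

TIR = "TAGGGGTGAAAACGG"  # 15 bp terminal inverted repeat reported above
TSD = "GCCGGAGC"         # 8 bp host site duplication reported above

def find_element(seq, tir, tsd_len):
    """Locate an element bounded by tir ... revcomp(tir).
    Returns (start, end, left_flank, right_flank); end is exclusive."""
    start = seq.find(tir)
    right = seq.rfind(revcomp(tir))
    if start < 0 or right <= start:
        return None
    end = right + len(tir)
    return start, end, seq[start - tsd_len:start], seq[end:end + tsd_len]

# Idealized toy insertion; "AAAT", "TTTA" and the filler are hypothetical.
filler = "ACGT" * 5
toy = "AAAT" + TSD + TIR + filler + revcomp(TIR) + TSD + "TTTA"

start, end, left, right = find_element(toy, TIR, len(TSD))
print(left == right == TSD)        # flanking direct repeats match
excised = toy[:start] + toy[end:]  # remove element, keep both TSD copies
print(excised)
```

Relative to the uninterrupted site ("AAAT" + one TSD copy + "TTTA"), the idealized excision product carries 8 extra base pairs, the size of the footprint reported for ms26'-0406.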
EXAMPLE 4
Expression Analysis and cDNA Isolation
[0103] Northern analysis can be used to detect expression of genes characteristic of anther development at various stages of microsporogenesis. Northern analysis is also a commonly used technique known to those skilled in the art; it is similar to Southern analysis except that mRNA rather than DNA is isolated and run on the gel. The RNA is then hybridized with the labeled probe. Potter, et al., (1981) Proc. Natl. Acad. Sci. USA 78:6662-6666; Lechelt, et al., (1989) Mol. Gen. Genet. 219:225-234. The PstI fragment from the SBMu200-3.1 clone was used to probe a Northern blot containing kernel, immature ear, seedling and tassel RNA. A signal was seen only in tassel RNA at approximately the quartet stage of microsporogenesis, as shown in FIG. 3. The transcript is about 2.3 kb in length. The same probe was also used to screen a cDNA library constructed from mRNA isolated from anthers staged from meiosis through late uninucleate. One clone, designated Ms26-8.1, was isolated from the library.
EXAMPLE 5
Sequence and Expression Analysis
[0104] The SBMu200-3.1 genomic clone and the Ms26-8.1 cDNA clone were sequenced by Lofstrand Labs Limited (Sanger, et al., (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467). The sequences are set forth in FIG. 4 and their comparison in FIG. 5. The cDNA/genomic comparison reveals five introns in the genomic clone. The Mu8 insertion occurs in exon 1. Tests for codon preference and for non-randomness in the third position of each codon were consistent with the major ORF in the cDNA being the protein-coding ORF. There is a putative Met start codon at position 1089 in the genomic clone. The homology of the cDNA to the genomic clone begins at nucleotide 1094; thus Ms26-8.1 is not a full-length clone and lacks the 5 bases (positions 1089-1093) beginning at the putative Met start codon. A database search revealed significant homology to P450 enzymes found in yeast, plants and mammals. P450 enzymes have been widely studied, and their characteristic protein domains have been elucidated. The Ms26 protein contains several structural motifs characteristic of eukaryotic P450s, including the heme-binding domain FxxGxRxCxG (domain D), domain A, A/GGXD/ETT/S (dioxygen binding), domain B (steroid binding) and domain C. The highly conserved heme-binding motif was found in MS26 as FQAGPRICLG, 51 amino acids from the C-terminus. The dioxygen-binding domain AGRDTT was located between amino acids 320 and 325. The steroid-binding domain was found as LVYLHACVTETLR, amino acids 397-409.
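The heme-binding signature can be located by simple pattern matching. The sketch searches the C-terminal 66 residues of the maize protein, transcribed here from SEQ ID NO: 2 in the sequence listing (residues 481-546), for FxxGxRxCxG written as a regular expression, and reproduces the 51-residue distance to the C-terminus stated above. Residue numbers follow the SEQ ID NO: 2 listing, which begins at the first codon of the cDNA clone; the distance to the C-terminus does not depend on that choice.

```python
import re

# Residues 481-546 of SEQ ID NO: 2, transcribed from the sequence listing.
TAIL_START = 481
TAIL = ("FKFTAFQAGPRICLGK" "DSAYLQMKMALAILFR"
        "FYSFRLLEGHPVQYRM" "MTILSMAHGLKVRVSR" "AV")
PROTEIN_LENGTH = 546

# Heme-binding signature FxxGxRxCxG ("x" = any residue).
HEME_MOTIF = re.compile(r"F..G.R.C.G")

m = HEME_MOTIF.search(TAIL)
first = TAIL_START + m.start()   # residue number of the motif's F
last = TAIL_START + m.end() - 1  # residue number of the motif's final G
print(m.group(), first, last, PROTEIN_LENGTH - last)
```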
[0105] The most significant homologous sequence detected in the GenBank database is a deduced protein sequence from rice (GenBank Accession Number 19071651). The second most homologous sequence is a putative Arabidopsis P450 gene (CYP704B1) whose function is also unknown. FIG. 14A shows a sequence alignment between CYP704B1 (SEQ ID NO: 12) and Ms26 (SEQ ID NO: 13). Phylogenetic tree analysis of selected P450 genes revealed that Ms26 is most closely related to P450s involved in fatty acid omega-hydroxylation found in Arabidopsis thaliana and Vicia sativa (FIG. 14B). The translational frameshift caused by the ms26'-0406 excision mutation is believed to destroy the activity of the heme-binding domain, resulting in sterility. See the comparison in FIG. 15 (Ms26 cDNA at SEQ ID NO: 14; fertile exon 5 region at SEQ ID NO: 15; sterile exon 5 region at SEQ ID NO: 16).
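The frameshift argument can be checked by translating the two alleles across the footprint. The sketch uses short stretches transcribed from the sequence listing: the wild-type Ms26 cDNA (SEQ ID NO: 1) starting at the codon for Asp in Asp-Ala-Ala-Ser-Phe-Arg-Pro-Glu, and the corresponding stretch of the ms26'-0406 sequence (SEQ ID NO: 8). Translation uses the standard genetic code; this is an illustrative check, not a reproduction of the analysis behind FIG. 15.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Wild-type stretch (SEQ ID NO: 1) and ms26'-0406 stretch (SEQ ID NO: 8).
WT = ("gacgcg" "gcgagcttcc" "ggccggagcg" "gtggatcaac" "gaggatggcg" "cgttcc")
MUT = ("gacgcggcga" "gcttccggcc" "ggaggcccgg" "agcggtggat" "caacgaggat" "ggcgcgttcc")

print(len(MUT) - len(WT))  # size of the footprint in this stretch
print(translate(WT))
print(translate(MUT))
```

The two translations agree through Pro-Glu and diverge immediately afterwards. Because 8 is not a multiple of 3, every downstream codon, including those that would encode the FxxGxRxCxG heme-binding motif, is read out of frame in the excision allele, which is consistent with the explanation given above.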
[0106] Further expression studies were done by hybridizing the Ms26 cDNA probe to Northern blots, including one containing mRNA from discrete stages of microsporogenesis. FIG. 6A shows a Northern blot with RNA samples from different tissues, including root (1), leaf (2), husk (3), cob (4), ear spikelet (5), silk (6), immature embryo (7), mature embryo (8), and tassel from a fertile plant (9), an ms26-m2::Mu8 sterile plant (10), an ms26-ref sterile plant (11) and a fertile plant (12). A hybridization signal with the Ms26 cDNA was detected only in tassel tissues. FIG. 6B shows a Northern blot containing mRNA from discrete stages of microsporogenesis. Hybridization signals with the Ms26 cDNA were detected from the meiosis II/quartet stage (4) to the late-uninucleate stage (10), with the maximal signal observed from the early-uninucleate through the late-uninucleate stage (10).
EXAMPLE 6
Identification of Promoter and Its Essential Regions
[0107] A putative TATA box can be identified by primer extension analysis as described in Current Protocols in Molecular Biology, Ausubel, et al., eds.; John Wiley and Sons, New York, pp. 4.8.1-4.8.5 (1987).
[0108] Regulatory regions of anther genes, such as promoters, may be identified in genomic subclones using functional analysis, usually verified by the observation of reporter gene expression in anther tissue and a lower level or absence of reporter gene expression in non-anther tissue. The possibility of the regulatory regions residing "upstream" or 5' ward of the translational start site can be tested by subcloning a DNA fragment that contains the upstream region into expression vectors for transient expression experiments. It is expected that smaller subgenomic fragments may contain the regions essential for male-tissue preferred expression. For example, the essential regions of the CaMV 19S and 35S promoters have been identified in relatively small fragments derived from larger genomic pieces as described in U.S. Pat. No. 5,352,605.
[0109] The selection of an appropriate expression vector with which to test for functional expression will depend on the host and on the method of introducing the expression vector into the host; such methods are well known to one skilled in the art. For eukaryotes, the vector includes regions that control initiation of transcription and regions that control processing. These regions are operably linked to a reporter gene such as UidA, encoding β-glucuronidase (GUS), or luciferase. General descriptions and examples of plant expression vectors and reporter genes can be found in Gruber, et al., "Vectors for Plant Transformation" in Methods in Plant Molecular Biology and Biotechnology; Glick, et al., eds.; CRC Press; pp. 89-119; (1993). GUS expression vectors and GUS gene cassettes are commercially available from Clontech, Palo Alto, Calif., while luciferase expression vectors and luciferase gene cassettes are available from Promega Corporation, Madison, Wis. Ti plasmids and other Agrobacterium vectors are described in Ishida, et al., (1996) Nature Biotechnology 14:745-750 and in U.S. Pat. No. 5,591,616, "Method for Transforming Monocotyledons" (1994).
[0110] Expression vectors containing putative regulatory regions located in genomic fragments can be introduced into intact tissues such as staged anthers, embryos or into callus. Methods of DNA delivery include microprojectile bombardment, DNA injection, electroporation and Agrobacterium-mediated gene transfer (see, Gruber, et al., "Vectors for Plant Transformation," in Methods in Plant Molecular Biology and Biotechnology, Glick, et al., eds.; CRC Press; (1993); U.S. Pat. No. 5,591,616; and Ishida, et al., (1996) Nature Biotechnology 14:745-750). General methods of culturing plant tissues are found in Gruber, et al., supra and Glick, supra.
[0111] For the transient assay system, staged, isolated anthers are immediately placed onto tassel culture medium (Pareddy and Petelino, (1989) Crop Sci. J.; 29:1564-1566) solidified with 0.5% Phytagel (Sigma, St. Louis) or another solidifying agent. The expression vector DNA is introduced within 5 hours, preferably by microprojectile-mediated delivery with 1.2 μm particles at 1000-1100 psi. After DNA delivery, the anthers are incubated at 26° C. on the same tassel culture medium for 17 hours and analyzed by preparing a whole-tissue homogenate and assaying for GUS or luciferase activity (see Gruber, et al., supra).
[0112] Upstream of the likely translational start codon of Ms26, 1088 bp of DNA was present in the genomic clone ms26-m2::Mu8. Translational fusions via an engineered NcoI site were generated with reporter genes encoding luciferase and β-glucuronidase to test whether this fragment of DNA had promoter activity in transient expression assays of bombarded plant tissues. Activity was demonstrated in anthers and not in coleoptiles, roots and calli, suggesting anther-preferred or anther-specific promoter activity.
[0113] A plausible TATA box was observed by inspection, about 83 to 77 bp upstream of the translational start codon. The genomic clone ms26-m2::Mu8 thus includes about 1005 bp upstream of the putative TATA box. For typical plant genes, the start of transcription is 26-36 bp downstream of the TATA box, which would give the Ms26 mRNA a 5' nontranslated leader of about 48-58 nt. The total ms26-m2::Mu8 subgenomic fragment of 1088 bp, including the nontranslated leader, start of transcription, TATA box and sequences upstream of the TATA box, was thus shown to be sufficient for promoter activity. See SEQ ID NO: 5, in which the putative TATA box (TATATCA) is underlined. Thus, the present invention encompasses a DNA molecule having the nucleotide sequence of SEQ ID NO: 5 (or sequences having identity to it) and having the function of a male tissue-preferred regulatory region.
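The distances quoted in this paragraph can be checked against the listing. The sketch takes positions 1001-1092 of SEQ ID NO: 5 as printed in the sequence listing, locates TATATCA and the ATG inside the NcoI site (CCATGG) at the 3' end, and reproduces the 83/77 bp, 1005 bp and 48-58 nt figures. The transcription start convention in `leader_length` (counting the first T of the TATA box as base 1) is an assumption chosen because it matches the text; counting from the next base would shift the leader estimate by one.

```python
# Positions 1001-1092 of SEQ ID NO: 5, transcribed from the sequence listing.
FRAGMENT_START = 1001
FRAGMENT = ("cccaatatat" "cacctaacaa"
            "acccaatcca" "tgctacatac" "atacatagca" "tccatcactt" "gtagactgga" "cccttcatca"
            "agagcaccat" "gg")

tata = FRAGMENT.find("tatatca") + FRAGMENT_START    # first T of TATATCA
atg = FRAGMENT.rfind("ccatgg") + 2 + FRAGMENT_START  # A of ATG in the NcoI site

print(atg - tata, atg - (tata + 6), tata - 1)  # 83 77 1005

def leader_length(offset_bp):
    """5' leader length if transcription starts offset_bp downstream of the
    TATA box. Assumption: the first T of the TATA box counts as base 1."""
    tss = tata + offset_bp - 1
    return atg - tss

print(leader_length(36), leader_length(26))  # 48 58
```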
[0114] Deletion analysis can occur from both the 5' and 3' ends of the regulatory region: fragments can be obtained by site-directed mutagenesis, mutagenesis using the polymerase chain reaction, and the like (Directed Mutagenesis: A Practical Approach; IRL Press; (1991)). The 3' end of the male tissue-preferred regulatory region can be delineated by proximity to the putative TATA box or by 3' deletions if necessary. The essential region may then be operably linked to a core promoter of choice. Once the essential region is identified, transcription of an exogenous gene may be controlled by the male tissue-preferred region of Ms26 plus a core promoter. The core promoter can be any one of known core promoters such as a Cauliflower Mosaic Virus 35S or 19S promoter (U.S. Pat. No. 5,352,605), Ubiquitin (U.S. Pat. No. 5,510,474), the IN2 core promoter (U.S. Pat. No. 5,364,780) or a Figwort Mosaic Virus promoter (Gruber, et al., "Vectors for Plant Transformation" in Methods in Plant Molecular Biology and Biotechnology; Glick, et al., eds.; CRC Press; pp. 89-119; (1993)). Preferably, the promoter is the core promoter of a male tissue-preferred gene or the CaMV 35S core promoter. More preferably, the promoter is a promoter of a male tissue-preferred gene and in particular, the Ms26 core promoter.
[0115] Further mutational analysis, for example by linker scanning, a method well known in the art, can identify small segments containing sequences required for anther-preferred expression. Such mutations may modify functionality, for example the level, timing or tissue of expression. Mutations may also be silent and have no observable effect.
[0116] The foregoing procedures were used to identify essential regions of the Ms26 promoter. After the promoter was linked to the luciferase marker gene, deletion analysis was performed on the regions of the promoter upstream of the putative TATA box, as represented in FIG. 8. The x-axis of the bar graph indicates the number of base pairs immediately upstream of the putative TATA box retained in a series of deletion derivatives starting from the 5' end of the promoter. The y-axis shows the normalized luciferase activity as a percent of full-length promoter activity.
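The y-axis quantity is a ratio of ratios. The application does not say what the luciferase readings were normalized to, so the sketch assumes a co-delivered control reporter, a common choice in transient assays. All readings below are hypothetical.

```python
def percent_of_full_length(luc, control, full_luc, full_control):
    """Normalize a construct's luciferase reading to an internal control and
    express it as a percentage of the full-length promoter's normalized value."""
    return 100.0 * (luc / control) / (full_luc / full_control)

# Hypothetical readings: (luciferase, co-delivered control reporter).
full = (12000.0, 4000.0)
constructs = {"-176": (9000.0, 3600.0), "-91": (300.0, 5000.0)}

for name, (luc, ctrl) in constructs.items():
    print(name, round(percent_of_full_length(luc, ctrl, *full), 1))
```

Normalizing each bombardment to its own control reading removes shot-to-shot differences in DNA delivery before constructs are compared.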
[0117] As is evident from the graph, approximately 176 bp immediately upstream of the TATA box was sufficient, when coupled to the core promoter (putative TATA box through start of transcription), plus 5' nontranslated leader, for transient expression in anthers. By contrast, luciferase activity was minimal upon further deletion from the 5' end to 91 bp upstream of the putative TATA box. This 176 bp upstream of the putative TATA box through the nontranslated leader can be considered a minimal promoter, which is further represented at FIG. 9. The TATA box is underlined. Deletion within the full-length promoter from -176 through -92 relative to the TATA box reduced activity to about 1% of wild type. Deletion of -39 through -8 did not greatly reduce activity. Therefore the -176 to -44 bp region contains an essential region and thus would constitute an upstream enhancer element conferring anther expression on the promoter, which we refer to as an "anther box".
[0118] Linker scanning analysis was conducted across the anther box in 9-10 bp increments. The locations of the linker scanning substitutions in this region are shown in FIG. 9 and the expression levels of the mutants relative to the wild type sequence are shown in FIG. 10. The most drastic effect on transient expression in anthers was observed for mutants LS12 and LS13, in the region 52-71 bp upstream of the putative TATA box. A major effect on transient expression in anthers was also observed for mutants LS06, LS07, LS08 and LS10, within the region 82-131 bp upstream of the putative TATA box. Sequences within the anther box required for wild type levels of transient expression in anthers are thus demonstrated in the -52 to -131 region relative to the putative TATA box, particularly the -52 to -71 region.
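To find these regions in the listing, TATA-relative coordinates can be converted to positions in SEQ ID NO: 5. The sketch assumes that "n bp upstream of the putative TATA box" is counted from the first T of TATATCA, which sits at position 1006 of SEQ ID NO: 5 as printed in the sequence listing; a different anchoring convention would shift every position by the same amount.

```python
TATA_START = 1006  # first T of TATATCA in SEQ ID NO: 5 (from the listing)

def to_seq5(upstream_bp):
    """Map 'n bp upstream of the putative TATA box' to a SEQ ID NO: 5 position."""
    return TATA_START - upstream_bp

regions = {
    "anther box (-176 to -44)": (176, 44),
    "LS06-LS10 region (-131 to -82)": (131, 82),
    "LS12-LS13 region (-71 to -52)": (71, 52),
}
for label, (far, near) in regions.items():
    print(f"{label}: positions {to_seq5(far)}-{to_seq5(near)}")
```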
EXAMPLE 7
Ms26 Sorghum, Rice and Maize Comparison
[0119] As noted above, Ms26 is a male fertility gene in maize. When it is mutated and made homozygous recessive, male sterility results. An orthologue of Ms26 was identified in sorghum. The sorghum orthologue of the Ms26 cDNA was isolated using the maize Ms26 gene primers in a polymerase chain reaction with sorghum tassel cDNA as the template. The resulting cDNA fragment was sequenced by the methods described supra and compared to the maize Ms26 cDNA. The nucleotide sequence comparison is set forth in FIG. 11 and shows 90% identity. An orthologue from rice was also identified; its predicted coding sequence (SEQ ID NO: 17) and protein (SEQ ID NO: 18) are set forth in FIG. 19. The rice gene has one intron fewer than maize and sorghum Ms26, and the coding sequences are highly conserved.
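Percent identity over an ungapped stretch is the number of matching positions divided by the aligned length. As a small worked instance, the sketch compares a 168-nt coding window that the maize cDNA (SEQ ID NO: 1) and the sorghum fragment (SEQ ID NO: 3) share without gaps, both transcribed from the sequence listing. A short window like this is not expected to reproduce the 90% figure in FIG. 11, which depends on the full aligned length, gaps and the ambiguous bases in the sorghum sequence.

```python
# Same 168-nt window (from the Met-Pro-Phe codons onward) in each species.
MAIZE = ("atgccgttcacttcctacacctacatcgctgacccggtgaatgtcgag"
         "catgtcctcaagactaacttcaccaattaccccaagggaatcgtgtacagatcctacatg"
         "gacgtgctcctcggtgacggcatcttcaacgccgacggcgagctgtggaggaagcagagg")
SORGHUM = ("atgccgttcacttcctacacctacatcgctgacccggtgaatgtcgag"
           "catgtcctcaagactaacttcaccaattaccccaagggggacgtgtacagatcctacatg"
           "gatgtgctcctcggtgacggcatattcaacgctgacggcgagctgtggaggaagcagagg")

def percent_identity(a, b):
    """Percent identity of two gap-free, equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

print(len(MAIZE), round(percent_identity(MAIZE, SORGHUM), 1))
```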
[0120] The sorghum and rice promoters were also identified. FIG. 16 shows an alignment of the Ms26 promoters of corn (SEQ ID NO: 5), sorghum (SEQ ID NO: 19) and rice (SEQ ID NO: 20). The last three bases of the corn promoter shown in the figure are the ATG start of translation.
[0121] FIG. 17 shows an alignment of the maize Ms26 protein (SEQ ID NO: 2), the rice Ms26 protein (SEQ ID NO: 18) and the sorghum Ms26 protein (SEQ ID NO: 22), together with a consensus sequence (SEQ ID NO: 35). The comparison shows that the protein is highly conserved among the orthologues, with the rice protein sharing 92% similarity and 86% identity with the maize orthologue. The predicted tissue specificity in rice and sorghum is further reflected in searches of the sorghum and rice EST databases with the Ms26 protein, in which the matching ESTs derive from panicle (flower) libraries. Sorghum sequences producing significant alignments (GenBank accession numbers BI075441.1; BI075273.1; BI246000.1; BI246162.1; BG948686.1; BI099541.1 and BG948366.1, among others) were all from immature sorghum panicle, and sequences showing significant alignment in rice (GenBank accession numbers C73892.1; CR290740.1, among others) were also from immature rice panicle.
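The difference between the similarity and identity figures comes from counting conservative substitutions as matches. The sketch illustrates this on two toy peptides using a simplified, hypothetical grouping of similar residues; the application does not state which substitution scheme or program produced its figures, and BLAST-style "positives" are instead defined by positive BLOSUM62 scores. Columns with a gap ('-') are skipped, which is also an assumed convention.

```python
# Hypothetical, simplified groups of "similar" residues (not BLOSUM62).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {aa: i for i, g in enumerate(SIMILAR_GROUPS) for aa in g}

def identity_and_similarity(a, b):
    """Return (% identical, % identical-or-similar) over aligned columns,
    skipping columns where either sequence has a gap ('-')."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    same = sum(x == y for x, y in cols)
    similar = sum(x == y or GROUP_OF.get(x, -1) == GROUP_OF.get(y, -2)
                  for x, y in cols)
    return 100.0 * same / len(cols), 100.0 * similar / len(cols)

print(identity_and_similarity("MKTLLVDE", "MRTIAVEE"))  # toy peptides
```

Here K/R, L/I and D/E count as similar but not identical, so similarity (87.5%) exceeds identity (50.0%), mirroring in miniature the 92% versus 86% comparison above.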
[0122] As is evident from the above, nucleotide sequences which map to the short arm of chromosome 1 of the Zea mays genome, at the same site as the Ms26 gene, ms26-m2::Mu8 and its alleles, are genes critical to male fertility in plants, that is, are necessary for fertility of a plant, or, when mutated from the sequence found in a fertile plant, cause sterility in the plant.
[0123] Thus it can be seen that the invention achieves at least all of its objectives.
Sequence CWU
1
1
Number of SEQ ID NOs: 35
SEQ ID NO: 1; length 1906; type DNA; organism Zea mays; CDS (1)..(1638)
gaa ttc ggc acg agg gaa gct cac ctc acg
ccg gcg acg cca tcg cca 48Glu Phe Gly Thr Arg Glu Ala His Leu Thr
Pro Ala Thr Pro Ser Pro1 5 10
15 ttc ttc cca cta gca ggg cct cac aag tac atc gcg ctc ctt ctg
gtt 96Phe Phe Pro Leu Ala Gly Pro His Lys Tyr Ile Ala Leu Leu Leu
Val 20 25 30 gtc ctc
tca tgg atc ctg gtc cag agg tgg agc ctg agg aag cag aaa 144Val Leu
Ser Trp Ile Leu Val Gln Arg Trp Ser Leu Arg Lys Gln Lys 35
40 45 ggc ccg aga tca tgg cca gtc
atc ggc gca acg gtg gag cag ctg agg 192Gly Pro Arg Ser Trp Pro Val
Ile Gly Ala Thr Val Glu Gln Leu Arg 50 55
60 aac tac cac cgg atg cac gac tgg ctt gtc ggg tac
ctg tca cgg cac 240Asn Tyr His Arg Met His Asp Trp Leu Val Gly Tyr
Leu Ser Arg His65 70 75
80agg aca gtg acc gtc gac atg ccg ttc act tcc tac acc tac atc gct
288Arg Thr Val Thr Val Asp Met Pro Phe Thr Ser Tyr Thr Tyr Ile Ala
85 90 95 gac ccg gtg aat gtc
gag cat gtc ctc aag act aac ttc acc aat tac 336Asp Pro Val Asn Val
Glu His Val Leu Lys Thr Asn Phe Thr Asn Tyr 100
105 110 ccc aag gga atc gtg tac aga tcc tac atg
gac gtg ctc ctc ggt gac 384Pro Lys Gly Ile Val Tyr Arg Ser Tyr Met
Asp Val Leu Leu Gly Asp 115 120
125 ggc atc ttc aac gcc gac ggc gag ctg tgg agg aag cag agg
aag acg 432Gly Ile Phe Asn Ala Asp Gly Glu Leu Trp Arg Lys Gln Arg
Lys Thr 130 135 140 gcg
agt ttc gag ttc gcc tcc aag aac ctg agg gat ttc agc gcc att 480
Ala Ser Phe Glu Phe Ala Ser Lys Asn Leu Arg Asp Phe Ser Ala Ile145
150 155 160gtg ttc aga gag tac
tcc ctg aag ctg tcg ggt ata ctg agc cag gca 528 Val Phe Arg Glu
Tyr Ser Leu Lys Leu Ser Gly Ile Leu Ser Gln Ala 165
170 175 tcc aag gca ggc aaa gtt gtg gac atg
cag gaa ctt tac atg agg atg 576 Ser Lys Ala Gly Lys Val Val Asp
Met Gln Glu Leu Tyr Met Arg Met 180 185
190 acg ctg gac tcc atc tgc aag gtt ggg ttc ggg gtc gag
atc ggc acg 624 Thr Leu Asp Ser Ile Cys Lys Val Gly Phe Gly Val
Glu Ile Gly Thr 195 200 205
ctg tcg cca gat ctc ccc gag aac agc ttc gcg cag gcg ttc gat gcc
672 Leu Ser Pro Asp Leu Pro Glu Asn Ser Phe Ala Gln Ala Phe Asp Ala
210 215 220 gcc aac atc
atc atc acg ctg cgg ttc atc gac ccg ctg tgg cgc atc 720 Ala Asn
Ile Ile Ile Thr Leu Arg Phe Ile Asp Pro Leu Trp Arg Ile225
230 235 240aag agg ttc ttc cac gtc ggg
tca gag gcc ctc cta gcg cag agc atc 768 Lys Arg Phe Phe His Val
Gly Ser Glu Ala Leu Leu Ala Gln Ser Ile 245
250 255 aag ctc gtg gac gag ttc acc tac agc gtg atc
cgc cgg agg aag gcc 816 Lys Leu Val Asp Glu Phe Thr Tyr Ser Val
Ile Arg Arg Arg Lys Ala 260 265
270 gag atc gtc gag gtc cgg gcc agc ggc aaa cag gag aag atg aag
cac 864 Glu Ile Val Glu Val Arg Ala Ser Gly Lys Gln Glu Lys Met
Lys His 275 280 285 gac
atc ctg tca cgg ttc atc gag ctg ggc gag gcc ggc gac gac ggc 912
Asp Ile Leu Ser Arg Phe Ile Glu Leu Gly Glu Ala Gly Asp Asp Gly 290
295 300 ggc ggc ttc ggg gac
gat aag agc ctc cgg gac gtg gtg ctc aac ttc 960 Gly Gly Phe Gly
Asp Asp Lys Ser Leu Arg Asp Val Val Leu Asn Phe305 310
315 320gtg atc gcc ggg cgg gac acg acg gcg
acg acg ctg tcg tgg ttc acg 1008 Val Ile Ala Gly Arg Asp Thr Thr Ala
Thr Thr Leu Ser Trp Phe Thr 325 330
335 cac atg gcc atg tcc cac ccg gac gtg gcc gag aag ctg cgc
cgc gag 1056 His Met Ala Met Ser His Pro Asp Val Ala Glu Lys Leu Arg
Arg Glu 340 345 350 ctg
tgc gcg ttc gag gcg gag cgc gcg cgc gag gag ggc gtc acg ctc 1104 Leu
Cys Ala Phe Glu Ala Glu Arg Ala Arg Glu Glu Gly Val Thr Leu 355
360 365 gtg ctc tgc ggc ggc gct
gac gcc gac gac aag gcg ttc gcc gcc cgc 1152 Val Leu Cys Gly Gly Ala
Asp Ala Asp Asp Lys Ala Phe Ala Ala Arg 370 375
380 gtg gcg cag ttc gcg ggc ctc ctc acc tac gac
agc ctc ggc aag ctg 1200 Val Ala Gln Phe Ala Gly Leu Leu Thr Tyr Asp
Ser Leu Gly Lys Leu385 390 395
400gtc tac ctc cac gcc tgc gtc acc gag acg ctc cgc ctg tac ccc gcc
1248 Val Tyr Leu His Ala Cys Val Thr Glu Thr Leu Arg Leu Tyr Pro Ala
405 410 415 gtc cct cag
gac ccc aag ggg atc ctg gag gac gac gtg ctg ccg gac 1296 Val Pro Gln
Asp Pro Lys Gly Ile Leu Glu Asp Asp Val Leu Pro Asp 420
425 430 ggg acg aag gtg agg gcc ggc ggg
atg gtg acg tac gtg ccc tac tcg 1344 Gly Thr Lys Val Arg Ala Gly Gly
Met Val Thr Tyr Val Pro Tyr Ser 435 440
445 atg ggg cgg atg gag tac aac tgg ggc ccc gac gcg gcg
agc ttc cgg 1392 Met Gly Arg Met Glu Tyr Asn Trp Gly Pro Asp Ala Ala
Ser Phe Arg 450 455 460
ccg gag cgg tgg atc aac gag gat ggc gcg ttc cgc aac gcg tcg ccg 1440
Pro Glu Arg Trp Ile Asn Glu Asp Gly Ala Phe Arg Asn Ala Ser Pro465
470 475 480ttc aag ttc acg gcg
ttc cag gcg ggg ccg agg atc tgc ctg ggc aag 1488 Phe Lys Phe Thr Ala
Phe Gln Ala Gly Pro Arg Ile Cys Leu Gly Lys 485
490 495 gac tcg gcg tac ctg cag atg aag atg gcg
ctg gcc atc ctc ttc cgc 1536 Asp Ser Ala Tyr Leu Gln Met Lys Met Ala
Leu Ala Ile Leu Phe Arg 500 505
510 ttc tac agc ttc cgg ctg ctg gag ggg cac ccg gtg cag tac cgc
atg 1584 Phe Tyr Ser Phe Arg Leu Leu Glu Gly His Pro Val Gln Tyr Arg
Met 515 520 525 atg acc
atc ctc tcc atg gcg cac ggc ctc aag gtc cgc gtc tct agg 1632 Met Thr
Ile Leu Ser Met Ala His Gly Leu Lys Val Arg Val Ser Arg 530
535 540 gcc gtc tgatgtcatg gcgatttgga
tatggatatc gtcccgctta atccacgaca 1688 Ala Val
545
aataacgctc gtgttacaaa tttgcatgca tgcatgtaag
ggaaagcgat gggtttcatt 1748ggtggcttgg cttaagcctt aaaaactccg tcgggtcttg
cgaaccacca catcactagt 1808gttttgtact ctactcctca gtggaagtgt agtgacagca
tacaagttca tcatatatat 1868tatcctcttt cttaaaaaaa aaaaaaaaaa aactcgag
1906
SEQ ID NO: 2; length 546; type PRT; organism Zea mays
Glu Phe Gly Thr Arg Glu Ala
His Leu Thr Pro Ala Thr Pro Ser Pro 1 5
10 15 Phe Phe Pro Leu Ala Gly Pro His Lys Tyr Ile
Ala Leu Leu Leu Val 20 25
30 Val Leu Ser Trp Ile Leu Val Gln Arg Trp Ser Leu Arg Lys Gln
Lys 35 40 45 Gly
Pro Arg Ser Trp Pro Val Ile Gly Ala Thr Val Glu Gln Leu Arg 50
55 60 Asn Tyr His Arg Met
His Asp Trp Leu Val Gly Tyr Leu Ser Arg His 65 70
75 80 Arg Thr Val Thr Val Asp Met Pro Phe
Thr Ser Tyr Thr Tyr Ile Ala 85 90
95 Asp Pro Val Asn Val Glu His Val Leu Lys Thr Asn Phe
Thr Asn Tyr 100 105 110
Pro Lys Gly Ile Val Tyr Arg Ser Tyr Met Asp Val Leu Leu Gly Asp
115 120 125 Gly Ile Phe
Asn Ala Asp Gly Glu Leu Trp Arg Lys Gln Arg Lys Thr 130
135 140 Ala Ser Phe Glu Phe Ala Ser
Lys Asn Leu Arg Asp Phe Ser Ala Ile 145 150
155 160 Val Phe Arg Glu Tyr Ser Leu Lys Leu Ser Gly
Ile Leu Ser Gln Ala 165 170
175 Ser Lys Ala Gly Lys Val Val Asp Met Gln Glu Leu Tyr Met Arg
Met 180 185 190 Thr
Leu Asp Ser Ile Cys Lys Val Gly Phe Gly Val Glu Ile Gly Thr 195
200 205 Leu Ser Pro Asp Leu
Pro Glu Asn Ser Phe Ala Gln Ala Phe Asp Ala 210 215
220 Ala Asn Ile Ile Ile Thr Leu Arg Phe
Ile Asp Pro Leu Trp Arg Ile 225 230 235
240 Lys Arg Phe Phe His Val Gly Ser Glu Ala Leu Leu Ala
Gln Ser Ile 245 250 255
Lys Leu Val Asp Glu Phe Thr Tyr Ser Val Ile Arg Arg Arg Lys Ala
260 265 270 Glu Ile Val
Glu Val Arg Ala Ser Gly Lys Gln Glu Lys Met Lys His 275
280 285 Asp Ile Leu Ser Arg Phe Ile
Glu Leu Gly Glu Ala Gly Asp Asp Gly 290 295
300 Gly Gly Phe Gly Asp Asp Lys Ser Leu Arg Asp
Val Val Leu Asn Phe 305 310 315
320 Val Ile Ala Gly Arg Asp Thr Thr Ala Thr Thr Leu Ser Trp Phe
Thr 325 330 335 His
Met Ala Met Ser His Pro Asp Val Ala Glu Lys Leu Arg Arg Glu
340 345 350 Leu Cys Ala Phe Glu
Ala Glu Arg Ala Arg Glu Glu Gly Val Thr Leu 355
360 365 Val Leu Cys Gly Gly Ala Asp Ala Asp
Asp Lys Ala Phe Ala Ala Arg 370 375
380 Val Ala Gln Phe Ala Gly Leu Leu Thr Tyr Asp Ser Leu
Gly Lys Leu 385 390 395
400 Val Tyr Leu His Ala Cys Val Thr Glu Thr Leu Arg Leu Tyr Pro Ala
405 410 415 Val Pro Gln
Asp Pro Lys Gly Ile Leu Glu Asp Asp Val Leu Pro Asp 420
425 430 Gly Thr Lys Val Arg Ala Gly
Gly Met Val Thr Tyr Val Pro Tyr Ser 435 440
445 Met Gly Arg Met Glu Tyr Asn Trp Gly Pro Asp
Ala Ala Ser Phe Arg 450 455 460
Pro Glu Arg Trp Ile Asn Glu Asp Gly Ala Phe Arg Asn Ala Ser
Pro 465 470 475 480 Phe
Lys Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile Cys Leu Gly Lys
485 490 495 Asp Ser Ala Tyr Leu
Gln Met Lys Met Ala Leu Ala Ile Leu Phe Arg 500
505 510 Phe Tyr Ser Phe Arg Leu Leu Glu Gly
His Pro Val Gln Tyr Arg Met 515 520
525 Met Thr Ile Leu Ser Met Ala His Gly Leu Lys Val Arg
Val Ser Arg 530 535 540
Ala Val 545
SEQ ID NO: 3; length 494; type DNA; organism Sorghum sp.; modified_base (351): a, c, t, g, unknown or other
ggaattcggc ttatgccgtt cacttcctac acctacatcg ctgacccggt gaatgtcgag
60catgtcctca agactaactt caccaattac cccaaggggg acgtgtacag atcctacatg
120gatgtgctcc tcggtgacgg catattcaac gctgacggcg agctgtggag gaagcagagg
180aagacggcga gtttcgagtt cgcctccaag aacctgaggg atttcagtgc caatgttttc
240agagagtact ccctgaagct gtcgggcata ctgagtcagg catccaaggc aggcaaagtt
300gttgacatgc aggaacttta catgaggatg acactggact cgatctgcaa ngttgggttc
360ggggtcnana tcggcacgct gtcnccggat ctccccgaga acagcttcnc ccaagcgttc
420gatgccgcta acatcatcgt cacnctgcgg ttcatccacc cnctgtggcg catccagaag
480ttcttccccn gtca
494
SEQ ID NO: 4; length 158; type PRT; organism Sorghum sp.; MOD_RES (113): any amino acid
Met Pro Phe Thr Ser Tyr
Thr Tyr Ile Ala Asp Pro Val Asn Val Glu 1 5
10 15 His Val Leu Lys Thr Asn Phe Thr Asn Tyr Pro
Lys Gly Asp Val Tyr 20 25
30 Arg Ser Tyr Met Asp Val Leu Leu Gly Asp Gly Ile Phe Asn Ala Asp
35 40 45 Gly Glu
Leu Trp Arg Lys Gln Arg Lys Thr Ala Ser Phe Glu Phe Ala 50
55 60 Ser Lys Asn Leu Arg Asp Phe
Ser Ala Asn Val Phe Arg Glu Tyr Ser 65 70
75 80Leu Lys Leu Ser Gly Ile Leu Ser Gln Ala Ser Lys
Ala Gly Lys Val 85 90
95 Val Asp Met Gln Glu Leu Tyr Met Arg Met Thr Leu Asp Ser Ile Cys
100 105 110 Xaa Val Gly
Phe Gly Val Xaa Ile Gly Thr Leu Ser Pro Asp Leu Pro 115
120 125 Glu Asn Ser Phe Xaa Gln Ala Phe
Asp Ala Ala Asn Ile Ile Val Thr 130 135
140 Leu Arg Phe Ile His Pro Leu Trp Arg Ile Gln Lys Phe
Phe 145 150 155
SEQ ID NO: 5; length 1092; type DNA; organism Zea mays
gaattccaag cgaggccctt gtagcagaga gtgttgctga tgcagtcggc
ggaaatgagt 60gcgtgctgag agcaacgctg aggggttcca gggatggcaa tggctatggc
aatcggctag 120aggtggagga caaggtggtg aggattggga gggcaaccta tggcaagttg
gtgaagaggc 180acgcaatgag agatctattc agacttacac tggatgccgc caacaaattc
aacctttaga 240ttttgatact gtcactccta ctttattcct tggttgggca acttccaata
ggctcatgtt 300aatcaatgat tagtgattat tcagcaaata ttcttgtttg tttgacattt
ataatatgtg 360gggtgagacg gattaaatat catccatgag agctttatct tcatgctctc
ttgattttgg 420tttcagatca ttctttcagt gttcacaaga attttctcag tttggtccat
gtaatttttg 480aagtgaggtt ccttaaattt cattatgctt cctttctttt ctagactagc
aactgcatga 540cttttcactt tgggttcaca aattgactca caagaaaaca aattcacttt
tgggttcaca 600aattcctctt caggatgtac ttttcacttg aactgtcatg tataggaaca
aggaatggct 660cagtttttaa ggaacaatgt acagatttca tttcagaact ctttctggtt
ggttgagttt 720cagacttttt gtaccaagct gatggatcac aatacttgtt tccaaagtct
gataacagaa 780actggcaact cctaattgat aataaaaaga ataaaataca gtatcagata
tctcattttc 840ttggttggca gatcacaaaa aggaacacaa aggctaagcc tcctacttgt
tcgggagtta 900ggtcagggac accatatgaa tgaaagaaat cttaatttgg ggtcacacca
agattgtctc 960tctcgaggtt ggggggtccc taaggttggt agtagcaata cccaatatat
cacctaacaa 1020acccaatcca tgctacatac atacatagca tccatcactt gtagactgga
cccttcatca 1080agagcaccat gg
1092
SEQ ID NO: 6; length 267; type DNA; organism Zea mays
ccccatctca ttttcttggt tggcagatca
caaaaaggaa cacaaaggct aagcctccta 60cttgttcggg agttaggtca gggacaccat
atgaatgaaa gaaatcttaa tttggggtca 120caccaagatt gtctctctcg aggttggggg
gtccctaagg ttggtagtag caatacccaa 180tatatcacct aacaaaccca atccatgcta
catacataca tagcatccat cacttgtaga 240ctggaccctt catcaagagc accatgg
267
SEQ ID NO: 7; length 3897; type DNA; organism Zea mays
gaattccaag
cgaggccctt gtagcagaga gtgttgctga tgcagtcggc ggaaatgagt 60gcgtgctgag
agcaacgctg aggggttcca gggatggcaa tggctatggc aatcggctag 120aggtggagga
caaggtggtg aggattggga gggcaaccta tggcaagttg gtgaagaggc 180acgcaatgag
agatctattc agacttacac tggatgccgc caacaaattc aacctttaga 240ttttgatact
gtcactccta ctttattcct tggttgggca acttccaata ggctcatgtt 300aatcaatgat
tagtgattat tcagcaaata ttcttgtttg tttgacattt ataatatgtg 360gggtgagacg
gattaaatat catccatgag agctttatct tcatgctctc ttgattttgg 420tttcagatca
ttctttcagt gttcacaaga attttctcag tttggtccat gtaatttttg 480aagtgaggtt
ccttaaattt cattatgctt cctttctttt ctagactagc aactgcatga 540cttttcactt
tgggttcaca aattgactca caagaaaaca aattcacttt tgggttcaca 600aattcctctt
caggatgtac ttttcacttg aactgtcatg tataggaaca aggaatggct 660cagtttttaa
ggaacaatgt acagatttca tttcagaact ctttctggtt ggttgagttt 720cagacttttt
gtaccaagct gatggatcac aatacttgtt tccaaagtct gataacagaa 780actggcaact
cctaattgat aataaaaaga ataaaataca gtatcagata tctcattttc 840ttggttggca
gatcacaaaa aggaacacaa aggctaagcc tcctacttgt tcgggagtta 900ggtcagggac
accatatgaa tgaaagaaat cttaatttgg ggtcacacca agattgtctc 960tctcgaggtt
ggggggtccc taaggttggt agtagcaata cccaatatat cacctaacaa 1020acccaatcca
tgctacatac atacatagca tccatcactt gtagactgga cccttcatca 1080agagcaccat
ggaggaagct cacatcacgc cggcgacgcc atcgccattc ttcccactag 1140cagggcctca
caagtacatc gcgctcctcc tggttgtcct ctcatggatc ctggtccaga 1200ggtggagcct
gaggaagcag aaaggcccga gatcatggcc agtcatcggt gcaacggtgg 1260agcagctgag
gaactaccac cggatgcacg actggcttgt cgggtacctg tcacggcaca 1320ggacagtgac
cgtcgacatg ccgttcactt cctacaccta catcgctgac ccggtgaatg 1380tcgagcatgt
cctcaagact aacttcacca attaccccaa ggtaaatgac ctgaactcac 1440tgatgttcag
tcttcggaaa tcagagctga aagctgaatc gaatgtgcct gaacaccgtg 1500tagggaatcg
tgtacagatc ctacatggac gtgctcctcg gtgacggcat cttcaacgcc 1560gacggcgagc
tgtggaggaa gcagaggaag acggcgagtt tcgagttcgc ctccaagaac 1620ctgagggatt
tcagcgccat tgtgttcaga gagtactccc tgaagctgtc gggtatactg 1680agccaggcat
ccaaggcagg caaagttgtg gacatgcagg tgagatcact gctcccttgc 1740cattgccaac
atgagcattt caacctgaga cacgagagct accttgccga ttcaggaact 1800ttacatgagg
atgacgctgg actccatctg caaggttggg ttcggggtcg agatcggcac 1860gctgtcgccg
gatctccccg agaacagctt cgcgcaggcg ttcgatgccg ccaacatcat 1920cgtcacgctg
cggttcatcg acccgctgtg gcgcatcaag aggttcttcc acgtcgggtc 1980agaggccctc
ctagcgcaga gcatcaagct cgtggacgag ttcacctaca gcgtgatccg 2040ccggaggaag
gccgagatcg tcgaggcccg ggccagcggc aaacaggaga aggtacgtgc 2100acatgactgt
ttcgattctt cagttcatcg tcttggccgg gatggacctg atcctgattg 2160attatatatc
cgtgtgactt gtgaggacaa attaaaatgg gcagatgaag cacgacatcc 2220tgtcacggtt
catcgagcta ggcgaggccg gcgacgacgg cggcggcttc ggggacgaca 2280agagcctccg
ggacgtggtg ctcaacttcg tgatcgccgg gcgggacacg acggcgacga 2340cgctgtcgtg
gttcacgcac atggccatgt cccacccgga cgtggccgag aagctgcgcc 2400gcgagctgtg
cgcgttcgag gcggagcgcg cgcgcgagga gggcgtcgcg ctcgtgccct 2460gcggcggcgc
tgacgccgac gacaaggcgt tcgccgcccg cgtggcgcag ttcgcgggcc 2520tcctcaccta
cgacagcctc ggcaagctgg tctacctcca cgcctgcgtc accgagacgc 2580tccgcctgta
ccccgccgtc cctcaggtga gcgcgcccga cacgcgacct ccggtccaga 2640gcacagcatg
cagtgagtgg acctgaatgc aatgcacatg cacttgcgcg cgcgcaggac 2700cccaagggga
tcctggagga cgacgtgctg ccggacggga cgaaggtgag ggccggcggg 2760atggtgacgt
acgtgcccta ctcgatgggg cggatggagt acaactgggg ccccgacgcg 2820gcgagcttcc
ggccggagcg gtggatcaac gaggatggcg cgttccgcaa cgcgtcgccg 2880ttcaagttca
cggcgttcca ggcggggccg aggatctgcc tgggcaagga ctcggcgtac 2940ctgcagatga
agatggcgct ggccatcctc ttgcgcttct acagcttccg gctgctggag 3000gggcacccgg
tgcagtaccg catgatgacc atcctctcca tggcgcacgg cctcaaggtc 3060cgcgtctcta
gggccgtctg atgtcatggc gatttgggat atcatcccgc ttaatcctta 3120aaaatttgca
tgcatgcatg taagggaaag cgatgggttt cattggtggc ttggcttaag 3180ccttaaaaac
tccgtcgggt cttgcgaacc accacatcac tagtgttttg tactctactc 3240ctcagtggaa
gtgtagtgac agcatacaag ttcatcatat atattatcct ctttcttcgc 3300cggatgcttc
ccgggacctt ttggagacca ttactgacag gcgtgtgaaa aaaaggcttc 3360ttctgcggcg
aagttttggg ttcagagtct tggcgtcttt gcagcagaaa aaaggtttgg 3420aaggatctga
accctgaacc gaaaatggct tcggaaatat gctcgcatcg gggcggggcc 3480gtcactcggg
atgacgacaa gcccacaagc agtgagagcg aagcgatctt tggagtttgg 3540agacactctc
ggacccctcg gcgctccgcg agctcatctt cgcctcctct gtcgtgtccg 3600tggcggcacc
gcgcccgccc gcctcgtgtt cgaccaaatc ccgcgccccg accggttcgt 3660gtacaacacc
ctcatccgcg gcgccgcgcg cagtgacacg ccccgggacg ccgtatacat 3720ctataaatca
tggtattgta ctttattttc aaacggcctt aacacaacca tatttttatg 3780gtaaacacgt
tcaaaattga cacaaattta aaacaggcac aaaccgtagc taaacataag 3840agaatgagag
acaacccaaa ggttagagat gaaataagct gagtaaacga cgaattc 38978360DNAZea
mays 8caggacccca aggggatcct ggaggacgac gtgctgccgg acgggacgaa ggtgagggcc
60ggcgggatgg tgacgtacgt gccctactcg atggggcgga tggagtacaa ctggggcccc
120gacgcggcga gcttccggcc ggaggcccgg agcggtggat caacgaggat ggcgcgttcc
180gcaacgcgtc gccgttcaag ttcacggcgt tccaggcggg gccgaggatc tgcctgggca
240aggactcggc gtacctgcag atgaagatgg cgctggccat cctcttgcgc ttctacagct
300tccggctgct ggaggggcac ccggtgcagt accgcatgat gaccatcctc tccatggcgc
3609352DNAZea mays 9caggacccca aggggatcct ggaggacgac gtgctgccgg
acgggacgaa ggtgagggcc 60ggcgggatgg tgacgtacgt gccctactcg atggggcgga
tggagtacaa ctggggcccc 120gacgcggcga gcttccggcc ggagcggtgg atcaacgagg
atggcgcgtt ccgcaacgcg 180tcgccgttca agttcacggc gttccaggcg gggccgagga
tctgcctggg caaggactcg 240gcgtacctgc agatgaagat ggcgctggcc atcctcttcc
gcttctacag cttccggctg 300ctggaggggc acccggtgca gtaccgcatg atgaccatcc
tctccatggc gc 352101440DNAZea mays 10cggagctagg ggtgaaaacg
ggtagggtac ccgaaacggg taccggatac ggatactgat 60tcgggaccat ttttcggata
cggatacggg tattttttag attcgggacg gatacgggta 120atacccggat agtatggctt
cggattcggg tcggatacgg agcgagtact acccggtaaa 180tacccggata ctcgggtcgg
ataccgggta cccggaattc gggtacccgt tttttctttt 240tctgcaaaat aatatagtta
taaaatcata acttttacat atgaaatcgg atgaagataa 300agtttatatg aaaattgtag
agctcgaaga gatctataac tttgtagtac atcacatttt 360tgtttaaaca tatctttagg
ccaaaatcat taaaataatg tctaaattta tatcaaaata 420atagacttta tcattttcat
gtggggactt aagattatat ccatgtggga acttaggatt 480atctttttat aaactattta
ttaatattgg taacttattt gcaattttcg gtcgacgcta 540caatattttt atgaatttaa
ttgtattttg atgattttct acaacaagaa attaataata 600caccaaatag cctaaaaaat
tcatggattt ttacggggac acaacatata tccacatata 660gttctcaaaa acatttggac
tataaaatcc acaagatgtt ggtgtttctt ccattctact 720cccacttatt gcgtgagtta
catgtgaaat cattttatgt atcgaagttt caacataatt 780aatatttcac ttatcatttt
catgtggcga cttgaggttt tatttgaata gaatgtttat 840ttgttttggt aagctttttg
cattttggat caaactagtg tatttatgaa ttttaattat 900actttgatga ttttatgtag
aaagaaatta ataatgtata aatagcctca gaaatctatg 960aaattatacg aaggtacaac
atatggccac atatagtcat aacaaataat gggaccataa 1020aatccacagg atgtcaacgt
ttcttctatt ttatttccac ttattgcgtg agttacacgt 1080gaaatcactc taagtatcca
agtttcaaca taatcaatac ttcactttac catttttacg 1140tgggaacttg agattatctt
ctattaaatg cttattagta ttaatttact tgcaatttcg 1200tggtcgaaca agaatatttt
ttgataacca attaatgcat tatccgacaa gtatccgata 1260tccgatcaaa taatatccgt
atccgtcact tatccgctcg gataaatatc cggtccctgt 1320atccgtatcc gtcccgtttc
taactatccg tatccgatcc cgaatccgtt ttaaatacat 1380tagggtagga tacaggatga
gctaatatcc gtccgtatcc gcccgttttc acccctagcc 1440114182DNAZea mays
11aactgcatga cttttcactt tgggttcaca aattgactca caagaaaaca aattcacttt
60tgggttcaca aattcctctt caggatgtac ttttcacttg aaactgtcat gtataggaac
120aaggaatggc tcagttttta aggaacaatg tacagatttc atttcagaac tctttctggt
180tggttgagtt tcagactttt tgtaccaagc tgatggatca caatacttgt ttccaaagtc
240tgataacaga aactggcaac tcctaattga taataaaaag aataaaatac agtatcagat
300atctcatttt cttggttggc agatcacaaa aaggaacaca aaggctaagc ctcctacttg
360ttcgggagtt aggtcaggga caccatatga atgaaagaaa tcttaatttg gggtcacacc
420aagattgtct ctctcgaggt tggggggtcc ctaaggttgg tagtagcaat acccaatata
480tcacctaaca aacccaatcc atgctacata catacatagc atccatcact tgtagactgg
540acccttcatc aagagcacca tggaggaagc tcacatcacg ccggcgacgc catcgccatt
600cttcccacta gcagggcctc acaagtacat cgcgctcctc ctggttgtcc tctcatggat
660cctggtccag aggtggagcc tgaggaagca gaaaggcccg agatcatggc cagtcatcgg
720tgcaacggtg gagcagctga ggaactacca ccggatgcac gactggcttg tcgggtacct
780gtcgcggcac aggacagtga ccgtcgacat gccgttcact tcctacacct acatcgctga
840cccggtgaat gtcgagcatg tcctcaagac taacttcacc aattacccca aggtaaatga
900cctgaactca ctgatgttca gtcttcggaa atcagagctg aaagctgaat cgaatgtgcc
960tgaacaccgt gtagggaatc gtgtacagat cctacatgga cgtgctcctc ggtgacggca
1020tcttcaacgc cgacggcgag ctgtggagga agcagaggaa gacggcgagt ttcgagttcg
1080cctccaagaa cctgagggat ttcagcgcca ttgtgttcag agagtactcc ctgaagctgt
1140cgggtatact gagccaggca tccaaggcag gcaaagttgt ggacatgcag gtgagatcac
1200tgctcccttg ccattgccaa catgagcatt tcaacctgag acacgagagc taccttgccg
1260attcaggaac tttacatgag gatgacgctg gactccatct gcaaggttgg gttcggggtc
1320gagatcggca cgctgtcgcc ggatctcccc gagaacagct tcgcgcaggc gttcgatgcc
1380gccaacatca tcgtcacgct gcggttcatc gacccgctgt ggcgcatcaa gaggttcttc
1440cacgtcgggt cagaggccct cctagcgcag agcatcaagc tcgtggacga gttcacctac
1500agcgtgatcc gccggaggaa ggccgagatc gtcgaggtcc gggccagcgg caaacaggag
1560aaggtacgtg tacatgactg tttcgattct tcagttcatc gtcttggccg ggatggacct
1620gatcctgatt gattatatat ccgtgtgact tgtgaggaca aattaaaatg ggcagatgaa
1680gcacgacatc ctgtcacggt tcatcgagct aggcgaggcc ggcgacgacg gcggcggctt
1740cggggacgac aagagcctcc gggacgtggt gctcaacttc gtgatcgccg ggcgggacac
1800gacggcgacg acgctgtcgt ggttcacgca catggccatg tcccacccgg acgtggccga
1860gaagctgcgc cgcgagctgt gcgcgttcga ggcggagcgc gcgcgcgagg agggcgtcgc
1920gctcgtgccc tgcggcggcg ctgacgccga cgacaaggcg ttcgccgccc gcgtggcgca
1980gttcgcgggc ctcctcacct acgacagcct cggcaagctg gtctacctcc acgcctgcgt
2040caccgagacg ctccgcctgt accccgccgt ccctcaggtg agcgcgcccg acacgacctc
2100cggtccgcga tgcaacgcat atgtggctgt ccgcagagca cagcatgcag tgagtggacc
2160tgaatgcact atgcaatgca cttgcgcgcg cgcaggaccc caaggggatc ctggaggacg
2220acgtgctgcc ggacgggacg aaggtgaggg ccggcgggat ggtgacgtac gtgccctact
2280cgatggggcg gatggagtac aactggggcc ccgacgcggc gagcttccgg ccggagctag
2340gggtgaaaac gggtagggta cccgaaacgg gtaccggata cggatactga ttcgggacca
2400tttttcggat acggatacgg gtatttttta gattcgggac ggatacgggt aatacccgga
2460tagtatggct tcggattcgg gtcggatacg gagcgagtac tacccggtaa atacccggat
2520actcgggtcg gataccgggt acccggaatt cgggtacccg ttttttcttt ttctgcaaaa
2580taatatagtt ataaaatcat aacttttaca tatgaaatcg gatgaagata aagtttatat
2640gaaaattgta gagctcgaag agatctataa ctttgtagta catcacattt ttgtttaaac
2700atatctttag gccaaaatca ttaaaataat gtctaaattt atatcaaaat aatagacttt
2760atcattttca tgtggggact taagattata tccatgtggg aacttaggat tatcttttta
2820taaactattt attaatattg gtaacttatt tgcaattttc ggtcgacgct acaatatttt
2880tatgaattta attgtatttt gatgattttc tacaacaaga aattaataat acaccaaata
2940gcctaaaaaa ttcatggatt tttacgggga cacaacatat atccacatat agttctcaaa
3000aacatttgga ctataaaatc cacaagatgt tggtgtttct tccattctac tcccacttat
3060tgcgtgagtt acatgtgaaa tcattttatg tatcgaagtt tcaacataat taatatttca
3120cttatcattt tcatgtggcg acttgaggtt ttatttgaat agaatgttta tttgttttgg
3180taagcttttt gcattttgga tcaaactagt gtatttatga attttaatta tactttgatg
3240attttatgta gaaagaaatt aataatgtat aaatagcctc agaaatctat gaaattatac
3300gaaggtacaa catatggcca catatagtca taacaaataa tgggaccata aaatccacag
3360gatgtcaacg tttcttctat tttatttcca cttattgcgt gagttacacg tgaaatcact
3420ctaagtatcc aagtttcaac ataatcaata cttcacttta ccatttttac gtgggaactt
3480gagattatct tctattaaat gcttattagt attaatttac ttgcaatttc gtggtcgaac
3540aagaatattt tttgataacc aattaatgca ttatccgaca agtatccgat atccgatcaa
3600ataatatccg tatccgtcac ttatccgctc ggataaatat ccggtccctg tatccgtatc
3660cgtcccgttt ctaactatcc gtatccgatc ccgaatccgt tttaaataca ttagggtagg
3720atacaggatg agctaatatc cgtccgtatc cgcccgtttt cacccctagc cggagcggtg
3780gatcaacgag gatggcgcgt tccgcaacgc gtcgccgttc aagttcacgg cgttccaggc
3840ggggccgagg atctgcctgg gcaaggactc ggcgtacctg cagatgaaga tggcgctggc
3900catccttctt gcgcttctac agcttccggc tgctggaggg gcacccggtg cagtaccgca
3960tgatgaccat cctctccatg gcgcacggcc tcaaggtccg cgtctctagg gccgtctgat
4020gtcatggcga tttgggatat catcccgctt aatccacgac aaataacgtt cgtgttacaa
4080atttgcatgc atgcatgtaa gggaaagcga tgggtttcat tggtggcttg gcttaagcct
4140taaaaactcc gtcgggttct tgcgaaccac cacatcacta ga
418212505PRTArabidopsis thaliana 12Leu Val Ile Ala Cys Met Val Thr Ser
Trp Ile Phe Leu His Arg Trp 1 5 10
15 Gly Gln Arg Asn Lys Ser Gly Pro Lys Thr Trp Pro Leu Val
Gly Ala 20 25 30
Ala Ile Glu Gln Leu Thr Asn Phe Asp Arg Met His Asp Trp Leu Val
35 40 45 Glu Tyr Leu Tyr Asn
Ser Arg Thr Val Val Val Pro Met Pro Phe Thr 50 55
60 Thr Tyr Thr Tyr Ile Ala Asp Pro Ile Asn
Val Glu Tyr Val Leu Lys 65 70 75
80Thr Asn Phe Ser Asn Tyr Pro Lys Gly Glu Thr Tyr His Ser Tyr
Met 85 90 95 Glu
Val Leu Leu Gly Asp Gly Ile Phe Asn Ser Asp Gly Glu Leu Trp
100 105 110 Arg Lys Gln Arg Lys
Thr Ala Ser Phe Glu Phe Ala Ser Lys Asn Leu 115
120 125 Arg Asp Phe Ser Thr Val Val Phe Lys
Glu Tyr Ser Leu Lys Leu Phe 130 135
140 Thr Ile Leu Ser Gln Ala Ser Phe Lys Glu Gln Gln Val
Asp Met Gln 145 150 155
160Glu Leu Leu Met Arg Met Thr Leu Asp Ser Ile Cys Lys Val Gly Phe
165 170 175 Gly Val Glu Ile
Gly Thr Leu Ala Pro Glu Leu Pro Glu Asn His Phe 180
185 190 Ala Lys Ala Phe Asp Thr Ala Asn Ile
Ile Val Thr Leu Arg Phe Ile 195 200
205 Asp Pro Leu Trp Lys Met Lys Lys Phe Leu Asn Ile Gly Ser
Glu Ala 210 215 220
Leu Leu Gly Lys Ser Ile Lys Val Val Asn Asp Phe Thr Tyr Ser Val 225
230 235 240Ile Arg Arg Arg Lys
Ala Glu Leu Leu Glu Ala Gln Val Lys His Asp 245
250 255 Ile Leu Ser Arg Phe Ile Glu Ile Ser Asp
Asp Pro Asp Ser Lys Glu 260 265
270 Thr Glu Lys Ser Leu Arg Asp Ile Val Leu Asn Phe Val Ile Ala
Gly 275 280 285 Arg
Asp Thr Thr Ala Thr Thr Leu Thr Trp Ala Ile Tyr Met Ile Met 290
295 300 Met Asn Glu Asn Val Ala
Glu Lys Leu Tyr Ser Glu Leu Gln Glu Leu 305 310
315 320Glu Lys Glu Ser Ala Glu Ala Thr Asn Thr Ser
Leu His Gln Tyr Asp 325 330
335 Thr Glu Asp Phe Asn Ser Phe Asn Glu Lys Val Thr Glu Phe Ala Gly
340 345 350 Leu Leu
Asn Tyr Asp Ser Leu Gly Lys Leu His Tyr Leu His Ala Val 355
360 365 Ile Thr Glu Thr Leu Arg Leu
Tyr Pro Ala Val Pro Gln Asp Pro Lys 370 375
380 Gly Val Leu Glu Asp Asp Met Leu Pro Asn Gly Thr
Lys Val Lys Ala 385 390 395
400Gly Gly Met Val Thr Tyr Val Pro Tyr Ser Met Gly Arg Met Glu Tyr
405 410 415 Asn Trp Gly
Ser Asp Ala Ala Leu Phe Lys Pro Glu Arg Trp Leu Lys 420
425 430 Asp Gly Val Phe Gln Asn Ala Ser
Pro Phe Lys Phe Thr Ala Phe Gln 435 440
445 Ala Gly Pro Arg Ile Cys Leu Gly Lys Asp Ser Ala Tyr
Leu Gln Met 450 455 460
Lys Met Ala Met Ala Ile Leu Cys Arg Phe Tyr Lys Phe His Leu Val 465
470 475 480Pro Asn His Pro
Val Lys Tyr Arg Met Met Thr Ile Leu Ser Met Ala 485
490 495 His Gly Leu Lys Val Thr Val Ser Arg
500
50513518PRTZea mays 13Ile Ala Leu Leu Leu Val Val Leu Ser Trp Ile Leu Val
Gln Arg Trp 1 5 10 15
Ser Leu Arg Lys Gln Lys Gly Pro Arg Ser Trp Pro Val Ile Gly Ala
20 25 30 Thr Val Glu Gln Leu
Arg Asn Tyr His Arg Met His Asp Trp Leu Val 35
40 45 Gly Tyr Leu Ser Arg His Arg Thr Val Thr
Val Asp Met Pro Phe Thr 50 55 60
Ser Tyr Thr Tyr Ile Ala Asp Pro Val Asn Val Glu His Val Leu
Lys 65 70 75 80Thr
Asn Phe Thr Asn Tyr Pro Lys Gly Ile Val Tyr Arg Ser Tyr Met
85 90 95 Asp Val Leu Leu Gly Asp
Gly Ile Phe Asn Ala Asp Gly Glu Leu Trp 100
105 110 Arg Lys Gln Arg Lys Thr Ala Ser Phe Glu
Phe Ala Ser Lys Asn Leu 115 120
125 Arg Asp Phe Ser Ala Ile Val Phe Arg Glu Tyr Ser Leu Lys
Leu Ser 130 135 140
Gly Ile Leu Ser Gln Ala Ser Lys Ala Gly Lys Val Val Asp Met Gln 145
150 155 160Glu Leu Tyr Met Arg
Met Thr Leu Asp Ser Ile Cys Lys Val Gly Phe 165
170 175 Gly Val Glu Ile Gly Thr Leu Ser Pro Asp
Leu Pro Glu Asn Ser Phe 180 185
190 Ala Gln Ala Phe Asp Ala Ala Asn Ile Ile Ile Thr Leu Arg Phe
Ile 195 200 205 Asp
Pro Leu Trp Arg Ile Lys Arg Phe Phe His Val Gly Ser Glu Ala 210
215 220 Leu Leu Ala Gln Ser Ile
Lys Leu Val Asp Glu Phe Thr Tyr Ser Val 225 230
235 240Ile Arg Arg Arg Lys Ala Glu Ile Val Glu Val
Arg Ala Ser Gly Lys 245 250
255 Gln Glu Lys Met Lys His Asp Ile Leu Ser Arg Phe Ile Glu Leu Gly
260 265 270 Glu Ala
Gly Asp Asp Gly Gly Gly Phe Gly Asp Asp Lys Ser Leu Arg 275
280 285 Asp Val Val Leu Asn Phe Val
Ile Ala Gly Arg Asp Thr Thr Ala Thr 290 295
300 Thr Leu Ser Trp Phe Thr His Met Ala Met Ser His
Pro Asp Val Ala 305 310 315
320Glu Lys Leu Arg Arg Glu Leu Cys Ala Phe Glu Ala Glu Arg Ala Arg
325 330 335 Glu Glu Gly
Val Thr Leu Val Leu Cys Gly Gly Ala Asp Ala Asp Asp 340
345 350 Lys Ala Phe Ala Ala Arg Val Ala
Gln Phe Ala Gly Leu Leu Thr Tyr 355 360
365 Asp Ser Leu Gly Lys Leu Val Tyr Leu His Ala Cys Val
Thr Glu Thr 370 375 380
Leu Arg Leu Tyr Pro Ala Val Pro Gln Asp Pro Lys Gly Ile Leu Glu 385
390 395 400Asp Asp Val Leu
Pro Asp Gly Thr Lys Val Arg Ala Gly Gly Met Val 405
410 415 Thr Tyr Val Pro Tyr Ser Met Gly Arg
Met Glu Tyr Asn Trp Gly Pro 420 425
430 Asp Ala Ala Ser Phe Arg Pro Glu Arg Trp Ile Asn Glu Asp
Gly Ala 435 440 445
Phe Arg Asn Ala Ser Pro Phe Lys Phe Thr Ala Phe Gln Ala Gly Pro 450
455 460 Arg Ile Cys Leu Gly
Lys Asp Ser Ala Tyr Leu Gln Met Lys Met Ala 465 470
475 480Leu Ala Ile Leu Phe Arg Phe Tyr Ser Phe
Arg Leu Leu Glu Gly His 485 490
495 Pro Val Gln Tyr Arg Met Met Thr Ile Leu Ser Met Ala His Gly
Leu 500 505 510 Lys
Val Arg Val Ser Arg 515
14128PRTZea mays 14Gln Asp Pro Lys Gly Ile Leu Glu Asp Asp Val
Leu Pro Asp Gly Thr 1 5 10
15 Lys Val Arg Ala Gly Gly Met Val Thr Tyr Val Pro Tyr Ser Met Gly
20 25 30 Arg Met Glu
Tyr Asn Trp Gly Pro Asp Ala Ala Ser Phe Arg Pro Glu 35
40 45 Arg Trp Ile Asn Glu Asp Gly Ala
Phe Arg Asn Ala Ser Pro Phe Lys 50 55
60 Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile Cys Leu Gly
Lys Asp Ser 65 70 75
80Ala Tyr Leu Gln Met Lys Met Ala Leu Ala Ile Leu Phe Arg Phe Tyr
85 90 95 Ser Phe Arg Leu
Leu Glu Gly His Pro Val Gln Tyr Arg Met Met Thr 100
105 110 Ile Leu Ser Met Ala His Gly Leu Lys
Val Arg Val Ser Arg Ala Val 115 120
125 15128PRTZea mays 15Gln Asp Pro Lys Gly Ile Leu Glu Asp
Asp Val Leu Pro Asp Gly Thr 1 5 10
15 Lys Val Arg Ala Gly Gly Met Val Thr Tyr Val Pro Tyr Ser
Met Gly 20 25 30
Arg Met Glu Tyr Asn Trp Gly Pro Asp Ala Ala Ser Phe Arg Pro Glu
35 40 45 Arg Trp Ile Asn Glu
Asp Gly Ala Phe Arg Asn Ala Ser Pro Phe Lys 50 55
60 Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile
Cys Leu Gly Lys Asp Ser 65 70 75
80Ala Tyr Leu Gln Met Lys Met Ala Leu Ala Ile Leu Phe Arg Phe
Tyr 85 90 95 Ser
Phe Arg Leu Leu Glu Gly His Pro Val Gln Tyr Arg Met Met Thr
100 105 110 Ile Leu Ser Met Ala
His Gly Leu Lys Val Arg Val Ser Arg Ala Val 115
120 125 1687PRTZea mays 16Gln Asp Pro Lys Gly
Ile Leu Glu Asp Asp Val Leu Pro Asp Gly Thr 1 5
10 15 Lys Val Arg Ala Gly Gly Met Val Thr Tyr
Val Pro Tyr Ser Met Gly 20 25
30 Arg Met Glu Tyr Asn Trp Gly Pro Asp Ala Ala Ser Phe Arg Pro
Glu 35 40 45 Ala
Arg Ser Gly Gly Ser Thr Arg Met Ala Arg Ser Ala Thr Arg Arg 50
55 60 Arg Ser Ser Ser Arg Arg
Ser Arg Arg Gly Arg Gly Ser Ala Trp Ala 65 70
75 80Arg Thr Arg Arg Thr Cys Arg
85 171635DNAOryza sativa
17atgaagagcc ccatggagga agctcatgca atgccagtga catcattctt cccagtagca
60ggaatccaca agctcatagc tatcttcctt gttgtcctct catggatctt ggtccacaag
120tggagcctga ggaaccagaa agggccaaga tcatggccaa tcatcggcgc gacagtggag
180caactgaaga actaccacag gatgcatgac tggcttgtcg agtacttgtc gaaggacagg
240acggtgaccg tcgacatgcc tttcacctcc tacacctaca ttgccgaccc ggtgaacgtc
300gagcatgtcc tgaagaccaa cttcaccaat taccccaagg gtgaagtgta caggtcttac
360atggatgtgc tgctcggtga tggcatattc aatgccgacg gcgagatgtg gaggaagcaa
420aggaagacgg cgagcttcga gtttgcctcc aagaacttga gagacttcag cactgtggtg
480ttcagggagt actccctgaa gctatcaagc attctgagcc aagcatgcaa ggccggcaga
540gttgtagaca tgcaggaatt gttcatgagg atgacactgg actcgatctg caaggtcggg
600tttggggttg agatcgggac gctgtcacct gatctcccgg agaacagctt tgcccaggca
660ttcgacgctg ccaacatcat cgtcacgctg cggttcatcg atcctctgtg gcgtctgaag
720aagttcttgc acgtcggatc agaggctctc ctcgagcaga gcatgaagct ggttgatgac
780ttcacctaca gcgtgatccg ccgccgcaag gctgagatct tgcaggctcg agccagcggc
840aagcaagaga agatcaagca cgacatactg tcgcggttca tcgagctcgg ggaggccggc
900ggcgacgagg ggggcggcag cttcggggac gacaagagcc tccgcgacgt ggtgctcaac
960ttcgtgatcg ccgggcgtga cacgacggcg acgacgctgt cgtggttcac gtacatggcg
1020atgacgcacc cggccgtcgc cgacaagctc cggcgcgagc tggccgcgtt cgaggatgag
1080cgcgcgcgcg aggagggcgt cgcgctcgcc gacgccgccg gcgaggcgtc gttcgcggcg
1140cgcgtggcgc agttcgcgtc gctgctgagc tacgacgcgg tggggaagct ggtgtacctg
1200cacgcgtgcg tgacggagac gctccgcctc tacccggcgg tgccgcagga ccccaagggg
1260atcgtggagg acgacgtgct ccccgacggc accaaggtgc gcgccggcgg gatggtgacg
1320tacgtgccct actccatggg gaggatggag tacaactggg gccccgacgc ggcgagcttc
1380cggccggagc ggtggctcag cggcgacggc ggcgcgttcc ggaacgcgtc gccgttcaag
1440ttcaccgcgt tccaggccgg gccgcggatc tgcctcggca aggactccgc ctacctccag
1500atgaagatgg cgctcgccat cctcttccgc ttctacacct tcgacctcgt cgaggaccac
1560cccgtcaagt accggatgat gaccatcctc tccatggctc acggcctcaa ggtccgcgtc
1620tccacctccg tctga
163518544PRTOryza sativa 18Met Lys Ser Pro Met Glu Glu Ala His Ala Met
Pro Val Thr Ser Phe 1 5 10
15 Phe Pro Val Ala Gly Ile His Lys Leu Ile Ala Ile Phe Leu Val Val
20 25 30 Leu Ser Trp
Ile Leu Val His Lys Trp Ser Leu Arg Asn Gln Lys Gly 35
40 45 Pro Arg Ser Trp Pro Ile Ile Gly
Ala Thr Val Glu Gln Leu Lys Asn 50 55
60 Tyr His Arg Met His Asp Trp Leu Val Glu Tyr Leu Ser
Lys Asp Arg 65 70 75
80Thr Val Thr Val Asp Met Pro Phe Thr Ser Tyr Thr Tyr Ile Ala Asp
85 90 95 Pro Val Asn Val
Glu His Val Leu Lys Thr Asn Phe Thr Asn Tyr Pro 100
105 110 Lys Gly Glu Val Tyr Arg Ser Tyr Met
Asp Val Leu Leu Gly Asp Gly 115 120
125 Ile Phe Asn Ala Asp Gly Glu Met Trp Arg Lys Gln Arg Lys
Thr Ala 130 135 140
Ser Phe Glu Phe Ala Ser Lys Asn Leu Arg Asp Phe Ser Thr Val Val 145
150 155 160Phe Arg Glu Tyr Ser
Leu Lys Leu Ser Ser Ile Leu Ser Gln Ala Cys 165
170 175 Lys Ala Gly Arg Val Val Asp Met Gln Glu
Leu Phe Met Arg Met Thr 180 185
190 Leu Asp Ser Ile Cys Lys Val Gly Phe Gly Val Glu Ile Gly Thr
Leu 195 200 205 Ser
Pro Asp Leu Pro Glu Asn Ser Phe Ala Gln Ala Phe Asp Ala Ala 210
215 220 Asn Ile Ile Val Thr Leu
Arg Phe Ile Asp Pro Leu Trp Arg Leu Lys 225 230
235 240Lys Phe Leu His Val Gly Ser Glu Ala Leu Leu
Glu Gln Ser Met Lys 245 250
255 Leu Val Asp Asp Phe Thr Tyr Ser Val Ile Arg Arg Arg Lys Ala Glu
260 265 270 Ile Leu
Gln Ala Arg Ala Ser Gly Lys Gln Glu Lys Ile Lys His Asp 275
280 285 Ile Leu Ser Arg Phe Ile Glu
Leu Gly Glu Ala Gly Gly Asp Glu Gly 290 295
300 Gly Gly Ser Phe Gly Asp Asp Lys Ser Leu Arg Asp
Val Val Leu Asn 305 310 315
320Phe Val Ile Ala Gly Arg Asp Thr Thr Ala Thr Thr Leu Ser Trp Phe
325 330 335 Thr Tyr Met
Ala Met Thr His Pro Ala Val Ala Asp Lys Leu Arg Arg 340
345 350 Glu Leu Ala Ala Phe Glu Asp Glu
Arg Ala Arg Glu Glu Gly Val Ala 355 360
365 Leu Ala Asp Ala Ala Gly Glu Ala Ser Phe Ala Ala Arg
Val Ala Gln 370 375 380
Phe Ala Ser Leu Leu Ser Tyr Asp Ala Val Gly Lys Leu Val Tyr Leu 385
390 395 400His Ala Cys Val
Thr Glu Thr Leu Arg Leu Tyr Pro Ala Val Pro Gln 405
410 415 Asp Pro Lys Gly Ile Val Glu Asp Asp
Val Leu Pro Asp Gly Thr Lys 420 425
430 Val Arg Ala Gly Gly Met Val Thr Tyr Val Pro Tyr Ser Met
Gly Arg 435 440 445
Met Glu Tyr Asn Trp Gly Pro Asp Ala Ala Ser Phe Arg Pro Glu Arg 450
455 460 Trp Leu Ser Gly Asp
Gly Gly Ala Phe Arg Asn Ala Ser Pro Phe Lys 465 470
475 480Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile
Cys Leu Gly Lys Asp Ser 485 490
495 Ala Tyr Leu Gln Met Lys Met Ala Leu Ala Ile Leu Phe Arg Phe
Tyr 500 505 510 Thr
Phe Asp Leu Val Glu Asp His Pro Val Lys Tyr Arg Met Met Thr 515
520 525 Ile Leu Ser Met Ala His
Gly Leu Lys Val Arg Val Ser Thr Ser Val 530 535
540 19436DNASorghum sp. 19aacgaatgta tcattgtgcc
taaattttta aagaattgtg gacaatttct ggtaggctga 60gtttcagact ttcagtacca
agctgatgga tcacattctg gatccgaagt atgataacat 120aatctggcaa ctcctaattg
taataacaat gaataacctg caaatacagt ataagagtgg 180ctcattttct tggttggcag
atcacaaaaa ggaacacaaa ggctaagcgc caacttgtcc 240gggagttagg tcatggatac
catatgaatg aaagaaatct taatttccgg tcacaccaag 300attgtctctc tcaaggttgg
taacagcaat acccaatata tcacctaaca aacccagaca 360acactacata cataacatcc
atcacttgga gactggaccc ttcatcaaga gcaccatgga 420ggaagctcac ctcatg
43620450DNAOryza sativa
20aagcctggtt tcagttggtg acaatttaac agaattcaga tggatatggt tctgatatta
60gaaggtggca tacctttagt cgctgcaaac gcttcagtta tctgaacaaa acaacgaact
120tggctgagca ggggaaaaaa atactgtagc attcattttg tgtttacatg agtaacgatt
180cttttctagg tggacagatc acaaaaagaa aactaaagct aagatccaac tcctaagggt
240gttaggttag ggacaccata tgaatgagac aatcttaatt cttggtcaca caaagattgt
300ctcaaggttg gtagcatcag tgcccaatat atcacctaac tatgccatcc aaaatgctac
360atagcatctc ttgtagactg aacccttcat gaagagcccc atggaggaag ctcatgcaat
420gccagtgaca tcattcttcc cagtagcagg
45021538PRTZea mays 21Met Glu Glu Ala His Leu Thr Pro Ala Thr Pro Ser Pro
Phe Phe Pro 1 5 10 15
Leu Ala Gly Pro His Lys Tyr Ile Ala Leu Leu Leu Val Val Leu Ser
20 25 30 Trp Ile Leu Val Gln
Arg Trp Ser Leu Arg Lys Gln Lys Gly Pro Arg 35
40 45 Ser Trp Pro Val Ile Gly Ala Thr Val Glu
Gln Leu Arg Asn Tyr His 50 55 60
Arg Met His Asp Trp Leu Val Gly Tyr Leu Ser Arg His Arg Thr
Val 65 70 75 80Thr
Val Asp Met Pro Phe Thr Ser Tyr Thr Tyr Ile Ala Asp Pro Val
85 90 95 Asn Val Glu His Val Leu
Lys Thr Asn Phe Thr Asn Tyr Pro Lys Gly 100
105 110 Ile Val Tyr Arg Ser Tyr Met Asp Val Leu
Leu Gly Asp Gly Ile Phe 115 120
125 Asn Ala Asp Gly Glu Leu Trp Arg Lys Gln Arg Lys Thr Ala
Ser Phe 130 135 140
Glu Phe Ala Ser Lys Asn Leu Arg Asp Phe Ser Ala Ile Val Phe Arg 145
150 155 160Glu Tyr Ser Leu Lys
Leu Ser Gly Ile Leu Ser Gln Ala Ser Lys Ala 165
170 175 Gly Lys Val Val Asp Met Gln Glu Leu Tyr
Met Arg Met Thr Leu Asp 180 185
190 Ser Ile Cys Lys Val Gly Phe Gly Val Glu Ile Gly Thr Leu Ser
Pro 195 200 205 Asp
Leu Pro Glu Asn Ser Phe Ala Gln Ala Phe Asp Ala Ala Asn Ile 210
215 220 Ile Ile Thr Leu Arg Phe
Ile Asp Pro Leu Trp Arg Ile Lys Arg Phe 225 230
235 240Phe His Val Gly Ser Glu Ala Leu Leu Ala Gln
Ser Ile Lys Leu Val 245 250
255 Asp Glu Phe Thr Tyr Ser Val Ile Arg Arg Arg Lys Ala Glu Ile Val
260 265 270 Glu Val
Arg Ala Ser Gly Lys Gln Glu Lys Met Lys His Asp Ile Leu 275
280 285 Ser Arg Phe Ile Glu Leu Gly
Glu Ala Gly Phe Gly Asp Asp Lys Ser 290 295
300 Leu Arg Asp Val Val Leu Asn Phe Val Ile Ala Gly
Arg Asp Thr Thr 305 310 315
320Ala Thr Thr Leu Ser Trp Phe Thr His Met Ala Met Ser His Pro Asp
325 330 335 Val Ala Glu
Lys Leu Arg Arg Glu Leu Cys Ala Phe Glu Ala Glu Arg 340
345 350 Ala Arg Glu Glu Gly Val Thr Leu
Val Leu Cys Gly Gly Ala Asp Ala 355 360
365 Asp Asp Lys Ala Phe Ala Ala Arg Val Ala Gln Phe Ala
Gly Leu Leu 370 375 380
Thr Tyr Asp Ser Leu Gly Lys Leu Val Tyr Leu His Ala Cys Val Thr 385
390 395 400Glu Thr Leu Arg
Leu Tyr Pro Ala Val Pro Gln Asp Pro Lys Gly Ile 405
410 415 Leu Glu Asp Asp Val Leu Pro Asp Gly
Thr Lys Val Arg Ala Gly Gly 420 425
430 Met Val Thr Tyr Val Pro Tyr Ser Met Gly Arg Met Glu Tyr
Asn Trp 435 440 445
Gly Pro Asp Ala Ala Ser Phe Arg Pro Glu Arg Trp Ile Asn Glu Asp 450
455 460 Gly Ala Phe Arg Asn
Ala Ser Pro Phe Lys Phe Thr Ala Phe Gln Ala 465 470
475 480Gly Pro Arg Ile Cys Leu Gly Lys Asp Ser
Ala Tyr Leu Gln Met Lys 485 490
495 Met Ala Leu Ala Ile Leu Phe Arg Phe Tyr Ser Phe Arg Leu Leu
Glu 500 505 510 Gly
His Pro Val Gln Tyr Arg Met Met Thr Ile Leu Ser Met Ala His 515
520 525 Gly Leu Lys Val Arg Val
Ser Arg Ala Val 530 535
22532PRTSorghum sp. 22Met Pro Ala Thr Pro Leu Phe Pro Leu Ala Gly
Leu His Lys Tyr Ile 1 5 10
15 Ala Ile Leu Leu Val Val Leu Ser Trp Ala Leu Val His Arg Trp Ser
20 25 30 Leu Arg Lys
Gln Lys Gly Pro Arg Ser Trp Pro Val Ile Gly Ala Thr 35
40 45 Leu Glu Gln Leu Arg Asn Tyr His
Arg Met His Asp Trp Leu Val Gly 50 55
60 Tyr Leu Ser Arg His Lys Thr Val Thr Val Asp Met Pro
Phe Thr Ser 65 70 75
80Tyr Thr Tyr Ile Ala Asp Pro Val Asn Val Glu His Val Leu Lys Thr
85 90 95 Asn Phe Thr Asn
Tyr Pro Lys Gly Asp Val Tyr Arg Ser Tyr Met Asp 100
105 110 Val Leu Leu Gly Asp Gly Ile Phe Asn
Ala Asp Gly Glu Leu Trp Arg 115 120
125 Lys Gln Arg Lys Thr Ala Ser Phe Glu Phe Ala Ser Lys Asn
Leu Arg 130 135 140
Asp Phe Ser Ala Asn Val Phe Arg Glu Tyr Ser Leu Lys Leu Ser Gly 145
150 155 160Ile Leu Ser Gln Ala
Ser Lys Ala Gly Lys Val Val Asp Met Gln Glu 165
170 175 Leu Tyr Met Arg Met Thr Leu Asp Ser Ile
Cys Lys Val Gly Phe Gly 180 185
190 Val Glu Ile Gly Thr Leu Ser Pro Asp Leu Pro Glu Asn Ser Phe
Ala 195 200 205 Gln
Ala Phe Asp Ala Ala Asn Ile Ile Val Thr Leu Arg Phe Ile Asp 210
215 220 Pro Leu Trp Arg Val Lys
Arg Phe Phe His Val Gly Ser Glu Ala Leu 225 230
235 240Leu Ala Gln Ser Ile Lys Leu Val Asp Glu Phe
Thr Tyr Ser Val Ile 245 250
255 Arg Arg Arg Lys Ala Glu Ile Val Glu Ala Arg Ala Ser Gly Lys Gln
260 265 270 Glu Lys
Met Lys His Asp Ile Leu Ser Arg Phe Ile Glu Leu Gly Glu 275
280 285 Ala Gly Asp Asp Gly Gly Phe
Gly Asp Asp Lys Ser Leu Arg Asp Val 290 295
300 Val Leu Asn Phe Val Ile Ala Gly Arg Asp Thr Thr
Ala Thr Thr Leu 305 310 315
320Ser Trp Phe Thr His Met Ala Met Ser His Pro Asp Val Ala Glu Lys
325 330 335 Leu Arg Arg
Glu Leu Cys Ala Phe Glu Ala Glu Arg Ala Arg Glu Glu 340
345 350 Gly Val Ala Val Pro Cys Cys Gly
Pro Asp Asp Asp Lys Ala Phe Ala 355 360
365 Ala Arg Val Ala Gln Phe Ala Gly Leu Leu Thr Tyr Asp
Ser Leu Gly 370 375 380
Lys Leu Val Tyr Leu His Ala Cys Val Thr Glu Thr Leu Arg Leu Tyr 385
390 395 400Pro Ala Val Pro
Gln Asp Pro Lys Gly Ile Leu Glu Asp Asp Val Leu 405
410 415 Pro Asp Gly Thr Lys Val Arg Ala Gly
Gly Met Val Thr Tyr Val Pro 420 425
430 Tyr Ser Met Gly Arg Met Glu Tyr Asn Trp Gly Pro Asp Ala
Ala Ser 435 440 445
Phe Arg Pro Glu Arg Trp Ile Asn Glu Glu Gly Ala Phe Arg Asn Ala 450
455 460 Ser Pro Phe Lys Phe
Thr Ala Phe Gln Ala Gly Pro Arg Ile Cys Leu 465 470
475 480Gly Lys Asp Ser Ala Tyr Leu Gln Met Lys
Met Ala Leu Ala Ile Leu 485 490
495 Phe Arg Phe Tyr Ser Phe Gln Leu Leu Glu Gly His Pro Val Gln
Tyr 500 505 510 Arg
Met Met Thr Ile Leu Ser Met Ala His Gly Leu Lys Val Arg Val 515
520 525 Ser Arg Ala Val
530 2315DNAArtificial
Sequence Description of Artificial Sequence: Synthetic terminal
inverted repeat sequence 23taggggtgaa aacgg
152410PRTZea mays MOD_RES (2)..(3) Any amino acid
24Phe Xaa Xaa Gly Xaa Arg Xaa Cys Xaa Gly 1
5 102510PRTZea mays 25Phe Gln Ala Gly Pro Arg
Ile Cys Leu Gly 1 5
10266PRTZea mays 26Ala Gly Arg Asp Thr Thr
1 5 2713PRTZea mays 27Leu Val Tyr Leu His
Ala Cys Val Thr Glu Thr Leu Arg 1 5
10 2842PRTZea mays 28Cys His Gly Asp Leu Asp Met Asp
Ile Val Pro Leu Asn Pro Arg Gln 1 5 10
15 Ile Thr Leu Val Leu Gln Ile Cys Met His Ala Cys Lys
Gly Lys Arg 20 25 30
Trp Val Ser Leu Val Ala Trp Leu Lys Pro
35 40 2928PRTZea mays 29Lys Leu Arg Arg Val Leu
Arg Thr Thr Thr Ser Leu Val Phe Cys Thr 1 5
10 15 Leu Leu Leu Ser Gly Ser Val Val Thr Ala Tyr
Lys 20 25
3042PRTZea mays 30Cys His Gly Asp Leu Asp Met Asp Ile Val Pro Leu Asn Pro
Arg Gln 1 5 10 15
Ile Thr Leu Val Leu Gln Ile Cys Met His Ala Cys Lys Gly Lys Arg
20 25 30 Trp Val Ser Leu Val
Ala Trp Leu Lys Pro 35
40 3114PRTZea mays 31Lys Leu Arg Arg Val Leu Arg Thr Thr Thr Ser
Leu Val Phe 1 5 10
3224PRTZea mays 32Arg Trp Arg Trp Pro Ser Ser Cys Ala Ser Thr Ala Ser
Gly Cys Trp 1 5 10 15
Arg Gly Thr Arg Cys Ser Thr Ala
20 3311PRTZea mays 33Pro Ser Ser Pro Trp Arg Thr Lys
Gly Glu Phe 1 5 10
3442PRTZea mays 34Cys His Gly Asp Leu Asp Met Asp Ile Val Pro Leu Asn
Pro Arg Gln 1 5 10 15
Ile Thr Leu Val Leu Gln Ile Cys Met His Ala Cys Lys Gly Lys Arg
20 25 30 Trp Val Ser Leu Val
Ala Trp Leu Lys Pro 35
40 35548PRTArtificial Sequence Description of Artificial Sequence:
Synthetic consensus sequence 35Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
Xaa Met Pro Xaa Thr Pro Phe 1 5 10
15 Phe Pro Leu Ala Gly Ile His Lys Tyr Ile Ala Ile Leu Leu
Val Val 20 25 30
Leu Ser Trp Ile Leu Val His Arg Trp Ser Leu Arg Lys Gln Lys Gly
35 40 45 Pro Arg Ser Trp Pro
Val Ile Gly Ala Thr Val Glu Gln Leu Arg Asn 50 55
60 Tyr His Arg Met His Asp Trp Leu Val Gly
Tyr Leu Ser Arg His Arg 65 70 75
80Thr Val Thr Val Asp Met Pro Phe Thr Ser Tyr Thr Tyr Ile Ala
Asp 85 90 95 Pro
Val Asn Val Glu His Val Leu Lys Thr Asn Phe Thr Asn Tyr Pro
100 105 110 Lys Gly Asp Val Tyr
Arg Ser Tyr Met Asp Val Leu Leu Gly Asp Gly 115
120 125 Ile Phe Asn Ala Asp Gly Glu Leu Trp
Arg Lys Gln Arg Lys Thr Ala 130 135
140 Ser Phe Glu Phe Ala Ser Lys Asn Leu Arg Asp Phe Ser
Ala Ile Val 145 150 155
160Phe Arg Glu Tyr Ser Leu Lys Leu Ser Gly Ile Leu Ser Gln Ala Ser
165 170 175 Lys Ala Gly Lys
Val Val Asp Met Gln Glu Leu Tyr Met Arg Met Thr 180
185 190 Leu Asp Ser Ile Cys Lys Val Gly Phe
Gly Val Glu Ile Gly Thr Leu 195 200
205 Ser Pro Asp Leu Pro Glu Asn Ser Phe Ala Gln Ala Phe Asp
Ala Ala 210 215 220
Asn Ile Ile Val Thr Leu Arg Phe Ile Asp Pro Leu Trp Arg Ile Lys 225
230 235 240Arg Phe Phe His Val
Gly Ser Glu Ala Leu Leu Ala Gln Ser Ile Lys 245
250 255 Leu Val Asp Glu Phe Thr Tyr Ser Val Ile
Arg Arg Arg Lys Ala Glu 260 265
270 Ile Val Glu Ala Arg Ala Ser Gly Lys Gln Glu Lys Met Lys His
Asp 275 280 285 Ile
Leu Ser Arg Phe Ile Glu Leu Gly Glu Ala Gly Asp Asp Gly Gly 290
295 300 Gly Xaa Xaa Phe Gly Asp
Asp Lys Ser Leu Arg Asp Val Val Leu Asn 305 310
315 320Phe Val Ile Ala Gly Arg Asp Thr Thr Ala Thr
Thr Leu Ser Trp Phe 325 330
335 Thr His Met Ala Met Ser His Pro Asp Val Ala Glu Lys Leu Arg Arg
340 345 350 Glu Leu
Cys Ala Phe Glu Ala Glu Arg Ala Arg Glu Glu Gly Val Ala 355
360 365 Leu Xaa Xaa Cys Gly Xaa Xaa
Xaa Xaa Asp Asp Lys Ala Phe Ala Ala 370 375
380 Arg Val Ala Gln Phe Ala Gly Leu Leu Thr Tyr Asp
Ser Leu Gly Lys 385 390 395
400Leu Val Tyr Leu His Ala Cys Val Thr Glu Thr Leu Arg Leu Tyr Pro
405 410 415 Ala Val Pro
Gln Asp Pro Lys Gly Ile Leu Glu Asp Asp Val Leu Pro 420
425 430 Asp Gly Thr Lys Val Arg Ala Gly
Gly Met Val Thr Tyr Val Pro Tyr 435 440
445 Ser Met Gly Arg Met Glu Tyr Asn Trp Gly Pro Asp Ala
Ala Ser Phe 450 455 460
Arg Pro Glu Arg Trp Ile Asn Glu Asp Gly Xaa Ala Phe Arg Asn Ala 465
470 475 480Ser Pro Phe Lys
Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile Cys Leu 485
490 495 Gly Lys Asp Ser Ala Tyr Leu Gln Met
Lys Met Ala Leu Ala Ile Leu 500 505
510 Phe Arg Phe Tyr Ser Phe Xaa Leu Leu Glu Gly His Pro Val
Gln Tyr 515 520 525
Arg Met Met Thr Ile Leu Ser Met Ala His Gly Leu Lys Val Arg Val 530
535 540 Ser Arg Ala Val
545
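Some relationships between entries in the listing can be checked by machine. For example, the rice coding sequence (SEQ ID NO: 17) should translate into the rice protein (SEQ ID NO: 18), and the Zea mays pattern in SEQ ID NO: 24 (Phe Xaa Xaa Gly Xaa Arg Xaa Cys Xaa Gly) should match the concrete peptide in SEQ ID NO: 25. The Python sketch below does both checks on short excerpts copied from the listing. It rests on a few assumptions that the listing does not state. First, translation uses the standard genetic code (NCBI translation table 1). Second, every Xaa in SEQ ID NO: 24 is treated as any residue, even though the extracted MOD_RES annotation names only positions (2)..(3). Third, the helper names (`translate`, `three_to_one`, `motif_regex`) are hypothetical and are not taken from the application.

```python
import re

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Three-letter residue codes as used in the listing; Xaa means "any residue".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def translate(dna: str) -> str:
    """Translate a DNA string (listing-style, spaces allowed) in frame 1."""
    seq = "".join(dna.lower().split())
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def three_to_one(protein: str) -> str:
    """Convert space-separated three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[tok] for tok in protein.split())


def motif_regex(pattern: str) -> str:
    """Turn a three-letter pattern with Xaa wildcards into a regex."""
    return "".join("." if r == "X" else r for r in three_to_one(pattern))


# Excerpts copied from the listing above.
SEQ17_START = ("atgaagagcc ccatggagga agctcatgca "
               "atgccagtga catcattctt cccagtagca")
SEQ18_START = ("Met Lys Ser Pro Met Glu Glu Ala His Ala "
               "Met Pro Val Thr Ser Phe Phe Pro Val Ala")
SEQ18_EXCERPT = "Phe Thr Ala Phe Gln Ala Gly Pro Arg Ile Cys Leu Gly Lys Asp Ser"
SEQ24 = "Phe Xaa Xaa Gly Xaa Arg Xaa Cys Xaa Gly"
SEQ25 = "Phe Gln Ala Gly Pro Arg Ile Cys Leu Gly"

if __name__ == "__main__":
    print(translate(SEQ17_START))          # MKSPMEEAHAMPVTSFFPVA
    print(three_to_one(SEQ18_START))       # MKSPMEEAHAMPVTSFFPVA
    pattern = motif_regex(SEQ24)           # F..G.R.C.G
    print(re.fullmatch(pattern, three_to_one(SEQ25)) is not None)  # True
    print(re.search(pattern, three_to_one(SEQ18_EXCERPT)).group())  # FQAGPRICLG
```

Both excerpts give MKSPMEEAHAMPVTSFFPVA, so the first 60 nucleotides of SEQ ID NO: 17 encode the first 20 residues of SEQ ID NO: 18. The same check scales to full entries once the position counters and fused header fields are removed from the extracted text.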