Patent application title: Gene Knockout Mesophilic and Thermophilic Organisms, and Methods of Use Thereof
Inventors:
David Anthony Hogsett (Grantham, NH, US)
Vineet Badriphrajad Rajgarhia (Lebanon, NH, US)
Assignees:
Mascoma Corporation
IPC8 Class: AC12P710FI
USPC Class:
435165
Class name: Ethanol produced as by-product, or from waste, or from cellulosic material substrate; substrate contains cellulosic material
Publication date: 2015-01-22
Patent application number: 20150024450
Abstract:
One aspect of the invention relates to a genetically modified
thermophilic or mesophilic microorganism, wherein a first native gene is
partially, substantially, or completely deleted, silenced, inactivated,
or down-regulated, which first native gene encodes a first native enzyme
involved in the metabolic production of an organic acid or a salt
thereof, thereby increasing the native ability of said thermophilic or
mesophilic microorganism to produce ethanol as a fermentation product. In
certain embodiments, the aforementioned microorganism further comprises a
first non-native gene, which first non-native gene encodes a first
non-native enzyme involved in the metabolic production of ethanol.
Another aspect of the invention relates to a process for converting
lignocellulosic biomass to ethanol, comprising contacting lignocellulosic
biomass with a genetically modified thermophilic or mesophilic
microorganism.
Claims:
1. An isolated nucleic acid molecule comprising the nucleotide sequence
of SEQ ID NO:2, or a full complement thereof.
2. An isolated nucleic acid molecule comprising a nucleotide sequence which shares at least 98% identity to a nucleotide sequence of SEQ ID NO:2, or a full complement thereof; wherein the nucleotide sequence, when transformed in a whole cell, aids in the process of converting biomass to ethanol.
3. A genetic construct comprising SEQ ID NO:2 operably linked to a promoter expressible in a thermophilic or mesophilic bacterium.
4. A recombinant thermophilic or mesophilic bacterium comprising the genetic construct of claim 3.
5. A vector comprising the nucleic acid molecule of claim 1.
6. A host cell comprising the nucleic acid molecule of claim 1.
7. A vector comprising the nucleic acid molecule of claim 2.
8. A host cell comprising the nucleic acid molecule of claim 2.
9. A genetically modified thermophilic or mesophilic microorganism, wherein the genetically modified microorganism has been transformed by a nucleotide sequence of SEQ ID NO:2; thereby partially, substantially, or completely deleting, silencing, inactivating, or down-regulating a gene that encodes acetate kinase, and increasing the native ability of said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product.
10. The genetically modified microorganism according to claim 9, wherein said microorganism is a species of the genera Thermoanaerobacterium, Thermoanaerobacter, Clostridium, Geobacillus, Saccharococcus, Paenibacillus, Bacillus, Caldicellulosiruptor, Anaerocellum, or Anoxybacillus.
11. The genetically modified microorganism according to claim 9, wherein said microorganism is a bacterium selected from the group consisting of: Thermoanaerobacterium thermosulfurigenes, Thermoanaerobacterium aotearoense, Thermoanaerobacterium polysaccharolyticum, Thermoanaerobacterium zeae, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium saccharolyticum, Thermoanaerobium brockii, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacter thermohydrosulfuricus, Thermoanaerobacter ethanolicus, Thermoanaerobacter brockii, Clostridium thermocellum, Clostridium cellulolyticum, Clostridium phytofermentans, Clostridium straminisolvens, Geobacillus thermoglucosidasius, Geobacillus stearothermophilus, Saccharococcus caldoxylosilyticus, Saccharococcus thermophilus, Paenibacillus campinasensis, Bacillus flavothermus, Anoxybacillus kamchatkensis, Anoxybacillus gonensis, Caldicellulosiruptor acetigenus, Caldicellulosiruptor saccharolyticus, Caldicellulosiruptor kristjanssonii, Caldicellulosiruptor owensensis, Caldicellulosiruptor lactoaceticus, and Anaerocellum thermophilum.
12. The genetically modified microorganism according to claim 9, wherein said microorganism is selected from the group consisting of: (a) a thermophilic or mesophilic microorganism with a native ability to metabolize a hexose sugar; (b) a thermophilic or mesophilic microorganism with a native ability to metabolize a pentose sugar; (c) a thermophilic or mesophilic microorganism with a native ability to metabolize a hexose sugar and a pentose sugar; (d) a thermophilic or mesophilic microorganism with a native ability to hydrolyze cellulose; (e) a thermophilic or mesophilic microorganism with a native ability to hydrolyze xylan; and (f) a thermophilic or mesophilic microorganism with a native ability to hydrolyze cellulose and xylan.
13. The genetically modified microorganism according to claim 9, wherein said microorganism has a native ability to metabolize a hexose sugar; and a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a pentose sugar, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product from a pentose sugar.
14. The genetically modified microorganism according to claim 9, wherein said microorganism has a native ability to metabolize a pentose sugar; and a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a hexose sugar, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product from a hexose sugar.
15. The genetically modified microorganism according to claim 9, wherein a second native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which second native gene encodes a second native enzyme involved in the metabolic production of an organic acid or a salt thereof.
16. The genetically modified microorganism according to claim 15, wherein said second native enzyme is lactate dehydrogenase or phosphotransacetylase.
17. The genetically modified microorganism according to claim 9, wherein said microorganism has a native ability to hydrolyze cellulose; and a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze xylan.
18. The genetically modified microorganism according to claim 9, wherein said microorganism has a native ability to hydrolyze xylan; and a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze cellulose.
19. The genetically modified microorganism according to claim 9, wherein said microorganism is mesophilic.
20. The genetically modified microorganism according to claim 9, wherein said microorganism is thermophilic.
21. A process for converting lignocellulosic biomass to ethanol, comprising contacting lignocellulosic biomass with a genetically modified thermophilic or mesophilic microorganism according to claim 9.
Description:
RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application Ser. No. 12/599,458, filed Jun. 30, 2010, now U.S. Pat. No. 8,435,770, issued May 7, 2013, which is the National Stage of Patent Cooperation Treaty Application serial number PCT/US2008/063237, filed May 9, 2008, which claims the benefit of U.S. Provisional Patent Application Ser. No. 60/916,978, filed May 9, 2007; the entire contents of all of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] Energy conversion, utilization and access underlie many of the great challenges of our time, including those associated with sustainability, environmental quality, security, and poverty. New applications of emerging technologies are required to respond to these challenges. Biotechnology, one of the most powerful of the emerging technologies, can give rise to important new energy conversion processes. Plant biomass and derivatives thereof are a resource for the biological conversion of energy to forms useful to humanity.
[0003] Among forms of plant biomass, lignocellulosic biomass ("biomass") is particularly well-suited for energy applications because of its large-scale availability, low cost, and environmentally benign production. In particular, many energy production and utilization cycles based on cellulosic biomass have near-zero greenhouse gas emissions on a life-cycle basis. The primary obstacle impeding the more widespread production of energy from biomass feedstocks is the general absence of low-cost technology for overcoming the recalcitrance of these materials to conversion into useful fuels. Lignocellulosic biomass contains carbohydrate fractions (e.g., cellulose and hemicellulose) that can be converted into ethanol. In order to convert these fractions, the cellulose and hemicellulose must ultimately be converted or hydrolyzed into monosaccharides; it is the hydrolysis that has historically proven to be problematic.
[0004] Biologically mediated processes are promising for energy conversion, in particular for the conversion of lignocellulosic biomass into fuels. Biomass processing schemes involving enzymatic or microbial hydrolysis commonly involve four biologically mediated transformations: (1) the production of saccharolytic enzymes (cellulases and hemicellulases); (2) the hydrolysis of carbohydrate components present in pretreated biomass to sugars; (3) the fermentation of hexose sugars (e.g., glucose, mannose, and galactose); and (4) the fermentation of pentose sugars (e.g., xylose and arabinose). These four transformations occur in a single step in a process configuration called consolidated bioprocessing (CBP), which is distinguished from other less highly integrated configurations in that it does not involve a dedicated process step for cellulase and/or hemicellulase production.
[0005] CBP offers the potential for lower cost and higher efficiency than processes featuring dedicated cellulase production. The benefits result in part from avoided capital costs, substrate and other raw materials, and utilities associated with cellulase production. In addition, several factors support the realization of higher rates of hydrolysis, and hence reduced reactor volume and capital investment using CBP, including enzyme-microbe synergy and the use of thermophilic organisms and/or complexed cellulase systems. Moreover, cellulose-adherent cellulolytic microorganisms are likely to compete successfully for products of cellulose hydrolysis with non-adhered microbes, e.g., contaminants, which could increase the stability of industrial processes based on microbial cellulose utilization. Progress in developing CBP-enabling microorganisms is being made through two strategies: engineering naturally occurring cellulolytic microorganisms to improve product-related properties, such as yield and titer; and engineering non-cellulolytic organisms that exhibit high product yields and titers to express a heterologous cellulase and hemicellulase system enabling cellulose and hemicellulose utilization.
[0006] Many bacteria have the ability to ferment simple hexose sugars into a mixture of acidic and pH-neutral products via the process of glycolysis. The glycolytic pathway is abundant and comprises a series of enzymatic steps whereby a six carbon glucose molecule is broken down, via multiple intermediates, into two molecules of the three carbon compound pyruvate. This process results in the net generation of ATP (biological energy supply) and the reduced cofactor NADH.
[0007] Pyruvate is an important intermediary compound of metabolism. For example, under aerobic conditions pyruvate may be oxidized to acetyl coenzyme A (acetyl CoA), which then enters the tricarboxylic acid cycle (TCA), which in turn generates synthetic precursors, CO2 and reduced cofactors. The cofactors are then oxidized by donating hydrogen equivalents, via a series of enzymatic steps, to oxygen resulting in the formation of water and ATP. This process of energy formation is known as oxidative phosphorylation.
[0008] Under anaerobic conditions (no available oxygen), fermentation occurs in which the degradation products of organic compounds serve as hydrogen donors and acceptors. Excess NADH from glycolysis is oxidized in reactions involving the reduction of organic substrates to products, such as lactate and ethanol. In addition, ATP is regenerated from the production of organic acids, such as acetate, in a process known as substrate level phosphorylation. Therefore, the fermentation products of glycolysis and pyruvate metabolism include a variety of organic acids, alcohols and CO2.
[0009] The majority of facultative anaerobic bacteria do not produce high yields of ethanol under either aerobic or anaerobic conditions. Most facultative anaerobes metabolize pyruvate aerobically via pyruvate dehydrogenase (PDH) and the tricarboxylic acid cycle (TCA). Under anaerobic conditions, the main energy pathway for the metabolism of pyruvate is via the pyruvate-formate-lyase (PFL) pathway to give formate and acetyl-CoA. Acetyl-CoA is then converted to acetate, via phosphotransacetylase (PTA) and acetate kinase (ACK) with the co-production of ATP, or reduced to ethanol via acetaldehyde dehydrogenase (AcDH) and alcohol dehydrogenase (ADH). In order to maintain a balance of reducing equivalents, excess NADH produced from glycolysis is re-oxidized to NAD+ by lactate dehydrogenase (LDH) during the reduction of pyruvate to lactate. NADH can also be re-oxidized by AcDH and ADH during the reduction of acetyl-CoA to ethanol, but this is a minor reaction in cells with a functional LDH. Theoretical yields of ethanol are therefore not achieved since most acetyl-CoA is converted to acetate to regenerate ATP and excess NADH produced during glycolysis is oxidized by LDH.
[0010] Metabolic engineering of microorganisms could also result in the creation of a targeted knockout of the genes encoding for the production of enzymes, such as lactate dehydrogenase. In this case, "knock out" of the genes means partial, substantial, or complete deletion, silencing, inactivation, or down-regulation. If the conversion of pyruvate to lactate (the salt form of lactic acid) by the action of LDH was not available in the early stages of the glycolytic pathway, then the pyruvate could be more efficiently converted to acetyl CoA by the action of pyruvate dehydrogenase or pyruvate-ferredoxin oxidoreductase. If the further conversion of acetyl CoA to acetate (the salt form of acetic acid) by phosphotransacetylase and acetate kinase was also not available, i.e., if the genes encoding for the production of PTA and ACK were knocked out, then the acetyl CoA could be more efficiently converted to ethanol by AcDH and ADH. Accordingly, a genetically modified strain of microorganism with such targeted gene knockouts, which eliminates the production of organic acids, would have an increased ability to produce ethanol as a fermentation product.
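The knockout reasoning in the preceding paragraph can be illustrated with a minimal reachability sketch. It is a toy model written for illustration only, not strain data or a method taken from this application. The names EDGES, END_PRODUCTS, and reachable_products are hypothetical, the combined "PDH/PFOR" label is shorthand for the pyruvate dehydrogenase or pyruvate-ferredoxin oxidoreductase step, and the acetyl-phosphate intermediate between PTA and ACK is standard biochemistry that the text does not name explicitly.

```python
# Toy model of the pyruvate branch point described in the text.
# Each edge is (substrate, product, enzyme). Illustrative assumption only.
EDGES = [
    ("pyruvate", "lactate", "LDH"),
    ("pyruvate", "acetyl-CoA", "PDH/PFOR"),
    ("acetyl-CoA", "acetyl-phosphate", "PTA"),
    ("acetyl-phosphate", "acetate", "ACK"),
    ("acetyl-CoA", "acetaldehyde", "AcDH"),
    ("acetaldehyde", "ethanol", "ADH"),
]

END_PRODUCTS = {"lactate", "acetate", "ethanol"}


def reachable_products(knockouts=frozenset(), start="pyruvate"):
    """Return the sorted end products still reachable from `start`
    once every edge catalyzed by a knocked-out enzyme is removed."""
    active = [(s, p) for s, p, enzyme in EDGES if enzyme not in knockouts]
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for substrate, product in active:
            if substrate == node and product not in seen:
                seen.add(product)
                stack.append(product)
    return sorted(seen & END_PRODUCTS)


if __name__ == "__main__":
    print(reachable_products())                 # wild type
    print(reachable_products({"LDH"}))          # lactate route removed
    print(reachable_products({"LDH", "ACK"}))   # both acid routes removed
```

In this sketch, removing LDH together with either PTA or ACK leaves ethanol as the only reachable end product, which mirrors the qualitative argument above. It says nothing about fluxes, redox balance, or ATP supply.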
[0011] Ethanologenic organisms, such as Zymomonas mobilis, Zymobacter palmae, Acetobacter pasteurianus, or Sarcina ventriculi, and some yeasts (e.g., Saccharomyces cerevisiae), are capable of a second type of anaerobic fermentation, commonly referred to as alcoholic fermentation, in which pyruvate is metabolized to acetaldehyde and CO2 by pyruvate decarboxylase (PDC). Acetaldehyde is then reduced to ethanol by ADH regenerating NAD+. Alcoholic fermentation results in the metabolism of one molecule of glucose to two molecules of ethanol and two molecules of CO2. If the conversion of pyruvate to undesired organic acids could be avoided, as detailed above, then such a genetically modified microorganism would have an increased ability to produce ethanol as a fermentation product.
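The stoichiometry stated above (one glucose to two ethanol and two CO2) fixes the theoretical mass yield and the share of substrate carbon that can end up in ethanol. The following is a minimal sketch of that arithmetic, assuming rounded standard atomic masses; the function and constant names are hypothetical and are not taken from this application.

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

GLUCOSE = {"C": 6, "H": 12, "O": 6}   # C6H12O6
ETHANOL = {"C": 2, "H": 6, "O": 1}    # C2H5OH
CO2 = {"C": 1, "O": 2}


def molar_mass(formula):
    """Molar mass in g/mol for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def theoretical_ethanol_yield():
    """Grams of ethanol per gram of glucose for glucose -> 2 ethanol + 2 CO2."""
    return 2 * molar_mass(ETHANOL) / molar_mass(GLUCOSE)


def carbon_fraction_in_ethanol():
    """Fraction of glucose carbon atoms recovered in ethanol."""
    return 2 * ETHANOL["C"] / GLUCOSE["C"]


if __name__ == "__main__":
    print(f"theoretical yield: {theoretical_ethanol_yield():.3f} g ethanol / g glucose")
    print(f"carbon to ethanol: {carbon_fraction_in_ethanol():.1%}")
```

Under these assumptions the theoretical yield is roughly 0.51 g of ethanol per gram of glucose, and at most two thirds of the glucose carbon is recovered in ethanol, with the remaining third leaving as CO2.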
SUMMARY OF THE INVENTION
[0012] One aspect of the invention relates to an isolated nucleic acid molecule comprising the nucleotide sequence of any one of SEQ ID NOS:1-5, 30-31, and 47-61, or a complement thereof. Another aspect of the invention relates to an isolated nucleic acid molecule comprising a nucleotide sequence which shares at least 80% identity to a nucleotide sequence of any one of SEQ ID NOS:1-5, 30-31, and 47-61, or a complement thereof. In certain embodiments, the invention relates to the aforementioned nucleic acid molecule which shares at least about 95% sequence identity to the nucleotide sequence of any one of SEQ ID NOS:1-5, 30-31, and 47-61, or a complement thereof.
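The identity thresholds and the reference to a complement in the preceding paragraph can be made concrete with a short sketch. It assumes that the two sequences have already been aligned to equal length (with "-" marking gaps) and that percent identity means matching columns divided by total alignment columns; this application does not define the calculation in the passage above, and other conventions exist. The names reverse_complement and percent_identity are hypothetical.

```python
# Translation table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = identical non-gap columns / alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    query = "ACGTACGTAC"
    subject = "ACGTACGTAA"
    print(reverse_complement("ATGC"))
    print(percent_identity(query, subject) >= 80.0)
```

A real comparison against a SEQ ID NO would first require an alignment step, which this sketch deliberately leaves out.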
[0013] Another aspect of the present invention relates to a genetic construct comprising any one of SEQ ID NOS:1-5, 30-31, and 47-61 operably linked to a promoter expressible in a thermophilic or mesophilic bacterium. The present invention also relates to a recombinant thermophilic or mesophilic bacterium comprising the aforementioned genetic construct.
[0014] The present invention also encompasses a vector comprising any one of the aforementioned nucleic acid molecules. The present invention also encompasses a host cell comprising any one of the aforementioned nucleic acid molecules. In certain embodiments, the invention relates to the aforementioned host cell, wherein said host cell is a thermophilic or mesophilic bacterial cell.
[0015] Another aspect of the invention relates to a genetically modified thermophilic or mesophilic microorganism, wherein a first native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which first native gene encodes a first native enzyme involved in the metabolic production of an organic acid or a salt thereof, thereby increasing the native ability of said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is a Gram-negative bacterium or a Gram-positive bacterium. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is a species of the genera Thermoanaerobacterium, Thermoanaerobacter, Clostridium, Geobacillus, Saccharococcus, Paenibacillus, Bacillus, Caldicellulosiruptor, Anaerocellum, or Anoxybacillus. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is a bacterium selected from the group consisting of: Thermoanaerobacterium thermosulfurigenes, Thermoanaerobacterium aotearoense, Thermoanaerobacterium polysaccharolyticum, Thermoanaerobacterium zeae, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium saccharolyticum, Thermoanaerobium brockii, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacter thermohydrosulfuricus, Thermoanaerobacter ethanolicus, Thermoanaerobacter brockii, Clostridium thermocellum, Clostridium cellulolyticum, Clostridium phytofermentans, Clostridium straminisolvens, Geobacillus thermoglucosidasius, Geobacillus stearothermophilus, Saccharococcus caldoxylosilyticus, Saccharococcus thermophilus, Paenibacillus campinasensis, Bacillus flavothermus, Anoxybacillus kamchatkensis, Anoxybacillus gonensis, Caldicellulosiruptor acetigenus, Caldicellulosiruptor saccharolyticus, Caldicellulosiruptor kristjanssonii, Caldicellulosiruptor owensensis, Caldicellulosiruptor lactoaceticus, and Anaerocellum thermophilum. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is Thermoanaerobacterium saccharolyticum. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is selected from the group consisting of: (a) a thermophilic or mesophilic microorganism with a native ability to metabolize a hexose sugar; (b) a thermophilic or mesophilic microorganism with a native ability to metabolize a pentose sugar; and (c) a thermophilic or mesophilic microorganism with a native ability to metabolize a hexose sugar and a pentose sugar.
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to metabolize a hexose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is Clostridium straminisolvens or Clostridium thermocellum. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to metabolize a hexose sugar and a pentose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is Clostridium cellulolyticum, Clostridium kristjanssonii, or Clostridium stercorarium subsp. leptospartum. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a pentose sugar, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product from a pentose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to metabolize a pentose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is selected from the group consisting of Thermoanaerobacterium saccharolyticum, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium polysaccharolyticum, and Thermoanaerobacterium thermosaccharolyticum.
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a hexose sugar, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product from a hexose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is selected from the group consisting of lactic acid and acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is lactic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is selected from the group consisting of lactate dehydrogenase, acetate kinase, and phosphotransacetylase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is lactate dehydrogenase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is acetate kinase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is phosphotransacetylase. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a second native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which second native gene encodes a second native enzyme involved in the metabolic production of an organic acid or a salt thereof. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said second native enzyme is acetate kinase or phosphotransacetylase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said second native enzyme is lactate dehydrogenase.
[0016] Yet another aspect of the invention relates to a genetically modified thermophilic or mesophilic microorganism, wherein (a) a first native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which first native gene encodes a first native enzyme involved in the metabolic production of an organic acid or a salt thereof, and (b) a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme involved in the metabolic production of ethanol, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a hexose sugar, thereby allowing said thermophilic or mesophilic microorganism to metabolize a hexose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a pentose sugar, thereby allowing said thermophilic or mesophilic microorganism to metabolize a pentose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to metabolize a hexose sugar; and a second non-native gene is inserted, which second non-native gene encodes a second non-native enzyme that confers the ability to metabolize a pentose sugar, thereby allowing said thermophilic or mesophilic microorganism to metabolize a hexose sugar and a pentose sugar. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is lactic acid. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native enzyme is pyruvate decarboxylase (PDC) or alcohol dehydrogenase (ADH). In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said second non-native enzyme is xylose isomerase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene corresponds to SEQ ID NOS:6, 10, or 14. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native enzyme is xylulokinase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native gene corresponds to SEQ ID NOS:7, 11, or 15. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native enzyme is L-arabinose isomerase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native gene corresponds to SEQ ID NOS:8 or 12. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native enzyme is L-ribulose-5-phosphate 4-epimerase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said non-native gene corresponds to SEQ ID NO:9 or 13. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is able to convert at least 60% of carbon from metabolized biomass into ethanol. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is selected from the group consisting of: (a) a thermophilic or mesophilic microorganism with a native ability to hydrolyze cellulose; (b) a thermophilic or mesophilic microorganism with a native ability to hydrolyze xylan; and (c) a thermophilic or mesophilic microorganism with a native ability to hydrolyze cellulose and xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to hydrolyze cellulose. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to hydrolyze cellulose and xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism has a native ability to hydrolyze xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze cellulose. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is selected from the group consisting of lactic acid and acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is lactic acid. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is selected from the group consisting of lactate dehydrogenase, acetate kinase, and phosphotransacetylase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is lactate dehydrogenase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is acetate kinase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first native enzyme is phosphotransacetylase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein a second native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which second native gene encodes a second native enzyme involved in the metabolic production of an organic acid or a salt thereof. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said second native enzyme is acetate kinase or phosphotransacetylase. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said second native enzyme is lactate dehydrogenase. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein (a) a first native gene is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which first native gene encodes a first native enzyme involved in the metabolic production of an organic acid or a salt thereof, and (b) a first non-native gene is inserted, which first non-native gene encodes a first non-native enzyme involved in the hydrolysis of a polysaccharide, thereby allowing said thermophilic or mesophilic microorganism to produce ethanol as a fermentation product. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze cellulose, thereby allowing said thermophilic or mesophilic microorganism to hydrolyze cellulose. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze xylan, thereby allowing said thermophilic or mesophilic microorganism to hydrolyze xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native gene encodes a first non-native enzyme that confers the ability to hydrolyze cellulose; and a second non-native gene is inserted, which second non-native gene encodes a second non-native enzyme that confers the ability to hydrolyze xylan, thereby allowing said thermophilic or mesophilic microorganism to hydrolyze cellulose and xylan. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is lactic acid. 
In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said organic acid is acetic acid. In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said first non-native enzyme is pyruvate decarboxylase (PDC) or alcohol dehydrogenase (ADH). In certain embodiments, the present invention relates to the aforementioned genetically modified microorganism, wherein said microorganism is able to convert at least 60% of carbon from metabolized biomass into ethanol.
[0017] In certain embodiments, the present invention relates to any of the aforementioned genetically modified microorganisms, wherein said microorganism is mesophilic. In certain embodiments, the present invention relates to any of the aforementioned genetically modified microorganisms, wherein said microorganism is thermophilic.
[0018] Another aspect of the invention relates to a process for converting lignocellulosic biomass to ethanol, comprising contacting lignocellulosic biomass with any one of the aforementioned genetically modified thermophilic or mesophilic microorganisms. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is selected from the group consisting of grass, switch grass, cord grass, rye grass, reed canary grass, mixed prairie grass, miscanthus, sugar-processing residues, sugarcane bagasse, sugarcane straw, agricultural wastes, rice straw, rice hulls, barley straw, corn cobs, cereal straw, wheat straw, canola straw, oat straw, oat hulls, corn fiber, stover, soybean stover, corn stover, forestry wastes, recycled wood pulp fiber, paper sludge, sawdust, hardwood, softwood, and combinations thereof. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is selected from the group consisting of corn stover, sugarcane bagasse, switchgrass, and poplar wood. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is corn stover. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is sugarcane bagasse. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is switchgrass. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is poplar wood. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is willow. In certain embodiments, the present invention relates to the aforementioned process, wherein said lignocellulosic biomass is paper sludge.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIG. 1 depicts the glycolysis pathway.
[0020] FIGS. 2A-2B depict pentose and glucuronate interconversions and highlight the enzymes xylose isomerase (XI or 5.3.1.5) and xylulokinase (XK or 2.7.1.17) in the D-xylose to ethanol pathway.
[0021] FIGS. 3A-3B depict pentose and glucuronate interconversions and highlight the enzymes L-arabinose isomerase (5.3.1.4) and L-ribulose-5-phosphate 4-epimerase (5.1.3.4) in the L-arabinose utilization pathway.
[0022] FIGS. 4A-4B depict pentose and glucuronate interconversions and show that the genes for xylose isomerase, xylulokinase, L-arabinose isomerase, and L-ribulose-5-phosphate 4-epimerase are present in C. cellulolyticum.
[0023] FIGS. 5A-5B depict pentose and glucuronate interconversions and show that xylose isomerase and xylulokinase are present, while L-arabinose isomerase and L-ribulose-5-phosphate 4-epimerase are absent, in C. phytofermentans.
[0024] FIGS. 6A-6F show an alignment of Clostridium thermocellum, Clostridium cellulolyticum, Thermoanaerobacterium saccharolyticum, C. stercorarium, C. stercorarium II, Caldicellulosiruptor kristjanssonii, and C. phytofermentans, indicating about 73-89% homology at the level of the 16S rDNA gene.
[0025] FIG. 7 shows the construction of a double crossover knockout vector for inactivation of the ack gene in Clostridium thermocellum based on the plasmid pIKM1.
[0026] FIG. 8 shows the construction of a double crossover knockout vector for inactivation of the ack gene in Clostridium thermocellum based on the replicative plasmid pNW33N.
[0027] FIG. 9 shows the construction of a double crossover knockout vector for inactivation of the ldh gene in Clostridium thermocellum based on the plasmid pIKM1.
[0028] FIG. 10 shows the construction of a double crossover knockout vector for inactivation of the ldh gene in Clostridium thermocellum based on the replicative plasmid vector pNW33N.
[0029] FIG. 11 shows the construction of a double crossover suicide vector for inactivation of the ldh gene in Clostridium thermocellum based on the plasmid pUC19.
[0030] FIGS. 12A and 12B show product formation and OD600 for C. straminisolvens grown on cellobiose and Avicel®, respectively.
[0031] FIGS. 13A and 13B show product formation and OD600 for C. thermocellum grown on cellobiose and Avicel®, respectively.
[0032] FIGS. 14A and 14B show product formation and OD600 for C. cellulolyticum grown on cellobiose and Avicel®, respectively.
[0033] FIGS. 15A and 15B show product formation and OD600 for C. stercorarium subsp. leptospartum grown on cellobiose and Avicel®, respectively.
[0034] FIGS. 16A and 16B show product formation and OD600 for Caldicellulosiruptor kristjanssonii grown on cellobiose and Avicel®, respectively.
[0035] FIGS. 17A and 17B show product formation and OD600 for Clostridium phytofermentans grown on cellobiose and Avicel®, respectively.
[0036] FIG. 18 shows total metabolic byproducts after 48 hours of fermentation of 2.5 g/L xylan and 2.5 g/L cellobiose.
[0037] FIG. 19 shows a map of the ack gene and the region amplified by PCR for gene disruption.
[0038] FIG. 20 shows a map of the ldh 2262 gene and the region amplified by PCR for gene disruption.
[0039] FIG. 21 shows an example of a C. cellulolyticum (C. cell.) ldh (2262) double crossover knockout fragment.
[0040] FIG. 22 shows a map of the ack gene of Clostridium phytofermentans and the region amplified by PCR for gene disruption.
[0041] FIG. 23 shows an example of a putative double crossover knockout construct with the mLs gene as a selectable marker in Clostridium phytofermentans.
[0042] FIG. 24 shows a map of the ldh 1389 gene and the region amplified by PCR for gene disruption.
[0043] FIG. 25 shows an example of a putative double crossover knockout construct with the mLs gene as a selectable marker.
[0044] FIG. 26 is a diagram representing bp 250-550 of pMOD®-2<MCS>.
[0045] FIG. 27 shows the product concentration profiles for 1% Avicel® using C. straminisolvens. The ethanol-to-acetate ratio is depicted as E/A and the ratio of ethanol-to-total products is depicted as E/T.
[0046] FIG. 28 shows an example of a vector for retargeting the Ll.LtrB intron to insert in the C. cell. ACK gene (SEQ ID NO:21).
[0047] FIG. 29 shows an example of a vector for retargeting the Ll.LtrB intron to insert in the C. cell. LDH2744 gene (SEQ ID NO:23).
[0048] FIGS. 30A-30E show an alignment of T. pseudoethanolicus 39E, T. sp. strain 59, T. saccharolyticum B6A-RI, T. saccharolyticum YS485 and consensus at the level of the 16S rDNA gene.
[0049] FIGS. 31A-31D show an alignment of T. sp. strain 59, T. pseudoethanolicus, T. saccharolyticum B6A-RI, T. saccharolyticum YS485 and consensus at the level of the pta gene.
[0050] FIGS. 32A-32D show an alignment of T. sp. strain 59, T. pseudoethanolicus, T. saccharolyticum B6A-RI, T. saccharolyticum YS485 and consensus at the level of the ack gene.
[0051] FIGS. 33A-33C show an alignment of T. sp. strain 59, T. pseudoethanolicus 39E, T. saccharolyticum B6A-RI, T. saccharolyticum YS485 and consensus at the level of the ldh gene.
[0052] FIGS. 34A-34B show a schematic of the glycolysis/fermentation pathway.
[0053] FIG. 35 shows an example of a pMU340 plasmid.
[0054] FIG. 36 shows an example of a pMU102 Z. mobilis PDC-ADH plasmid.
[0055] FIG. 37 shows an example of a pMU102 Z. palmae PDC, Z. mobilis ADH plasmid.
[0056] FIG. 38 shows the plasmid map of pMU360. The DNA sequence of pMU360 is set forth as SEQ ID NO:61.
[0057] FIG. 39 shows the lactate levels in nine colonies of thiamphenicol-resistant transformants.
[0058] FIG. 40 shows an example of a T. sacch. pfl KO single crossover plasmid (SEQ ID NO:47).
[0059] FIG. 41 shows an example of a T. sacch. pfl KO double crossover plasmid (SEQ ID NO:48).
[0060] FIG. 42 shows an example of a C. therm. pfl KO single crossover plasmid (SEQ ID NO:49).
[0061] FIG. 43 shows an example of a C. therm. pfl KO double crossover plasmid (SEQ ID NO:50).
[0062] FIG. 44 shows an example of a C. phyto. pfl KO single crossover plasmid (SEQ ID NO:51).
[0063] FIG. 45 shows an example of a C. phyto. pfl KO double crossover plasmid (SEQ ID NO:52).
[0064] FIG. 46 shows an example of a T. sacch. #59 L-ldh KO single crossover plasmid (SEQ ID NO:53).
[0065] FIG. 47 shows an example of a T. sacch. #59 L-ldh KO double crossover plasmid (SEQ ID NO:54).
[0066] FIG. 48 shows an example of a T. sacch. #59 pta/ack KO single crossover plasmid (SEQ ID NO:55).
[0067] FIG. 49 shows an example of a T. sacch. #59 pta/ack KO double crossover plasmid (SEQ ID NO:56).
[0068] FIG. 50 shows an example of a T. pseudo. L-ldh KO single crossover plasmid (SEQ ID NO:57).
[0069] FIG. 51 shows an example of a T. pseudo. L-ldh KO double crossover plasmid (SEQ ID NO:58).
[0070] FIG. 52 shows an example of a T. pseudo. ack KO single crossover plasmid (SEQ ID NO:59).
[0071] FIG. 53 shows an example of a T. pseudo. pta/ack KO double crossover plasmid (SEQ ID NO:60).
BRIEF DESCRIPTION OF THE TABLES
[0072] Table 1 summarizes representative highly cellulolytic organisms.
[0073] Table 2 summarizes representative native cellulolytic and xylanolytic organisms.
[0074] Table 3 shows a categorization of bacterial strains based on their substrate utilization.
[0075] Table 4 shows the insertion location and primers used to retarget the intron to C. cellulolyticum acetate kinase.
[0076] Table 5 shows the insertion location and primers used to retarget the intron to C. cellulolyticum lactate dehydrogenase.
[0077] Table 6 shows fermentation performance of engineered Thermoanaerobacter and Thermoanaerobacterium strains.
DETAILED DESCRIPTION OF THE INVENTION
[0078] Aspects of the present invention relate to the engineering of thermophilic or mesophilic microorganisms for use in the production of ethanol from lignocellulosic biomass. The use of thermophilic bacteria for ethanol production offers many advantages over traditional processes based upon mesophilic ethanol producers. For example, the use of thermophilic organisms provides significant economic savings over traditional process methods due to lower ethanol separation costs, reduced requirements for external enzyme addition, and reduced processing times.
[0079] Aspects of the present invention relate to a process by which the cost of ethanol production from cellulosic biomass-containing materials can be reduced by using a novel processing configuration. In particular, the present invention provides numerous methods for increasing ethanol production in a genetically modified microorganism.
[0080] In certain other embodiments, the present invention relates to genetically modified thermophilic or mesophilic microorganisms, wherein a gene or a particular polynucleotide sequence is partially, substantially, or completely deleted, silenced, inactivated, or down-regulated, which gene or polynucleotide sequence encodes for an enzyme that confers upon the microorganism the ability to produce organic acids as fermentation products, thereby increasing the ability of the microorganism to produce ethanol as the major fermentation product. Further, by virtue of a novel integration of processing steps, commonly known as consolidated bioprocessing, aspects of the present invention provide for more efficient production of ethanol from cellulosic-biomass-containing raw materials. The incorporation of genetically modified thermophilic or mesophilic microorganisms in the processing of said materials allows for fermentation steps to be conducted at higher temperatures, improving process economics. For example, reaction rates typically increase with temperature, so higher temperatures are generally associated with increases in the overall rate of production. Additionally, higher temperature facilitates the removal of volatile products from the broth and reduces the need for cooling after pretreatment.
[0081] In certain embodiments, the present invention relates to genetically modified or recombinant thermophilic or mesophilic microorganisms with increased ability to produce enzymes that confer the ability to produce ethanol as a fermentation product, the presence of which enzyme(s) modify the process of metabolizing lignocellulosic biomass materials to produce ethanol as the major fermentation product. In one aspect of the invention, one or more non-native genes are inserted into a genetically modified thermophilic or mesophilic microorganism, wherein said non-native gene encodes an enzyme involved in the metabolic production of ethanol, for example, such enzyme may confer the ability to metabolize a pentose sugar and/or a hexose sugar. For example, in one embodiment, the enzyme may be involved in the D-xylose or L-arabinose pathway, thereby allowing the microorganism to metabolize a pentose sugar, i.e., D-xylose or L-arabinose. By inserting (e.g., introducing or adding) a non-native gene that encodes an enzyme involved in the metabolism or utilization of D-xylose or L-arabinose, the microorganism has an increased ability to produce ethanol relative to the native organism.
[0082] The present invention also provides novel compositions that may be integrated into the microorganisms of the invention. In one embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule which is a complement of a nucleotide sequence shown in any one of SEQ ID NOS:1-76. In another embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule which is a complement of a nucleotide sequence shown in any one of SEQ ID NOS:1-76, or a portion of any of these nucleotide sequences. A nucleic acid molecule which is complementary to a nucleotide sequence shown in any one of SEQ ID NOS:1-76, or the coding region thereof, is one which is sufficiently complementary to a nucleotide sequence shown in any one of SEQ ID NOS:1-76, or the coding region thereof, such that it can hybridize to a nucleotide sequence shown in any one of SEQ ID NOS:1-76, or the coding region thereof, thereby forming a stable duplex.
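The notion of a complement that can hybridize to a given sequence can be illustrated with a minimal sketch. The function names and the six-nucleotide example sequence are hypothetical and are not taken from any of SEQ ID NOS:1-76; the sketch assumes simple Watson-Crick pairing of unambiguous DNA bases and says nothing about duplex stability under real hybridization conditions.

```python
# Illustrative sketch only: Watson-Crick complement of a DNA strand.
# The sequences used here are hypothetical, not SEQ ID NOS from the application.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return complement(seq)[::-1]


def is_full_complement(a: str, b: str) -> bool:
    """True if strand b pairs with strand a at every position."""
    return reverse_complement(a).upper() == b.upper()


example = "ATGCGT"  # hypothetical 6-nt sequence
print(reverse_complement(example))  # ACGCAT
```

A strand and its full complement, as computed here, would be expected to pair at every position; partial complementarity sufficient for a stable duplex is a looser criterion that this sketch does not model.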
[0083] In still another preferred embodiment, an isolated nucleic acid molecule of the present invention comprises a nucleotide sequence which is at least about 50%, 54%, 55%, 60%, 62%, 65%, 70%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more homologous to the nucleotide sequences (e.g., to the entire length of the nucleotide sequence) shown in any one of SEQ ID NOS:1-76, or a portion of any of these nucleotide sequences.
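As a rough illustration of how a percentage threshold of this kind might be checked, the sketch below computes percent identity over two sequences that have already been aligned to equal length, with "-" marking gaps. Treating "homologous" as percent identity over aligned columns, the gap handling, and the example sequences are all assumptions made for illustration; the application does not specify an alignment program or scoring scheme here.

```python
# Illustrative sketch only: percent identity between two pre-aligned sequences.
# Gap handling and the example sequences are assumptions, not from the application.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical bases.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-nt sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

In practice the two sequences would first be aligned with a dedicated tool, and the reported percentage depends on that tool's parameters.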
[0084] Moreover, the nucleic acid molecules of the invention may comprise only a portion of the nucleic acid sequence of any one of SEQ ID NOS:1-76, or the coding region thereof; for example, the nucleic acid molecule may be a fragment which can be used as a probe or primer or a fragment encoding a biologically active portion of a protein. In another embodiment, the nucleic acid molecules may comprise at least about 12 or 15, preferably about 20 or 25, more preferably about 30, 35, 40, 45, 50, 55, 60, 65, or 75 consecutive nucleotides of any one of SEQ ID NOS:1-76.
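The idea of a fragment made up of a minimum number of consecutive nucleotides of a reference sequence can be sketched as a simple substring test. The reference sequence, the function name, and the default minimum length of 12 are hypothetical choices for illustration; exact, ungapped matching on a single strand is assumed.

```python
# Illustrative sketch only: does a candidate consist of at least `min_len`
# consecutive nucleotides of a reference? The reference below is hypothetical.
def is_consecutive_fragment(candidate: str, reference: str, min_len: int = 12) -> bool:
    """True if candidate is an exact contiguous stretch of reference
    and is at least min_len nucleotides long."""
    candidate = candidate.upper()
    reference = reference.upper()
    return len(candidate) >= min_len and candidate in reference


reference = "ATGGCTAGCTAGGATCCGATCGATCGGCTA"  # hypothetical 30-nt sequence
probe = reference[5:20]                        # 15 consecutive nucleotides

print(is_consecutive_fragment(probe, reference))            # True
print(is_consecutive_fragment(reference[:8], reference))    # False (too short)
```

A practical check would typically also test the reverse complement of the reference, which this sketch omits.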
DEFINITIONS
[0085] The term "heterologous polynucleotide segment" is intended to include a polynucleotide segment that encodes one or more polypeptides or portions or fragments of polypeptides. A heterologous polynucleotide segment may be derived from any source, e.g., eukaryotes, prokaryotes, viruses, or synthetic polynucleotide fragments.
[0086] The terms "promoter" and "surrogate promoter" are intended to include a polynucleotide segment that can transcriptionally control a gene-of-interest that it does not transcriptionally control in nature. In certain embodiments, the transcriptional control of a surrogate promoter results in an increase in expression of the gene-of-interest. In certain embodiments, a surrogate promoter is placed 5' to the gene-of-interest. A surrogate promoter may be used to replace the natural promoter, or may be used in addition to the natural promoter. A surrogate promoter may be endogenous with regard to the host cell in which it is used, or it may be a heterologous polynucleotide sequence introduced into the host cell, e.g., exogenous with regard to the host cell in which it is used.
[0087] The terms "gene(s)" or "polynucleotide segment" or "polynucleotide sequence(s)" are intended to include nucleic acid molecules, e.g., polynucleotides which include an open reading frame encoding a polypeptide, and can further include non-coding regulatory sequences, and introns. In addition, the terms are intended to include one or more genes that map to a functional locus. In addition, the terms are intended to include a specific gene for a selected purpose. The gene may be endogenous to the host cell or may be recombinantly introduced into the host cell, e.g., as a plasmid maintained episomally or a plasmid (or fragment thereof) that is stably integrated into the genome. In addition to the plasmid form, a gene may, for example, be in the form of linear DNA. In certain embodiments, the gene or polynucleotide segment is involved in at least one step in the bioconversion of a carbohydrate to ethanol. Accordingly, the term is intended to include any gene encoding a polypeptide, such as the enzymes acetate kinase (ACK), phosphotransacetylase (PTA), and/or lactate dehydrogenase (LDH), enzymes in the D-xylose pathway, such as xylose isomerase and xylulokinase, and enzymes in the L-arabinose pathway, such as L-arabinose isomerase and L-ribulose-5-phosphate 4-epimerase. The term gene is also intended to cover all copies of a particular gene, e.g., all of the DNA sequences in a cell encoding a particular gene product.
[0088] The term "transcriptional control" is intended to include the ability to modulate gene expression at the level of transcription. In certain embodiments, transcription, and thus gene expression, is modulated by replacing or adding a surrogate promoter near the 5' end of the coding region of a gene-of-interest, thereby resulting in altered gene expression. In certain embodiments, the transcriptional control of one or more genes is engineered to result in the optimal expression of such genes, e.g., in a desired ratio. The term also includes inducible transcriptional control as recognized in the art.
[0089] The term "expression" is intended to include the expression of a gene at least at the level of mRNA production.
[0090] The term "expression product" is intended to include the resultant product, e.g., a polypeptide, of an expressed gene.
[0091] The term "increased expression" is intended to include an alteration in gene expression at least at the level of increased mRNA production and, preferably, at the level of polypeptide expression. The term "increased production" is intended to include an increase in the amount of a polypeptide expressed, in the level of the enzymatic activity of the polypeptide, or a combination thereof.
[0092] The terms "activity," "activities," "enzymatic activity," and "enzymatic activities" are used interchangeably and are intended to include any functional activity normally attributed to a selected polypeptide when produced under favorable conditions. Typically, the activity of a selected polypeptide encompasses the total enzymatic activity associated with the produced polypeptide. The polypeptide produced by a host cell and having enzymatic activity may be located in the intracellular space of the cell, cell-associated, secreted into the extracellular milieu, or a combination thereof. Techniques for determining total activity as compared to secreted activity are described herein and are known in the art.
[0093] The term "xylanolytic activity" is intended to include the ability to hydrolyze glycosidic linkages in oligopentoses and polypentoses.
[0094] The term "cellulolytic activity" is intended to include the ability to hydrolyze glycosidic linkages in oligohexoses and polyhexoses. Cellulolytic activity may also include the ability to depolymerize or debranch cellulose and hemicellulose.
[0095] As used herein, the term "lactate dehydrogenase" or "LDH" is intended to include the enzyme capable of converting pyruvate into lactate. It is understood that LDH can also catalyze the oxidation of hydroxybutyrate.
[0096] As used herein the term "alcohol dehydrogenase" or "ADH" is intended to include the enzyme capable of converting acetaldehyde into an alcohol, advantageously, ethanol.
[0097] The term "pyruvate decarboxylase activity" is intended to include the ability of a polypeptide to enzymatically convert pyruvate into acetaldehyde (e.g., "pyruvate decarboxylase" or "PDC"). Typically, the activity of a selected polypeptide encompasses the total enzymatic activity associated with the produced polypeptide, comprising, e.g., the superior substrate affinity of the enzyme, thermostability, stability at different pHs, or a combination of these attributes.
[0098] The term "ethanologenic" is intended to include the ability of a microorganism to produce ethanol from a carbohydrate as a fermentation product. The term is intended to include, but is not limited to, naturally occurring ethanologenic organisms, ethanologenic organisms with naturally occurring or induced mutations, and ethanologenic organisms which have been genetically modified.
[0099] The terms "fermenting" and "fermentation" are intended to include the enzymatic process (e.g., cellular or acellular, e.g., a lysate or purified polypeptide mixture) by which ethanol is produced from a carbohydrate, in particular, as a product of fermentation.
[0100] The term "secreted" is intended to include the movement of polypeptides to the periplasmic space or extracellular milieu. The term "increased secretion" is intended to include situations in which a given polypeptide is secreted at an increased level (i.e., in excess of the naturally-occurring amount of secretion). In certain embodiments, the term "increased secretion" refers to an increase in secretion of a given polypeptide that is at least about 10% or at least about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, or more, as compared to the naturally-occurring level of secretion.
[0101] The term "secretory polypeptide" is intended to include any polypeptide(s), alone or in combination with other polypeptides, that facilitate the transport of another polypeptide from the intracellular space of a cell to the extracellular milieu. In certain embodiments, the secretory polypeptide(s) encompass all the necessary secretory polypeptides sufficient to impart secretory activity to a Gram-negative or Gram-positive host cell. Typically, secretory proteins are encoded in a single region or locus that may be isolated from one host cell and transferred to another host cell using genetic engineering. In certain embodiments, the secretory polypeptide(s) are derived from any bacterial cell having secretory activity. In certain embodiments, the secretory polypeptide(s) are derived from a host cell having Type II secretory activity. In certain embodiments, the host cell is a thermophilic bacterial cell.
[0102] The term "derived from" is intended to include the isolation (in whole or in part) of a polynucleotide segment from an indicated source or the purification of a polypeptide from an indicated source. The term is intended to include, for example, direct cloning, PCR amplification, or artificial synthesis from or based on a sequence associated with the indicated polynucleotide source.
[0103] By "thermophilic" is meant an organism that thrives at a temperature of about 45° C. or higher.
[0104] By "mesophilic" is meant an organism that thrives at a temperature of about 20-45° C.
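The two temperature definitions above can be expressed as a small classification sketch. The function name and the handling of the shared 45° C. boundary (assigned here to "thermophilic") are assumptions for illustration; the definitions use "about," so the cut-offs are approximate rather than sharp.

```python
# Illustrative sketch only: classify an organism by the temperature at which
# it thrives, using the approximate ranges given in the definitions.
THERMOPHILIC_MIN_C = 45.0  # "about 45 deg C or higher"
MESOPHILIC_MIN_C = 20.0    # "about 20-45 deg C"


def classify_by_growth_temperature(temp_c: float) -> str:
    """Return 'thermophilic', 'mesophilic', or 'unclassified'.

    Assumption: the shared 45 deg C boundary is assigned to 'thermophilic'.
    """
    if temp_c >= THERMOPHILIC_MIN_C:
        return "thermophilic"
    if temp_c >= MESOPHILIC_MIN_C:
        return "mesophilic"
    return "unclassified"


print(classify_by_growth_temperature(55.0))  # thermophilic
print(classify_by_growth_temperature(37.0))  # mesophilic
```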
[0105] The term "organic acid" is art-recognized. The term "lactic acid" refers to the organic acid 2-hydroxypropionic acid in either the free acid or salt form. The salt form of lactic acid is referred to as "lactate" regardless of the neutralizing agent, e.g., calcium carbonate or ammonium hydroxide. The term "acetic acid" refers to the organic acid methanecarboxylic acid, also known as ethanoic acid, in either free acid or salt form. The salt form of acetic acid is referred to as "acetate."
[0106] Certain embodiments of the present invention provide for the "insertion," (e.g., the addition, integration, incorporation, or introduction) of certain genes or particular polynucleotide sequences within thermophilic or mesophilic microorganisms, which insertion of genes or particular polynucleotide sequences may be understood to encompass "genetic modification(s)" or "transformation(s)" such that the resulting strains of said thermophilic or mesophilic microorganisms may be understood to be "genetically modified" or "transformed." In certain embodiments, strains may be of bacterial, fungal, or yeast origin.
[0107] Certain embodiments of the present invention provide for the "inactivation" or "deletion" of certain genes or particular polynucleotide sequences within thermophilic or mesophilic microorganisms, which "inactivation" or "deletion" of genes or particular polynucleotide sequences may be understood to encompass "genetic modification(s)" or "transformation(s)" such that the resulting strains of said thermophilic or mesophilic microorganisms may be understood to be "genetically modified" or "transformed." In certain embodiments, strains may be of bacterial, fungal, or yeast origin.
[0108] The term "CBP organism" is intended to include microorganisms of the invention, e.g., microorganisms that have properties suitable for CBP.
[0109] In one aspect of the invention, the genes or particular polynucleotide sequences are inserted to activate the activity for which they encode, such as the expression of an enzyme. In certain embodiments, genes encoding enzymes in the metabolic production of ethanol, e.g., enzymes that metabolize pentose and/or hexose sugars, may be added to a mesophilic or thermophilic organism. In certain embodiments of the invention, the enzyme may confer the ability to metabolize a pentose sugar and be involved, for example, in the D-xylose pathway and/or L-arabinose pathway.
[0110] In one aspect of the invention, the genes or particular polynucleotide sequences are partially, substantially, or completely deleted, silenced, inactivated, or down-regulated in order to inactivate the activity for which they encode, such as the expression of an enzyme. Deletions provide maximum stability because there is no opportunity for a reverse mutation to restore function. Alternatively, genes can be partially, substantially, or completely deleted, silenced, inactivated, or down-regulated by insertion of nucleic acid sequences that disrupt the function and/or expression of the gene (e.g., P1 transduction or other methods known in the art). The terms "eliminate," "elimination," and "knockout" are used interchangeably with the term "deletion." In certain embodiments, strains of thermophilic or mesophilic microorganisms of interest may be engineered by site directed homologous recombination to knockout the production of organic acids. In still other embodiments, RNAi or antisense DNA (asDNA) may be used to partially, substantially, or completely silence, inactivate, or down-regulate a particular gene of interest.
[0111] In certain embodiments, the genes targeted for deletion or inactivation as described herein may be endogenous to the native strain of the microorganism, and may thus be understood to be referred to as "native gene(s)" or "endogenous gene(s)." An organism is in "a native state" if it has not been genetically engineered or otherwise manipulated by the hand of man in a manner that intentionally alters the genetic and/or phenotypic constitution of the organism. For example, wild-type organisms may be considered to be in a native state. In other embodiments, the gene(s) targeted for deletion or inactivation may be non-native to the organism.
Biomass
[0112] The terms "lignocellulosic material," "lignocellulosic substrate," and "cellulosic biomass" mean any type of biomass comprising cellulose, hemicellulose, lignin, or combinations thereof, such as but not limited to woody biomass, forage grasses, herbaceous energy crops, non-woody-plant biomass, agricultural wastes and/or agricultural residues, forestry residues and/or forestry wastes, paper-production sludge and/or waste paper sludge, waste-water-treatment sludge, municipal solid waste, corn fiber from wet and dry mill corn ethanol plants, and sugar-processing residues.
[0113] In a non-limiting example, the lignocellulosic material can include, but is not limited to, woody biomass, such as recycled wood pulp fiber, sawdust, hardwood, softwood, and combinations thereof; grasses, such as switch grass, cord grass, rye grass, reed canary grass, miscanthus, or a combination thereof; sugar-processing residues, such as but not limited to sugar cane bagasse; agricultural wastes, such as but not limited to rice straw, rice hulls, barley straw, corn cobs, cereal straw, wheat straw, canola straw, oat straw, oat hulls, and corn fiber; stover, such as but not limited to soybean stover, corn stover; and forestry wastes, such as but not limited to recycled wood pulp fiber, sawdust, hardwood (e.g., poplar, oak, maple, birch, willow), softwood, or any combination thereof. Lignocellulosic material may comprise one species of fiber; alternatively, lignocellulosic material may comprise a mixture of fibers that originate from different lignocellulosic materials. Particularly advantageous lignocellulosic materials are agricultural wastes, such as cereal straws, including wheat straw, barley straw, canola straw and oat straw; corn fiber; stovers, such as corn stover and soybean stover; grasses, such as switch grass, reed canary grass, cord grass, and miscanthus; or combinations thereof.
[0114] Paper sludge is also a viable feedstock for ethanol production. Paper sludge is solid residue arising from pulping and paper-making, and is typically removed from process wastewater in a primary clarifier. At a disposal cost of $30/wet ton, the cost of sludge disposal equates to $5/ton of paper that is produced for sale. The cost of disposing of wet sludge is a significant incentive to convert the material for other uses, such as conversion to ethanol. Processes provided by the present invention are widely applicable. Moreover, the saccharification and/or fermentation products may be used to produce ethanol or higher value added chemicals, such as organic acids, aromatics, esters, acetone and polymer intermediates.
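As a rough check on the figures above, the two disposal costs imply how much wet sludge accompanies each ton of saleable paper. The following Python sketch works through that arithmetic. The function and variable names are illustrative only, and the resulting ratio is an inference from the two stated costs, not a figure given in this application.

```python
def implied_sludge_per_ton_paper(cost_per_ton_paper: float,
                                 cost_per_wet_ton_sludge: float) -> float:
    """Wet tons of sludge per ton of paper implied by the two disposal costs."""
    return cost_per_ton_paper / cost_per_wet_ton_sludge


# $5 of disposal cost per ton of paper, at $30 per wet ton of sludge.
ratio = implied_sludge_per_ton_paper(5.0, 30.0)
print(f"{ratio:.3f} wet ton of sludge per ton of paper")
```

Under these assumptions, roughly one wet ton of sludge is generated for every six tons of paper sold.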
Pyruvate Formate Lyase (PFL)
[0115] Pyruvate formate lyase (PFL) is an important enzyme (found in Escherichia coli and other organisms) that helps regulate anaerobic glucose metabolism. Using radical chemistry, it catalyzes the reversible conversion of pyruvate and coenzyme-A into formate and acetyl-CoA, a precursor of ethanol. Pyruvate formate lyase is a homodimer made of 85 kDa, 759-residue subunits. It has a 10-stranded beta/alpha barrel motif into which is inserted a beta finger that contains major catalytic residues. The active site of the enzyme, elucidated by X-ray crystallography, holds three essential amino acids that perform catalysis (Gly734, Cys418, and Cys419), three major residues that hold the substrate pyruvate close by (Arg435, Arg176, and Ala272), and two flanking hydrophobic residues (Trp333 and Phe432).
[0116] Studies have found structural similarities between the active site of pyruvate formate lyase and that of Class I and Class III ribonucleotide reductase (RNR) enzymes. The roles of the 3 catalytic residues are as follows: Gly734 (glycyl radical)--transfers the radical on and off Cys418, via Cys419; Cys418 (thiyl radical)--performs acylation chemistry on the carbon atom of the pyruvate carbonyl; Cys419 (thiyl radical)--performs hydrogen-atom transfers.
[0117] The proposed mechanism for pyruvate formate lyase begins with radical transfer from Gly734 to Cys418, via Cys419. The Cys418 thiyl radical adds covalently to C2 (second carbon atom) of pyruvate, generating an acetyl-enzyme intermediate (which now contains the radical). The acetyl-enzyme intermediate releases a formyl radical that undergoes hydrogen-atom transfer with Cys419. This generates formate and a Cys419 radical. Coenzyme-A undergoes hydrogen-atom transfer with the Cys419 radical to generate a coenzyme-A radical. The coenzyme-A radical then picks up the acetyl group from Cys418 to generate acetyl-CoA, leaving behind a Cys418 radical. Pyruvate formate lyase can then undergo radical transfer to put the radical back onto Gly734. Each of the above-mentioned steps is also reversible.
[0118] Two additional enzymes regulate the "on" and "off" states of pyruvate formate lyase to regulate anaerobic glucose metabolism: PFL activase (AE) and PFL deactivase (DA). Activated pyruvate formate lyase allows formation of acetyl-CoA, a small molecule important in the production of energy, when pyruvate is available. Deactivated pyruvate formate lyase, even with substrates present, does not catalyze the reaction. PFL activase is part of the radical SAM (S-adenosylmethionine) superfamily.
[0119] PFL activase turns pyruvate formate lyase "on" by converting Gly734 (G-H) into a Gly734 radical (G*) via a 5'-deoxyadenosyl radical (radical SAM). PFL deactivase (DA) turns pyruvate formate lyase "off" by quenching the Gly734 radical. Furthermore, pyruvate formate lyase is sensitive to molecular oxygen (O2), the presence of which shuts the enzyme off.
Xylose Metabolism
[0120] Xylose is a five-carbon monosaccharide that can be metabolized into useful products by a variety of organisms. There are two main pathways of xylose metabolism, each distinguished by the characteristic enzymes it utilizes. One pathway is called the "Xylose Reductase-Xylitol Dehydrogenase" or XR-XDH pathway. Xylose reductase (XR) and xylitol dehydrogenase (XDH) are the two main enzymes used in this method of xylose degradation. XR, encoded by the XYL1 gene, is responsible for the reduction of xylose to xylitol and is aided by cofactors NADH or NADPH. Xylitol is then oxidized to xylulose by XDH, which is encoded by the XYL2 gene, a step accomplished exclusively with the cofactor NAD+. Because of the varying cofactors needed in this pathway and the degree to which they are available for usage, an imbalance can result in an overproduction of xylitol byproduct and an inefficient production of desirable ethanol. Varying expression levels of the XR and XDH enzymes have been tested in the laboratory in an attempt to optimize the efficiency of the xylose metabolism pathway.
[0121] The other pathway for xylose metabolism is called the "Xylose Isomerase" (XI) pathway. Enzyme XI is responsible for direct conversion of xylose into xylulose, and does not proceed via a xylitol intermediate. Both pathways create xylulose, although the enzymes utilized are different. After production of xylulose, both the XR-XDH and XI pathways proceed through the enzyme xylulokinase (XK), encoded by the XKS1 gene, which converts xylulose into xylulose-5-P; the xylulose-5-P then enters the pentose phosphate pathway for further catabolism.
[0122] Studies on flux through the pentose phosphate pathway during xylose metabolism have revealed that limiting the speed of this step may be beneficial to the efficiency of fermentation to ethanol. Modifications to this flux that may improve ethanol production include a) lowering phosphoglucose isomerase activity, b) deleting the GND1 gene, and c) deleting the ZWF1 gene (Jeppsson et al., 2002). Since the pentose phosphate pathway produces additional NADPH during metabolism, limiting this step will help to correct the already evident imbalance between NAD(P)H and NAD+ cofactors and reduce xylitol byproduct. Another experiment comparing the two xylose metabolizing pathways revealed that the XI pathway was best able to metabolize xylose to produce the greatest ethanol yield, while the XR-XDH pathway reached a much faster rate of ethanol production (Karhumaa et al., 2007).
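The cofactor argument made above for the two xylose pathways can be illustrated with simple bookkeeping. The following Python sketch is a minimal illustration, not a metabolic model: it tallies the redox cofactors consumed and produced per molecule of xylose converted to xylulose, using only the cofactor usage described in the text. The function name and the dictionary representation are assumptions made for illustration.

```python
def net_cofactors(steps):
    """Sum cofactor changes over reaction steps (dicts of cofactor -> delta)."""
    total = {}
    for step in steps:
        for cofactor, delta in step.items():
            total[cofactor] = total.get(cofactor, 0) + delta
    # Drop cofactors whose consumption and production cancel out.
    return {c: d for c, d in total.items() if d != 0}


# XR reduces xylose to xylitol using either NADPH or NADH (negative = consumed).
XR_WITH_NADPH = {"NADPH": -1, "NADP+": +1}
XR_WITH_NADH = {"NADH": -1, "NAD+": +1}
# XDH oxidizes xylitol to xylulose using NAD+ exclusively.
XDH = {"NAD+": -1, "NADH": +1}
# XI converts xylose directly to xylulose with no redox cofactor.
XI = {}

print(net_cofactors([XR_WITH_NADPH, XDH]))  # net NADPH consumed, NADH produced
print(net_cofactors([XR_WITH_NADH, XDH]))   # balanced
print(net_cofactors([XI]))                  # balanced
```

When XR draws on NADPH, each xylose leaves the cell short one NADPH and long one NADH, which is the imbalance associated with xylitol accumulation; the XI route avoids the redox step altogether.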
Microorganisms
[0123] The present invention includes multiple strategies for the development of microorganisms with the combination of substrate-utilization and product-formation properties required for CBP. The "native cellulolytic strategy" involves engineering naturally occurring cellulolytic microorganisms to improve product-related properties, such as yield and titer. The "recombinant cellulolytic strategy" involves engineering natively non-cellulolytic organisms that exhibit high product yields and titers to express a heterologous cellulase system that enables cellulose utilization or hemicellulose utilization or both.
Cellulolytic Microorganisms
[0124] Several microorganisms reported in the literature to be cellulolytic or have cellulolytic activity have been characterized by a variety of means, including their ability to grow on microcrystalline cellulose as well as a variety of other sugars. Additionally, the organisms may be characterized by other means, including but not limited to, their ability to depolymerize and debranch cellulose and hemicellulose. Clostridium thermocellum (strain DSMZ 1237) was used to benchmark the organisms of interest. As used herein, C. thermocellum may include various strains, including, but not limited to, DSMZ 1237, DSMZ 1313, DSMZ 2360, DSMZ 4150, DSMZ 7072, and ATCC 31924. In certain embodiments of the invention, the strain of C. thermocellum may include, but is not limited to, DSMZ 1313 or DSMZ 1237. In another embodiment, particularly suitable organisms of interest for use in the present invention include cellulolytic microorganisms with a greater than 70% 16S rDNA homology to C. thermocellum. Alignment of Clostridium thermocellum, Clostridium cellulolyticum, Thermoanaerobacterium saccharolyticum, C. stercorarium, C. stercorarium II, Caldicellulosiruptor kristjanssonii, and C. phytofermentans indicates a 73-85% homology at the level of the 16S rDNA gene (FIG. 6A-6F).
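The 70% 16S rDNA threshold above can be illustrated with a short Python sketch. This application does not state how homology was calculated, so the sketch assumes one common simplification: percent identity over the ungapped columns of two pre-aligned sequences. The function names and the toy sequences are hypothetical and are not real 16S rDNA data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def exceeds_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if identity is greater than the threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Toy aligned fragments (hypothetical, for illustration only).
reference = "ACGTACGTAC"
candidate = "ACGTTCGT-C"
print(round(percent_identity(reference, candidate), 1))
print(exceeds_threshold(reference, candidate))
```

Other conventions, such as counting gap columns as mismatches, would give different percentages, so the convention should be fixed before comparing organisms against the threshold.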
[0125] Clostridium straminisolvens has been determined to grow nearly as well as C. thermocellum on Avicel®. Table 1 summarizes certain highly cellulolytic organisms.
TABLE-US-00001 TABLE 1
Strain | DSMZ No. | T optimum; or range (° C.) | pH optimum; or range | Gram stain | Aerotolerant | Utilizes | Products
Clostridium thermocellum | 1313 | 55-60 | 7 | positive | No | cellobiose, cellulose | acetic acid, lactic acid, ethanol, H2, CO2
Clostridium straminisolvens | 16021 | 50-55; 45-60 | 6.5-6.8; 6.0-8.5 | positive | Yes | cellobiose, cellulose | acetic acid, lactic acid, ethanol, H2, CO2
[0126] Organisms were grown on 20 g/L cellobiose or 20 g/L Avicel®. C. thermocellum was grown at 60° C. and C. straminisolvens was grown at 55° C. Both were pre-cultured from -80° C. freezer stock (origin DSMZ) on M122 with 50 mM MOPS. During mid to late log growth phase, pre-cultures were used to inoculate the batch cultures in 100 mL serum bottles to a working volume of 50 mL. Liquid samples were removed periodically for HPLC analysis of metabolic byproducts and sugar consumption. OD600 was taken at each of these time points. FIGS. 12A and 12B show product formation and OD600 for C. straminisolvens on cellobiose and Avicel®, respectively. Substantial cellobiose (37%) was consumed within 48 hours before OD dropped and product formation leveled off. FIGS. 13A and 13B show product formation and OD600 for C. thermocellum on cellobiose and Avicel®, respectively. C. thermocellum consumed ~60% of cellobiose within 48 hours, at which point product formation leveled out. Inhibition due to formation of organic acids caused incomplete utilization of substrates.
[0127] Certain microorganisms, including, for example, C. thermocellum and C. straminisolvens, cannot metabolize pentose sugars, such as D-xylose or L-arabinose, but are able to metabolize hexose sugars. Both D-xylose and L-arabinose are abundant sugars in biomass with D-xylose accounting for approximately 16-20% in soft and hard woods and L-arabinose accounting for approximately 25% in corn fiber. Accordingly, one object of the invention is to provide genetically-modified cellulolytic microorganisms, with the ability to metabolize pentose sugars, such as D-xylose and L-arabinose, thereby to enhance their use as biocatalysts for fermentation in the biomass-to-ethanol industry.
Cellulolytic and Xylanolytic Microorganisms
[0128] Several microorganisms determined from literature to be both cellulolytic and xylanolytic have been characterized by their ability to grow on microcrystalline cellulose and birchwood xylan as well as a variety of other sugars. Clostridium thermocellum was used to benchmark the organisms of interest. Of the strains selected for characterization, Clostridium cellulolyticum, Clostridium stercorarium subs. leptospartum, Caldicellulosiruptor kristjanssonii and Clostridium phytofermentans grew weakly on Avicel® and well on birchwood xylan. Table 2 summarizes some of the native cellulolytic and xylanolytic organisms.
TABLE-US-00002 TABLE 2
Strain | Source/No. | T optimum; or range (° C.) | pH optimum; or range | Gram stain | Aerotolerant | Utilizes | Products
Clostridium cellulolyticum | DSM 5812 | 34 | 7.2 | negative | no | cellulose, xylan, arabinose, mannose, galactose, xylose, glucose, cellobiose | acetic acid, lactic acid, ethanol, H2, CO2
Clostridium stercorarium subs. leptospartum | DSM 9219 | 60-65 | 7.0-7.5 | negative | no | cellulose, cellobiose, lactose, xylose, melibiose, raffinose, ribose, fructose, sucrose | acetic acid, lactic acid, ethanol, H2, CO2
Caldicellulosiruptor kristjanssonii | DSM 12137 | 78; 45-82 | 7; 5.8-8.0 | negative | no | cellobiose, glucose, xylose, galactose, mannose, cellulose | acetic acid, H2, CO2, lactic acid, ethanol, formate
Clostridium phytofermentans | ATCC 700394 | 37; 5-45 | 8.5; 6-9 | negative (gram type positive) | no | cellulose, xylan, cellobiose, fructose, galactose, glucose, lactose, maltose, mannose, ribose, xylose | acetic acid, H2, CO2, lactic acid, ethanol, formate
[0129] Organisms were grown on 20 g/L cellobiose, 20 g/L Avicel® or 5 g/L birchwood xylan. C. cellulolyticum was grown at 37° C., C. stercorarium subs. leptospartum was grown at 60° C., Caldicellulosiruptor kristjanssonii was grown at 75° C. and Clostridium phytofermentans was grown at 37° C. All were pre-cultured from -80° C. freezer stock in M122c supplemented with 50 mM MOPS. During mid to late log growth phase pre-cultures were used to inoculate the batch cultures in 100 mL serum bottles to a working volume of 50 mL. Liquid samples were removed periodically for HPLC analysis of metabolic byproducts and sugar consumption. OD600 was taken at each of these time points. FIGS. 14A-17B show product formation and OD600 for growth on cellobiose and Avicel®.
[0130] In a separate experiment organisms were grown on 2.5 g/L single sugars including cellobiose, glucose, xylose, galactose, arabinose, mannose and lactose as well as 5 g/L Avicel® and birchwood xylan. In FIG. 18 product formation is compared on cellobiose and birchwood xylan after two days. Table 3 summarizes how bacterial strains may be categorized based on their substrate utilization.
TABLE-US-00003 TABLE 3 cellobiose glucose xylose galactose arabinose mannose lactose C. cellulolyticum x x x x x C. stercorarium x x x x x x x subs. leptospartum C. kristjanssonii x x x x x x C. phytofermentans x x x x x
Transgenic Conversion of Microorganisms
[0131] The present invention provides compositions and methods for the transgenic conversion of certain microorganisms. When genes encoding enzymes involved in the metabolic pathway to ethanol from, for example, D-xylose and/or L-arabinose, are introduced into a bacterial strain that lacks one or more of these genes, for example, C. thermocellum or C. straminisolvens, one may select transformed strains for growth on D-xylose or growth on L-arabinose. It is expected that genes from other Clostridial species should be expressed in C. thermocellum and C. straminisolvens. Target gene donors may include microorganisms that confer the ability to metabolize hexose and pentose sugars, e.g., C. cellulolyticum, Caldicellulosiruptor kristjanssonii, C. phytofermentans, C. stercorarium, and Thermoanaerobacterium saccharolyticum.
[0132] The genomes of T. saccharolyticum, C. cellulolyticum, and C. phytofermentans are available. Accordingly, the present invention provides sequences which correspond to xylose isomerase and xylulokinase in each of the three hosts set forth above. In particular, the sequences corresponding to xylose isomerase (SEQ ID NO:6), xylulokinase (SEQ ID NO:7), L-arabinose isomerase (SEQ ID NO:8), and L-ribulose-5-phosphate 4-epimerase (SEQ ID NO:9) from T. saccharolyticum are set forth herein. Similarly, the sequences corresponding to xylose isomerase (SEQ ID NO:10), xylulokinase (SEQ ID NO:11), L-arabinose isomerase (SEQ ID NO:12), and L-ribulose-5-phosphate 4-epimerase (SEQ ID NO:13) from C. cellulolyticum are provided herein. C. phytofermentans utilizes the D-xylose pathway and does not utilize L-arabinose. Accordingly, the sequences corresponding to xylose isomerase (SEQ ID NO:14) and xylulokinase (SEQ ID NO:15) from C. phytofermentans are set forth herein.
[0133] C. kristjanssonii does metabolize xylose. To this end, the xylose isomerase (SEQ ID NO:71) and xylulokinase (SEQ ID NO:70) genes of C. kristjanssonii have been sequenced and are provided herein. C. straminisolvens has not been shown to grow on xylose; however, it does contain xylose isomerase (SEQ ID NO:73) and xylulokinase (SEQ ID NO:72) genes, which may be functional after adaptation on xylose as a carbon source.
[0134] C. thermocellum and C. straminisolvens may lack one or more known genes or enzymes in the D-xylose to ethanol pathway and/or the L-arabinose utilization pathway. FIGS. 2A-2B and 3A-3B depict two key enzymes that are missing in each of these pathways in C. thermocellum. C. straminisolvens has xylose isomerase and xylulokinase, but the functionality of these enzymes is not known. Genomic sequencing has not revealed a copy of either L-arabinose isomerase or L-ribulose-5-phosphate 4-epimerase in C. straminisolvens.
[0135] C. thermocellum and C. straminisolvens are unable to metabolize xylulose, which could reflect the absence (C. thermocellum) or lack of activity and/or expression (C. straminisolvens) of genes for xylose isomerase (referred to in FIG. 2A-2B as "XI" or 5.3.1.5), which converts D-xylose to D-xylulose, and xylulokinase (also referred to in FIG. 2A-2B as "XK" or 2.7.1.17), which converts D-xylulose to D-xylulose-5-phosphate. Furthermore, transport of xylose may be a limitation for C. straminisolvens. This potential limitation could be overcome by expressing sugar transport genes from xylose-utilizing organisms such as T. saccharolyticum and C. kristjanssonii.
[0136] C. thermocellum and C. straminisolvens are also unable to metabolize L-arabinose which could reflect the absence of genes for L-arabinose isomerase (also referred to in FIG. 3A-3B as 5.3.1.4) and L-ribulose-5-phosphate 4-epimerase (also referred to in FIG. 3A-3B as 5.1.3.4).
[0137] The four genes described above, e.g., xylose isomerase, xylulokinase, L-arabinose isomerase and L-ribulose-5-phosphate 4-epimerase, are present in several Clostridial species and Thermoanaerobacterium saccharolyticum species, including, but not limited to, Clostridium cellulolyticum (see FIG. 4A-4B), Thermoanaerobacterium saccharolyticum, C. stercorarium, Caldicellulosiruptor kristjanssonii, and C. phytofermentans; these strains are good utilizers of these sugars. It will be appreciated that the foregoing bacterial strains may be used as donors of the genes described herein.
[0138] C. phytofermentans expresses the two xylose pathway genes described above (xylose isomerase and xylulokinase), but lacks or does not express the arabinose pathway genes described above (L-arabinose isomerase and L-ribulose-5-phosphate 4-epimerase) (see FIG. 5A-5B).
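The host and donor relationships described in the preceding paragraphs can be summarized as a small lookup. The following Python sketch is illustrative only: it records, for each organism, the pentose-pathway genes that the text reports as present or for which sequences are provided, and then lists which genes a host lacks and which donors could supply all of them. The data structure and function names are assumptions, and presence of a gene in the table does not imply that the enzyme is active.

```python
XI = "xylose isomerase"
XK = "xylulokinase"
AI = "L-arabinose isomerase"
EPI = "L-ribulose-5-phosphate 4-epimerase"

PATHWAY_GENES = {"D-xylose": {XI, XK}, "L-arabinose": {AI, EPI}}

# Hosts: pathway genes reported as present (functionality not implied).
HOST_GENES = {
    "C. thermocellum": set(),
    "C. straminisolvens": {XI, XK},
}

# Donors: genes for which the text provides sequences.
DONOR_GENES = {
    "T. saccharolyticum": {XI, XK, AI, EPI},
    "C. cellulolyticum": {XI, XK, AI, EPI},
    "C. phytofermentans": {XI, XK},
    "C. kristjanssonii": {XI, XK},
}


def missing_genes(host: str, pathway: str) -> set:
    """Genes of the given pathway that the host is not reported to carry."""
    return PATHWAY_GENES[pathway] - HOST_GENES[host]


def candidate_donors(host: str, pathway: str) -> list:
    """Donors that carry every gene the host is missing for the pathway."""
    needed = missing_genes(host, pathway)
    if not needed:
        return []
    return sorted(d for d, genes in DONOR_GENES.items() if needed <= genes)


print(candidate_donors("C. thermocellum", "D-xylose"))
print(candidate_donors("C. straminisolvens", "L-arabinose"))
```

Note that for C. straminisolvens the sketch reports no missing D-xylose genes even though the organism has not been shown to grow on xylose, which reflects the open question of enzyme functionality and xylose transport rather than gene absence.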
[0139] Accordingly, it is an object of the invention to modify some of the above-described bacterial strains so as to optimize sugar utilization capability by, for example, introducing genes for one or more enzymes required for the production of ethanol from biomass-derived pentoses, e.g., D-xylose or L-arabinose metabolism. Promoters, including the native promoters of C. thermocellum or C. straminisolvens, such as triose phosphate isomerase (TPI), GAPDH, and LDH, may be used to express these genes. The sequences that correspond to native promoters of C. thermocellum include TPI (SEQ ID NO:16), GAPDH (SEQ ID NO:17), and LDH (SEQ ID NO:18). Once the gene has been cloned, codon optimization may be performed before expression. Cassettes containing, for example, the native promoter, a xylanolytic gene or arabinolytic gene, and a selectable marker may then be used to transform C. thermocellum or C. straminisolvens and select for D-xylose and L-arabinose growth on medium containing D-xylose or L-arabinose as the sole carbohydrate source.
Transposons
[0140] To select for foreign DNA that has entered a host it is preferable that the DNA be stably maintained in the organism of interest. With regard to plasmids, there are two processes by which this can occur. One is through the use of replicative plasmids. These plasmids have origins of replication that are recognized by the host and allow the plasmids to replicate as stable, autonomous, extrachromosomal elements that are partitioned during cell division into daughter cells. The second process occurs through the integration of a plasmid onto the chromosome. This predominantly happens by homologous recombination and results in the insertion of the entire plasmid, or parts of the plasmid, into the host chromosome. Thus, the plasmid and selectable marker(s) are replicated as an integral piece of the chromosome and segregated into daughter cells. Therefore, ascertaining through the use of selectable markers whether plasmid DNA is entering a cell during a transformation event requires the use of a replicative plasmid or the ability to recombine the plasmid onto the chromosome. These qualifiers cannot always be met, especially when handling organisms that do not have a suite of genetic tools.
[0141] One way to avoid issues regarding plasmid-associated markers is through the use of transposons. A transposon is a mobile DNA element, defined by mosaic DNA sequences that are recognized by enzymatic machinery referred to as a transposase. The function of the transposase is to randomly insert the transposon DNA into host or target DNA. A selectable marker can be cloned onto a transposon by standard genetic engineering. The resulting DNA fragment can be coupled to the transposase machinery in an in vitro reaction and the complex can be introduced into target cells by electroporation. Stable insertion of the marker onto the chromosome requires only the function of the transposase machinery and alleviates the need for homologous recombination or replicative plasmids.
[0142] The random nature associated with the integration of transposons has the added advantage of acting as a form of mutagenesis. Libraries can be created that comprise amalgamations of transposon mutants. These libraries can be used in screens or selections to produce mutants with desired phenotypes. For instance, a transposon library of a CBP organism could be screened for the ability to produce more ethanol, or less lactic acid and/or less acetate.
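The screening step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the product concentrations, mutant identifiers, and the `MutantProfile` and `screen_library` names are all hypothetical, and the rule simply keeps any mutant that produces more ethanol, or less lactic acid, or less acetate than the parent strain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutantProfile:
    mutant_id: str
    ethanol_g_per_l: float
    lactate_g_per_l: float
    acetate_g_per_l: float


def screen_library(library, parent):
    """Keep mutants with more ethanol, less lactate, or less acetate than parent."""
    hits = [
        m for m in library
        if m.ethanol_g_per_l > parent.ethanol_g_per_l
        or m.lactate_g_per_l < parent.lactate_g_per_l
        or m.acetate_g_per_l < parent.acetate_g_per_l
    ]
    # Rank the hits so the highest ethanol producers come first.
    return sorted(hits, key=lambda m: m.ethanol_g_per_l, reverse=True)


# Hypothetical values for illustration only; not measured data.
parent = MutantProfile("parent", 2.0, 1.0, 1.5)
library = [
    MutantProfile("tn-001", 2.6, 0.2, 1.4),
    MutantProfile("tn-002", 1.8, 1.2, 1.6),
    MutantProfile("tn-003", 2.0, 1.0, 0.9),
]
hits = screen_library(library, parent)
print([m.mutant_id for m in hits])
```

In practice a screen would also need replicate measurements and a significance cutoff; the simple comparisons here only illustrate the selection logic.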
Native Cellulolytic Strategy
[0143] Naturally occurring cellulolytic microorganisms are starting points for CBP organism development via the native strategy. Anaerobes and facultative anaerobes are of particular interest. The primary objective is to engineer product yields and ethanol titers to satisfy the requirements of an industrial process. Metabolic engineering of mixed-acid fermentations in relation to these objectives has been successful in the case of mesophilic, non-cellulolytic, enteric bacteria. Recent developments in suitable gene-transfer techniques allow for this type of work to be undertaken with cellulolytic bacteria.
Recombinant Cellulolytic Strategy
[0144] Non-cellulolytic microorganisms with desired product-formation properties (e.g., high ethanol yield and titer) are starting points for CBP organism development by the recombinant cellulolytic strategy. The primary objective of such developments is to engineer a heterologous cellulase system that enables growth and fermentation on pretreated lignocellulose. The heterologous production of cellulases has been pursued primarily with bacterial hosts producing ethanol at high yield (engineered strains of E. coli, Klebsiella oxytoca, and Zymomonas mobilis) and the yeast Saccharomyces cerevisiae. Cellulase expression in strains of K. oxytoca resulted in increased hydrolysis yields--but not growth without added cellulase--for microcrystalline cellulose, and anaerobic growth on amorphous cellulose. Although dozens of saccharolytic enzymes have been functionally expressed in S. cerevisiae, anaerobic growth on cellulose as the result of such expression has not been definitively demonstrated.
[0145] Aspects of the present invention relate to the use of thermophilic or mesophilic microorganisms as hosts for modification via the native cellulolytic strategy. Their potential in process applications in biotechnology stems from their ability to grow at relatively high temperatures with attendant high metabolic rates, production of physically and chemically stable enzymes, and elevated yields of end products. Major groups of thermophilic bacteria include eubacteria and archaebacteria. Thermophilic eubacteria include: phototrophic bacteria, such as cyanobacteria, purple bacteria, and green bacteria; Gram-positive bacteria, such as Bacillus, Clostridium, Lactic acid bacteria, and Actinomyces; and other eubacteria, such as Thiobacillus, Spirochete, Desulfotomaculum, Gram-negative aerobes, Gram-negative anaerobes, and Thermotoga. Within archaebacteria are considered Methanogens, extreme thermophiles (an art-recognized term), and Thermoplasma. In certain embodiments, the present invention relates to Gram-negative organotrophic thermophiles of the genus Thermus, Gram-positive eubacteria, such as genera Clostridium, and also which comprise both rods and cocci, genera in group of eubacteria, such as Thermosipho and Thermotoga, genera of Archaebacteria, such as Thermococcus, Thermoproteus (rod-shaped), Thermofilum (rod-shaped), Pyrodictium, Acidianus, Sulfolobus, Pyrobaculum, Pyrococcus, Thermodiscus, Staphylothermus, Desulfurococcus, Archaeoglobus, and Methanopyrus.
Some examples of thermophilic or mesophilic microorganisms (including bacteria, prokaryotic microorganisms, and fungi), which may be suitable for the present invention include, but are not limited to: Clostridium thermosulfurogenes, Clostridium cellulolyticum, Clostridium thermocellum, Clostridium thermohydrosulfuricum, Clostridium thermoaceticum, Clostridium thermosaccharolyticum, Clostridium tartarivorum, Clostridium thermocellulaseum, Clostridium phytofermentans, Clostridium straminisolvens, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacterium saccharolyticum, Thermobacteroides acetoethylicus, Thermoanaerobium brockii, Methanobacterium thermoautotrophicum, Anaerocellum thermophilum, Pyrodictium occultum, Thermoproteus neutrophilus, Thermofilum librum, Thermothrix thioparus, Desulfovibrio thermophilus, Thermoplasma acidophilum, Hydrogenomonas thermophilus, Thermomicrobium roseum, Thermus flavus, Thermus ruber, Pyrococcus furiosus, Thermus aquaticus, Thermus thermophilus, Chloroflexus aurantiacus, Thermococcus litoralis, Pyrodictium abyssi, Bacillus stearothermophilus, Cyanidium caldarium, Mastigocladus laminosus, Chlamydothrix calidissima, Chlamydothrix penicillata, Thiothrix carnea, Phormidium tenuissimum, Phormidium geysericola, Phormidium subterraneum, Phormidium bijahensi, Oscillatoria filiformis, Synechococcus lividus, Chloroflexus aurantiacus, Pyrodictium brockii, Thiobacillus thiooxidans, Sulfolobus acidocaldarius, Thiobacillus thermophilica, Bacillus stearothermophilus, Cercosulcifer hamathensis, Vahlkampfia reichi, Cyclidium citrullus, Dactylaria gallopava, Synechococcus lividus, Synechococcus elongatus, Synechococcus minervae, Synechocystis aquatilus, Aphanocapsa thermalis, Oscillatoria terebriformis, Oscillatoria amphibia, Oscillatoria germinata, Oscillatoria okenii, Phormidium laminosum, Phormidium parparasiens, Symploca thermalis, Bacillus acidocaldarius, Bacillus coagulans, Bacillus thermocatenulatus, Bacillus licheniformis, Bacillus pumilus, Bacillus macerans, Bacillus circulans, Bacillus laterosporus, Bacillus brevis, Bacillus subtilis, Bacillus sphaericus, Desulfotomaculum nigrificans, Streptococcus thermophilus, Lactobacillus thermophilus, Lactobacillus bulgaricus, Bifidobacterium thermophilum, Streptomyces fragmentosporus, Streptomyces thermonitrificans, Streptomyces thermovulgaris, Pseudonocardia thermophila, Thermoactinomyces vulgaris, Thermoactinomyces sacchari, Thermoactinomyces candidus, Thermomonospora curvata, Thermomonospora viridis, Thermomonospora citrina, Microbispora thermodiastatica, Microbispora aerata, Microbispora bispora, Actinobifida dichotomica, Actinobifida chromogens, Micropolyspora caesia, Micropolyspora faeni, Micropolyspora cectivugida, Micropolyspora cabrobrunea, Micropolyspora thermovirida, Micropolyspora viridinigra, Methanobacterium thermoautotrophicum, Caldicellulosiruptor acetigenus, Caldicellulosiruptor saccharolyticus, Caldicellulosiruptor kristjanssonii, Caldicellulosiruptor owensensis, Caldicellulosiruptor lactoaceticus, variants thereof, and/or progeny thereof.
[0146] In certain embodiments, the present invention relates to thermophilic bacteria selected from the group consisting of Fervidobacterium gondwanense, Clostridium thermolacticum, Moorella sp., and Rhodothermus marinus.
[0147] In certain embodiments, the present invention relates to thermophilic bacteria of the genera Thermoanaerobacterium or Thermoanaerobacter, including, but not limited to, species selected from the group consisting of: Thermoanaerobacterium thermosulfurigenes, Thermoanaerobacterium aotearoense, Thermoanaerobacterium polysaccharolyticum, Thermoanaerobacterium zeae, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium saccharolyticum, Thermoanaerobium brockii, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacter thermohydrosulfuricus, Thermoanaerobacter ethanolicus, Thermoanaerobacter brockii, variants thereof, and progeny thereof.
[0148] In certain embodiments, the present invention relates to microorganisms of the genera Geobacillus, Saccharococcus, Paenibacillus, Bacillus, and Anoxybacillus, including, but not limited to, species selected from the group consisting of: Geobacillus thermoglucosidasius, Geobacillus stearothermophilus, Saccharococcus caldoxylosilyticus, Saccharococcus thermophilus, Paenibacillus campinasensis, Bacillus flavothermus, Anoxybacillus kamchatkensis, Anoxybacillus gonensis, variants thereof, and progeny thereof.
[0149] In certain embodiments, the present invention relates to mesophilic bacteria selected from the group consisting of Saccharophagus degradans; Flavobacterium johnsoniae; Fibrobacter succinogenes; Clostridium hungatei; Clostridium phytofermentans; Clostridium cellulolyticum; Clostridium aldrichii; Clostridium termitidis; Acetivibrio cellulolyticus; Acetivibrio ethanolgignens; Acetivibrio multivorans; Bacteroides cellulosolvens; and Alkalibacter saccharofermentans, variants thereof and progeny thereof.
Methods of the Invention
[0150] During glycolysis, cells convert simple sugars, such as glucose, into pyruvic acid, with a net production of ATP and NADH. In the absence of a functioning electron transport system for oxidative phosphorylation, at least 95% of the pyruvic acid is consumed in short pathways which regenerate NAD+, an obligate requirement for continued glycolysis and ATP production. The waste products of these NAD+ regeneration systems are commonly referred to as fermentation products.
[0151] Microorganisms produce a diverse array of fermentation products, including organic acids, such as lactate (the salt form of lactic acid), acetate (the salt form of acetic acid), succinate, and butyrate, and neutral products, such as ethanol, butanol, acetone, and butanediol. End products of fermentation share to varying degrees several fundamental features, including: they are relatively nontoxic under the conditions in which they are initially produced, but become more toxic upon accumulation; and they are more reduced than pyruvate because their immediate precursors have served as terminal electron acceptors during glycolysis. Aspects of the present invention relate to the use of gene knockout technology to provide novel microorganisms useful in the production of ethanol from lignocellulosic biomass substrates. The transformed organisms are prepared by deleting or inactivating one or more genes that encode competing pathways, such as the non-limiting pathways to organic acids described herein, optionally followed by a growth-based selection for mutants with improved performance for producing ethanol as a fermentation product.
[0152] In certain embodiments, a thermophilic or mesophilic microorganism, which in a native state contains at least one gene that confers upon the microorganism an ability to produce lactic acid as a fermentation product, is transformed to decrease or eliminate expression of said at least one gene. The gene that confers upon said microorganism an ability to produce lactic acid as a fermentation product may code for expression of lactate dehydrogenase. The deletion or suppression of the gene(s) or particular polynucleotide sequence(s) that encode for expression of LDH diminishes or eliminates the reaction scheme in the overall glycolytic pathway whereby pyruvate is converted to lactic acid; the resulting relative abundance of pyruvate from these first stages of glycolysis should allow for the increased production of ethanol.
[0153] In certain embodiments, a thermophilic or mesophilic microorganism, which in a native state contains at least one gene that confers upon the microorganism an ability to produce acetic acid as a fermentation product, is transformed to eliminate expression of said at least one gene. The gene that confers upon the microorganism an ability to produce acetic acid as a fermentation product may code for expression of acetate kinase and/or phosphotransacetylase. The deletion or suppression of the gene(s) or particular polynucleotide sequence(s) that encode for expression of ACK and/or PTA diminishes or eliminates the reaction scheme in the overall glycolytic pathway whereby acetyl CoA is converted to acetic acid (FIG. 1); the resulting relative abundance of acetyl CoA from these later stages of glycolysis should allow for the increased production of ethanol.
[0154] In certain embodiments, the above-detailed gene knockout schemes can be applied individually or in concert. Eliminating the mechanism for the production of lactate (i.e., knocking out the genes or particular polynucleotide sequences that encode for expression of LDH) generates more acetyl CoA; it follows that if the mechanism for the production of acetate is also eliminated (i.e., knocking out the genes or particular polynucleotide sequences that encode for expression of ACK and/or PTA), the abundance of acetyl CoA will be further enhanced, which should result in increased production of ethanol.
[0155] In certain embodiments, it is not required that the thermophilic or mesophilic microorganisms have native or endogenous PDC or ADH. In certain embodiments, the genes encoding for PDC and/or ADH can be expressed recombinantly in the genetically modified microorganisms of the present invention. In certain embodiments, the gene knockout technology of the present invention can be applied to recombinant microorganisms, which may comprise a heterologous gene that codes for PDC and/or ADH, wherein said heterologous gene is expressed at sufficient levels to increase the ability of said recombinant microorganism (which may be thermophilic) to produce ethanol as a fermentation product or to confer upon said recombinant microorganism (which may be thermophilic) the ability to produce ethanol as a fermentation product.
[0156] In certain embodiments, aspects of the present invention relate to fermentation of lignocellulosic substrates to produce ethanol in a concentration that is at least 70% of a theoretical yield based on cellulose content or hemicellulose content or both.
[0157] In certain embodiments, aspects of the present invention relate to fermentation of lignocellulosic substrates to produce ethanol in a concentration that is at least 80% of a theoretical yield based on cellulose content or hemicellulose content or both.
[0158] In certain embodiments, aspects of the present invention relate to fermentation of lignocellulosic substrates to produce ethanol in a concentration that is at least 90% of a theoretical yield based on cellulose content or hemicellulose content or both.
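The percent-of-theoretical figures above can be illustrated with a short calculation. The following is a minimal sketch, assuming the yield is expressed on a mass basis and using standard stoichiometric constants that are not stated in this specification (hydration factors for glucan and xylan, and a maximum of about 0.511 g ethanol per g of sugar). The function names are hypothetical.

```python
# Standard stoichiometric constants (assumptions, not taken from the specification).
GLUCAN_TO_GLUCOSE = 180.16 / 162.14  # mass gain when cellulose is hydrolyzed to glucose
XYLAN_TO_XYLOSE = 150.13 / 132.12    # mass gain when xylan is hydrolyzed to xylose
ETHANOL_PER_SUGAR = 0.511            # g ethanol per g hexose or pentose, theoretical maximum


def theoretical_ethanol(glucan_g, xylan_g=0.0):
    """Return the theoretical ethanol mass (g) from the given glucan and xylan masses (g)."""
    sugar_g = glucan_g * GLUCAN_TO_GLUCOSE + xylan_g * XYLAN_TO_XYLOSE
    return sugar_g * ETHANOL_PER_SUGAR


def percent_of_theoretical(ethanol_g, glucan_g, xylan_g=0.0):
    """Return the measured ethanol mass as a percentage of the theoretical maximum."""
    return 100.0 * ethanol_g / theoretical_ethanol(glucan_g, xylan_g)


if __name__ == "__main__":
    # Hypothetical substrate: 50 g glucan and 20 g xylan; 35 g ethanol measured.
    print(round(theoretical_ethanol(50, 20), 2))
    print(round(percent_of_theoretical(35, 50, 20), 1))
```

Setting `xylan_g` to zero gives a yield based on cellulose content only, which corresponds to the "cellulose content or hemicellulose content or both" language above.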
[0159] In certain embodiments, substantial or complete elimination of organic acid production from microorganisms in a native state may be achieved using one or more site-directed DNA homologous recombination events.
[0160] Operating either a simultaneous saccharification and co-fermentation (SSCF) or CBP process at thermophilic temperatures offers several important benefits over conventional mesophilic fermentation temperatures of 30-37° C. In particular, costs for a process step dedicated to cellulase production are substantially reduced (e.g., 2-fold or more) for thermophilic SSCF and are eliminated for CBP. Costs associated with fermentor cooling and also heat exchange before and after fermentation are also expected to be reduced for both thermophilic SSCF and CBP. Finally, processes featuring thermophilic biocatalysts may be less susceptible to microbial contamination as compared to processes featuring conventional mesophilic biocatalysts.
[0161] The ability to redirect electron flow by virtue of modifications to carbon flow has broad implications. For example, this approach could be used to produce high ethanol yields in strains other than T. saccharolyticum and/or to produce solvents other than ethanol, such as higher alcohols (e.g., butanol).
Metabolic Engineering Through Antisense Oligonucleotide (asRNA) Strategies
[0162] Fermentative microorganisms such as yeast and anaerobic bacteria ferment sugars to ethanol and other reduced organic end products. Theoretically, carbon flow can be directed to ethanol production if the formation of competing end-products, such as lactate and acetate, can be suppressed. The present invention provides several genetic engineering approaches designed to remove such competing pathways in the CBP organisms of the invention. The bulk of these approaches utilize knock-out constructs (for single crossover recombination) or allele-exchange constructs (for double crossover recombination) and target the genetic loci for ack and ldh. Although these tools employ "tried and true" strain development techniques, there are several potential issues that could stall progress: (i) they depend on the host recombination efficiency, which in all cases is unknown for the CBP organisms; (ii) they can be used to knock out only one pathway at a time, so successive genetic alterations are contingent upon having several selectable markers or a recyclable marker; and (iii) deletion of target genes may be toxic or have polar effects on downstream gene expression.
[0163] The present invention provides additional approaches towards genetic engineering that do not rely on host recombination efficiency. One of these alternative tools is called antisense RNA (asRNA). Although antisense oligonucleotides have been used for over twenty-five years to inhibit gene expression levels both in vitro and in vivo, recent advances in mRNA structure prediction have facilitated smarter design of asRNA molecules. These advances have prompted a number of groups to demonstrate the usefulness of asRNA in metabolic engineering of bacteria.
[0164] The benefits of using asRNA over knock-out and allele-exchange technology are numerous: (i) it alleviates the need for multiple selectable markers because multiple pathways can be targeted by a single asRNA construct; (ii) the attenuation level of target mRNA can be adjusted by increasing or decreasing the association rate between the asRNA and its target mRNA; and (iii) pathway inactivation can be conditional if asRNA transcripts are driven by conditional promoters. Recently, this technology has been used to increase solventogenesis in the Gram positive mesophile, Clostridium acetobutylicum (Tummala et al. (2003)). Although the exact molecular mechanism of how asRNA attenuates gene expression is unclear, the likely mechanism is triggered upon hybridization of the asRNA to the target mRNA. Mechanisms may include one or more of the following: (i) inhibition of translation of mRNA into protein by blocking the ribosome binding site from properly interacting with the ribosome, (ii) decreasing the half-life of mRNA through dsRNA-dependent RNases, such as RNase H, that rapidly degrade duplex RNA, and (iii) inhibition of transcription due to early transcription termination of mRNA.
Design of Antisense Sequences
[0165] asRNAs are typically 18-25 nucleotides in length. There are several computational tools available for rational design of RNA-targeting nucleic acids (Sfold, Integrated DNA Technologies, STZ Nucleic Acid Design) which may be used to select asRNA sequences. For instance, the gene sequence for Clostridium thermocellum ack (acetate kinase) can be submitted to a rational design server and several asRNA sequences can be culled. In brief, the design parameters select for mRNA target sequences that do not contain predicted secondary structure.
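The enumeration step that precedes any structure-based filtering can be sketched as follows. This is a minimal illustration, assuming a DNA-alphabet coding-strand sequence as input; it only lists the reverse complement of every 18-25 nucleotide window and does not perform the secondary-structure prediction carried out by the design servers named above. The function names and the example target are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_candidates(target, length=20):
    """Yield (start, antisense) pairs for every window of `length` nucleotides.

    `start` is the 1-based position of the window on the target (coding strand);
    `antisense` is the DNA-alphabet sequence complementary to that window.
    """
    if not 18 <= length <= 25:
        raise ValueError("asRNA length is expected to be 18-25 nucleotides")
    for i in range(len(target) - length + 1):
        yield i + 1, reverse_complement(target[i:i + length])


if __name__ == "__main__":
    # Hypothetical target fragment, not a sequence from this specification.
    target = "ATGAAAGTTCTGGTTATCAACTGCGGTAGC"
    for start, candidate in antisense_candidates(target, length=25):
        print(start, candidate)
```

Candidates produced this way would still need to be ranked by predicted accessibility of the target site before one is chosen.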
Design of Delivery Vector
[0166] A replicative plasmid will be used to deliver the asRNA coding sequence to the target organism. Vectors such as, but not limited to, pNW33N, pJIR418, pJIR751, and pCTC1, will form the backbone of the asRNA constructs for delivery of the asRNA coding sequences into the host cell. In addition to extra-chromosomal (plasmid based) expression, asRNAs may be stably inserted at a heterologous locus into the genome of the microorganism to obtain stable expression of asRNAs. In certain embodiments, strains of thermophilic or mesophilic microorganisms of interest may be engineered by site directed homologous recombination to knock out the production of organic acids, and other genes of interest may be partially, substantially, or completely deleted, silenced, inactivated, or down-regulated by asRNA.
Promoter Choice
[0167] To ensure expression of asRNA transcripts, compatible promoters for the given host will be fused to the asRNA coding sequence. The promoter-asRNA cassettes are constructed in a single PCR step. Sense and antisense primers designed to amplify a promoter region will be modified such that the asRNA sequence (culled from the rational design approach) is attached to the 5' end of the antisense primer. Additionally, restriction sites, such as EcoRI or BamHI, will be added to the terminal ends of each primer so that the final PCR amplicon can be digested directly with restriction enzymes and inserted into the vector backbone through traditional cloning techniques.
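The primer layout described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sense primer carries an EcoRI site and the antisense primer a BamHI site (the text permits either), a four-base `GGCG` clamp precedes each site, and 20 nucleotides of each primer anneal to the promoter. The promoter and asRNA sequences in the example are hypothetical, as are the function names.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def build_primers(promoter, asrna_dna, site_5="GAATTC", site_3="GGATCC",
                  clamp="GGCG", anneal=20):
    """Return (sense, antisense) primers, each written 5'->3'.

    `asrna_dna` is the top-strand DNA that is transcribed into the asRNA.
    It sits on the 5' side of the promoter-annealing part of the antisense primer.
    """
    sense = clamp + site_5 + promoter[:anneal]
    antisense = (clamp + site_3 + reverse_complement(asrna_dna)
                 + reverse_complement(promoter[-anneal:]))
    return sense, antisense


def predicted_amplicon(promoter, asrna_dna, site_5="GAATTC", site_3="GGATCC",
                       clamp="GGCG"):
    """Return the top strand of the expected promoter-asRNA PCR product."""
    return clamp + site_5 + promoter + asrna_dna + reverse_complement(clamp + site_3)


if __name__ == "__main__":
    promoter = "TTGACAATTAATCATCGGCTCGTATAATGTGTGGA"  # hypothetical promoter
    asrna = "TTACGCATCGGATTCAGGCT"                     # hypothetical asRNA coding DNA
    for primer in build_primers(promoter, asrna):
        print(primer)
    print(predicted_amplicon(promoter, asrna))
```

Because the asRNA coding sequence is carried entirely on the antisense primer, a single round of PCR on the promoter template yields the complete cassette flanked by the two restriction sites.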
[0168] With respect to microorganisms that do not have the ability to metabolize pentose sugars, but are able to metabolize hexose sugars as described herein, it will be appreciated that the ack and ldh genes of Clostridium thermocellum and Clostridium straminisolvens, for example, may be targeted for inactivation using antisense RNA according to the methods described herein.
[0169] With respect to microorganisms that have the ability to metabolize pentose and hexose sugars as described herein, it will be appreciated that the ack and ldh genes of Clostridium cellulolyticum, Clostridium phytofermentans and Caldicellulosiruptor kristjanssonii, for example, may be targeted for inactivation using antisense RNA according to the methods described herein.
[0170] In addition to antibiotic selection for strains expressing the asRNA delivery vectors, such strains may be selected on conditional media that contain any of several toxic metabolite analogues such as sodium fluoroacetate (SFA), bromoacetic acid (BAA), chloroacetic acid (CAA), 5-fluoroorotic acid (5-FOA) and chlorolactic acid. Chemical mutagens including, but not limited to, ethyl methanesulfonate (EMS) may be used in combination with the expression of antisense oligonucleotide (asRNA) to generate strains that have one or more genes partially, substantially, or completely deleted, silenced, inactivated, or down-regulated.
EXEMPLIFICATION
[0171] The invention now being generally described, it will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention, and are not intended to limit the invention.
Example 1
Generation of Custom Transposons for Mesophilic and Thermophilic Cellulolytic, Xylanolytic Organisms
[0172] The present invention provides methods for generating custom transposons for cellulolytic and/or xylanolytic and/or thermophilic organisms. To do this, a native promoter from the host organism will be fused to a selectable marker which has been determined to work in this organism. This fragment will be cloned into the EZ-Tn5® transposon that is carried on the vector pMOD®-2<MCS> (Epicentre® Biotechnologies). For example, the C. thermocellum gapDH promoter will be fused to the mLs drug marker, as well as to the cat gene, and then subcloned into vector pMOD®-2<MCS>.
[0173] Commercial transposons are lacking in thermostable drug markers and native promoters of cellulolytic and/or xylanolytic and/or thermophilic organisms. The mLs and cat markers have functioned in thermophilic bacteria and the gapDH promoter regulates a key glycolytic enzyme and should be constantly expressed. The combination of the above drug markers and the gapDH promoter will greatly enhance the probability of generating a functional transposon. This approach may be applied to other cellulolytic and/or xylanolytic and/or thermophilic organisms.
Experimental Design
[0174] FIG. 26 is a diagram taken from the Epicentre® Biotechnologies user manual, which is incorporated herein by reference, representing bp 250-550 of pMOD®-2<MCS>. In the top portion, the black arrowheads labeled ME denote 19 bp mosaic ends that define the transposon. The EcoRI and HindIII sites define the multi-cloning site, which is represented by the black box labeled MCS. In the bottom portion, the DNA sequence and the restriction enzymes associated with the MCS are shown.
[0175] The following primers will be used to amplify promoter fusion fragments from pMQ87-gapDH-cat and pMQ87-gapDH-mls: GGCGgaattc CTT GGT CTG ACA ATC GAT GC (SEQ ID NO:19); GGCGgaattc TATCAGTTATTACCCACTTTTCG (SEQ ID NO:20). The lower case letters denote engineered EcoRI restriction sites. The size of the amplicon generated will be ~1.9 kb. Standard molecular procedures will allow the amplicon to be digested with EcoRI and cloned into the unique EcoRI site of pMOD®-2<MCS>. The transposon and subsequent transpososome will be generated and introduced into host organisms as described by the manufacturer.
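The structure of the two primers can be made explicit with a short parsing sketch. It assumes only the convention stated above (lower case letters mark the engineered restriction site) and treats any upper case bases before the site as a 5' clamp and the bases after it as the template-annealing portion. The function name is hypothetical.

```python
import re

# Primers as written in paragraph [0175]; spaces are layout only.
PRIMERS = {
    "SEQ ID NO:19": "GGCGgaattc CTT GGT CTG ACA ATC GAT GC",
    "SEQ ID NO:20": "GGCGgaattc TATCAGTTATTACCCACTTTTCG",
}


def split_primer(primer):
    """Split a primer into (clamp, restriction site, annealing region)."""
    seq = primer.replace(" ", "")
    match = re.fullmatch(r"([ACGT]*)([acgt]+)([ACGT]+)", seq)
    if match is None:
        raise ValueError("primer does not follow the clamp/site/annealing layout")
    clamp, site, annealing = match.groups()
    return clamp, site.upper(), annealing


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        clamp, site, annealing = split_primer(primer)
        print(name, clamp, site, annealing, len(annealing))
```

Both primers carry the same EcoRI site, which is consistent with cloning the digested amplicon into the single EcoRI site of the vector; the insert orientation would therefore need to be checked after cloning.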
Example 2
Constructs for Engineering Cellulolytic and Xylanolytic Strains
[0176] The present invention provides compositions and methods for genetically engineering an organism of interest for CBP by mutating genes encoding key enzymes of metabolic pathways which divert carbon flow away from ethanol. Single crossover knockout constructs are designed so as to insert large fragments of foreign DNA into the gene of interest to partially, substantially, or completely delete, silence, inactivate, or down-regulate it. Double crossover knockout constructs are designed so as to partially, substantially, or completely delete, silence, inactivate, or down-regulate the gene of interest from the chromosome or replace the gene of interest on the chromosome with a mutated copy of the gene, such as a form of the gene interrupted by an antibiotic resistance cassette.
[0177] The design of single crossover knockout vectors requires the cloning of an internal fragment of the gene of interest into a plasmid based system. Ideally, this vector will carry a selectable marker that is expressed in the host strain but will not replicate in the host strain. Thus, upon introduction into the host strain the plasmid will not replicate. If the cells are placed in a conditional medium that selects for the marker carried on the plasmid, only those cells that have found a way to maintain the plasmid will grow. Because the plasmid is unable to replicate as an autonomous DNA element, the most likely way that the plasmid will be maintained is through recombination onto the host chromosome. The most likely place for the recombination to occur is at a region of homology between the plasmid and the host chromosome.
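The outcome of such a single crossover can be sketched with plain string operations. This is a minimal model, assuming the plasmid consists of the internal gene fragment plus a backbone that carries the marker, and that recombination occurs somewhere within the cloned fragment. The sequences are toy placeholders and the function name is hypothetical.

```python
def single_crossover(chromosome, gene, frag_start, frag_end, backbone):
    """Return the chromosome after integration of a plasmid by one crossover.

    `frag_start` and `frag_end` are 1-based, inclusive coordinates of the cloned
    internal fragment within `gene`. `backbone` stands for the rest of the
    plasmid, including the selectable marker.
    """
    gene_pos = chromosome.index(gene)
    upstream = chromosome[:gene_pos]
    downstream = chromosome[gene_pos + len(gene):]
    left_copy = gene[:frag_end]         # intact 5' end, truncated 3' end
    right_copy = gene[frag_start - 1:]  # truncated 5' end, intact 3' end
    return upstream + left_copy + backbone + right_copy + downstream


if __name__ == "__main__":
    gene = "ATGAAACCCGGGTTTTAA"         # toy gene, not a real sequence
    chromosome = "TTGACA" + gene + "TATAAT"
    backbone = "<backbone+marker>"      # placeholder for vector sequence
    print(single_crossover(chromosome, gene, 4, 12, backbone))
```

The integrant carries two incomplete copies of the gene separated by the vector, so neither copy encodes the full-length enzyme, and the cloned fragment is duplicated on either side of the insertion.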
[0178] Alternatively, replicating plasmids can be used to create single crossover interruptions. Cells that have taken up the knockout vector can be selected on a conditional medium, then passaged in the absence of selection. Without the positive selection provided by the conditional medium, many organisms will lose the plasmid. In the event that the plasmid is inserted onto the host chromosome, it will not be lost in the absence of selection. The cells can then be returned to a conditional medium and only those that have retained the marker, through chromosomal integration, will grow. A PCR based method will be devised to screen for organisms that contain the marker located on the chromosome.
[0179] The design of double crossover knockout vectors requires, at a minimum, cloning the DNA (~1 kb) flanking the gene of interest into a plasmid and in some cases may include cloning the gene of interest. A selectable marker may be placed between the flanking DNA or, if the gene of interest is cloned, the marker is placed internally with respect to the gene. Ideally the plasmid used is not capable of replicating in the host strain. Upon the introduction of the plasmid into the host and selection on a medium conditional to the marker, only cells that have recombined the homologous DNA onto the chromosome will grow. Two recombination events are needed to replace the gene of interest with the selectable marker.
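The gene replacement described above can be modeled in the same string-based way. The sketch below is illustrative only: it assumes the marker is placed directly between the two flanks, that one crossover occurs in each flank, and it uses toy sequences with a four-base flank in place of ~1 kb. The function name is hypothetical.

```python
def double_crossover(chromosome, gene, marker, flank=1000):
    """Return (cassette, recombinant chromosome) for a marker-replacement knockout.

    The cassette is upstream flank + marker + downstream flank. After one
    crossover in each flank, the gene is replaced by the marker.
    """
    pos = chromosome.index(gene)
    up = chromosome[max(0, pos - flank):pos]
    down = chromosome[pos + len(gene):pos + len(gene) + flank]
    cassette = up + marker + down
    recombinant = chromosome.replace(up + gene + down, cassette, 1)
    return cassette, recombinant


if __name__ == "__main__":
    # Toy chromosome; real flanks would be ~1 kb each.
    chromosome = "AAAA" + "CCCC" + "GGGG" + "TTTT" + "AAAA"
    cassette, recombinant = double_crossover(chromosome, "GGGG", "[mLs]", flank=4)
    print(cassette)
    print(recombinant)
```

Unlike the single crossover case, no vector backbone remains on the chromosome and no part of the gene is duplicated.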
[0180] Alternatively, replicating plasmids can be used to create double crossover gene replacements. Cells that have taken up the knockout vector can be selected on a conditional medium, then passaged in the absence of selection. Without the positive selection provided by the conditional medium, many organisms will lose the plasmid. In the event that the drug marker is inserted onto the host chromosome, it will not be lost in the absence of selection. The cells can then be returned to a conditional medium and only those that have retained the marker, through chromosomal integration, will grow. A PCR based method may be devised to screen for organisms that contain the marker located on the chromosome.
[0181] In addition to antibiotic selection schemes, several toxic metabolite analogues such as sodium fluoroacetate (SFA), bromoacetic acid (BAA), chloroacetic acid (CAA), 5-fluoroorotic acid (5-FOA) and chlorolactic acid may be used to select mutants arising from either homologous recombination or transposon-based strategies. Chemical mutagens including, but not limited to, ethyl methanesulfonate (EMS) may be used in combination with the directed mutagenesis schemes that employ homologous recombination or transposon-based strategies.
C. cellulolyticum Knockout Constructs
Acetate Kinase (Gene 131 from C. cellulolyticum Published Genome):
[0182] Single Crossover
[0183] The acetate kinase gene of C. cellulolyticum is 1,110 bp in length. A 662 bp internal fragment (SEQ ID NO:21) spanning nucleotides 91-752 was amplified by PCR and cloned into suicide vectors and replicating vectors that have different selectable markers. Selectable markers may include those that provide erythromycin and chloramphenicol resistance. These plasmids will be used to disrupt the ack gene. A map of the ack gene and the region amplified by PCR for gene disruption are shown in FIG. 19. The underlined portions of SEQ ID NO:21 set forth below correspond to the EcoRI sites that flank the knockout fragment.
TABLE-US-00004 gaattctgcgacagaatagggattgacaattcctttataaagcaatcaag gggttcagaagaggctgttattttgaataaagagctaaagaatcacaaag atgcaatagaggctgttatttctgcactgactgacgataatatgggcgtt ataaaaaacatgtccgaaatatcagcagtgggacacagaatagtacacgg cggtgaaaaattcaacagttctgtagttatagatgaaaacgttatgaatg cagtaagagagtgtatagacgttgcaccgcttcataatccgccgaatatt ataggtatagaggcttgccagcagattatgcccaatatacctatggtagc tgtatttgataccactttccacagctccatgcctgattatgcataccttt acgcattgccatatgaactttatgaaaagtacggtataagaaaatatggt ttccacggaacatcacacaaatatgttgcagaaagagcttctgcaatgct tgataagtctttgaacgaattaaagataattacatgccatcttgggaacg gttcaagtatttgtgctgttaacaagggtaaatcaattgatacttccatg ggctttacacctttgcagggacttgcaatgggtacaagaagcggtacaat agaccctgaagttgttacgaattc
These sites were engineered during the design of the "ack KO primers" and will allow subsequent cloning of the fragment into numerous vectors.
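The coordinates given for the ack knockout fragment can be checked with simple arithmetic. The sketch below uses only the numbers stated above (a 1,110 bp gene and a fragment spanning nucleotides 91-752) and assumes that each engineered EcoRI site adds six bases to the cloned fragment. The variable and function names are hypothetical.

```python
GENE_LENGTH = 1110               # ack gene length in bp, from the text
FRAG_START, FRAG_END = 91, 752   # 1-based, inclusive fragment coordinates
ECORI = "GAATTC"


def fragment_length(start, end):
    """Length of a fragment given 1-based, inclusive coordinates."""
    return end - start + 1


insert_len = fragment_length(FRAG_START, FRAG_END)
with_sites_len = insert_len + 2 * len(ECORI)       # fragment plus both EcoRI sites
is_internal = 1 < FRAG_START and FRAG_END < GENE_LENGTH
missing_5prime = FRAG_START - 1                    # bases of the gene before the fragment
missing_3prime = GENE_LENGTH - FRAG_END            # bases of the gene after the fragment

if __name__ == "__main__":
    print(insert_len, with_sites_len, is_internal, missing_5prime, missing_3prime)
```

A fragment that excludes both ends of the gene is what makes the single crossover disruptive, because each of the two gene copies produced by integration then lacks one end of the coding sequence.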
[0184] Double Crossover
[0185] To construct a double crossover vector for the ack gene of C. cellulolyticum, ~1 kb of DNA flanking each side of the ack gene will be cloned. A selectable marker will be inserted between the flanking DNA. Selectable markers may include those that provide erythromycin and chloramphenicol resistance. The 3' flanking region of the ack gene is not present in the available draft genome. To acquire this DNA, a kit such as GenomeWalker from Clontech will be used.
Lactate Dehydrogenase (Genes 2262 and 2744 of C. cellulolyticum Published Genome):
[0186] Single Crossover
[0187] The ldh genes of C. cellulolyticum are 951 bp (for gene 2262) (SEQ ID NO:22) and 932 bp (for gene 2744) (SEQ ID NO:23) in length. A ~500 bp internal fragment near the 5' end of each gene will be amplified by PCR and cloned into suicide vectors and replicating vectors that have different selectable markers. Selectable markers may include those that provide drug resistance, such as erythromycin and chloramphenicol. These plasmids will be used to disrupt the ldh 2262 and ldh 2744 genes. As an example, a map of the ldh 2262 gene and the region amplified by PCR for gene disruption are shown in FIG. 20.
[0188] Double Crossover
[0189] To construct a double crossover knockout vector for the ldh gene(s) of C. cellulolyticum, ~1 kb of DNA flanking each side of the ldh gene(s) will be cloned. A selectable marker will be inserted between the flanking DNA. Selectable markers may include those that provide drug resistance, such as erythromycin and chloramphenicol. FIG. 21 provides an example of a C. cellulolyticum ldh (2262) double crossover knockout fragment.
[0190] In the sequence set forth below (SEQ ID NO:24) the mLs gene (selectable marker) is underlined and the flanking DNA is the remaining sequence. During primer design, restriction sites will be engineered at the 5' and 3' ends of this fragment so that it can be cloned into a number of replicative and non-replicative vectors. The same strategy will be used to create a vector to delete ldh 2744.
TABLE-US-00005 gacgcatacaggttgtaacacccatttcccttagcttttcgggagatgaa taaaacaaactttccgggtcctttaccacaccgcccacataaagagctat gccgcatgaaagaaacgatatgttatcatttttttcgtaaactgttattt ccgaacccggataaagctttaccatattattaactgctgccgtccctgca tgtgtacaccctataaccactattttcatatacatcctcctttgtttgct tgtaaatatatcccatatataccacctaaatatattttataaacaaattc ggtatatcattcttttggtaaataaaaagtacatccgatattagaatgta cctaaaaaaaattattattttattgtatatgctttatctgttttcattat atggtttgctatccattctacggtaaaatcaagtaattccattaagtact gatcctgatccttgtctatcctgctataatccgtattactgattttctca ataaaatcatggtgttcaactttgtgggagagaagcttgcgatatcctat gctatgcatgtattcttcttcataggtaaaatgaaagacagtgtaatctt ttagttccgtaattagccgtacaatttcatcatatttgtctgtaataagc tgatttttcgtggcctcataaatttccgaagcaatctggaatagtttctt atgctgttcgtcgattttctcaattccaagaataaattcgtctctccatt ctatcatatggaccctcctaaattgtaatgtataccaagattatacatac ttcctagaatataaacaatacaaggataaaattttaatatcgtataccta cataaatgactaacttaaagctctctaaaacttcttttttattatttcta tactactaaaatcaaaaatattctctaaagtatttctacaaatgttgttt ttgcaacaaagtagtatacttttgcacccagaatgttttgttataactta caaattaggggtatatttatagtaaatactaaatggaagagtaggatatt gattatgaacgagaaaaatataaaacacagtcaaaactttattacttcaa aacataatatagataaaataatgacaaatataagattaaatgaacatgat aatatctttgaaatcggctcaggaaaagggcattttacccttgaattagt acagaggtgtaatttcgtaactgccattgaaatagaccataaattatgca aaactacagaaaataaacttgttgatcacgataatttccaagttttaaac aaggatatattgcagtttaaatttcctaaaaaccaatcctataaaatatt tggtaatataccttataacataagtacggatataatacgcaaaattgttt ttgatagtatagctgatgagatttatttaatcgtggaatacgggtttgct aaaagattattaaatacaaaacgctcattggcattatttttaatggcaga agttgatatttctatattaagtatggttccaagagaatattttcatccta aacctaaagtgaatagctcacttatcagattaaatagaaaaaaatcaaga atatcacacaaagataaacagaagtataattatttcgttatgaaatgggt taacaaagaatacaagaaaatatttacaaaaaatcaatttaacaattcct taaaacatgcaggaattgacgatttaaacaatattagctttgaacaattc ttatctcttttcaatagctataaattatttaataagatcccctttacttc ggatgcatgccgcaggcaggcatccgaagtagtttctccattatacaagt attctcttgagtacgtcgtcgcttctcagcagctgctttgctttttccct 
gttttccggcacatggagataagtgtatctgttaggcttaatagtgtgtg ccatgtcaattgccttttcgaagtcatctgccttcatttttaaggtttcc acaaaattgataaaacccgtatcagtcagaaattttactacccgctgata tctgtgttcttgaaccctgctcataagataggttgcaatcccaacctgaa ttccatgaagctgaggtgtctccagcagcttatctaaagcatgagatatt agatgctcactaccgctggctggagcactgctgtctgctatctgcatggc aattccgctcattgtcagagagtctaccatttcctttaaaaagaagtttt ctgtaacctgtgtgtagggcatccttacaatactgtttactgacttttta gcaatcattgcagcaaaatcgtcaacctttgccgcattgttcctttcttc aaaataccagtcatacacagccgtaattttggatattatgtctccgagac ctgaataaataaatttcataggtgcattttttaatacatctaaatccact aatattccaaatggcatcgaggcatgtacggaagtacgcctgccatttat aatcaaagagcagcctgagctggaaaaaccatcgtttgaggttgatgtag gtatactgataaaaggaagcttgtttaaaaaagctatatatttggctgca tcaagcacctttcctcctcctactccgaccactgcatcggttttggaggg aatagtaaaagccttgagcataagattttcaagctttatgtcatcatagt cgtaagtttcaagtactgcaagagattttcttgactttatggaatccaga atcttttcaccaaataagtcacgtattccctctccaaaaagtactacaac attactaattcctgccctttcaatatgtgc
C. phytofermentans Knockout Constructs
For Acetate Kinase (Gene 327 from C. phytofermentans Published Genome):
[0191] Single Crossover
[0192] The acetate kinase gene of C. phytofermentans is 1,244 bp in length. A 572 bp internal fragment spanning nucleotides 55-626 will be amplified by PCR and cloned into suicide vectors and replicating vectors that have different selectable markers. Selectable markers to use will include those that provide drug resistance to C. phytofermentans. These plasmids will be used to disrupt the ack gene. A map of the ack gene and the region amplified by PCR for gene disruption are shown in FIG. 22. Restriction sites will be engineered during the design of the "ack KO primers" and will allow subsequent cloning of the fragment into numerous vectors. The sequence of the knockout fragment described above is set forth as SEQ ID NO:25.
[0193] Double Crossover
[0194] To construct a double crossover knockout vector for the ack gene of C. phytofermentans, ~1 kb of DNA flanking each side of the ack gene will be cloned. A selectable marker will be inserted between the flanking DNA. Selectable markers to use will include those that provide drug resistance to this strain. An example of a putative double crossover knockout construct with the mLs gene as a putative selectable marker is shown in FIG. 23.
[0195] The sequence that corresponds to the fragment depicted in FIG. 23 (SEQ ID NO:26) is set forth below. The mLs gene (putative selectable marker) is underlined and the remainder of the sequence corresponds to the flanking DNA. During primer design, restriction sites will be engineered at the 5' and 3' ends of this fragment so that it can be cloned into a number of replicative and non-replicative vectors.
TABLE-US-00006 ctgagtgcaatgtaaaaaaggatgcctcaagtattcttgaaacatcctta tattatactacaaaatcataaagtaaattactcagctgtagcaatgatct cttttttgttgtaagatccacaagctttacaaactctatgaggcatcata agtgcaccacacttgctgcatttcactaagtttggagcagtcatcttcca gtttgcacgacgactatctcttctagctttggaatgtttattctttggac aaatagctcccattgattacacctccttaaacttgttaaaaatatctcgg atagcagacattcttgggtctagttctgtacggtcacacccgcactctcc ttcatttaggttagcaccgcagaccttgcagattcctttacagtcttctt tgcacagaaccttcattgggaaaccaatcaagacttcttcatagataagt ttatctacgtctaaatcatatccggaaacaaaatttgtttcatctaaatc ctcggtacgctgttcctctgttttcgatacatcaatctctgtagccacgt cgatgtcttgttggatggtttcttccttcaaacaacgatcgcaaggaacg gctaacgctaatttcgtttttgcttccaccagaatttttcggccacctag attagttaatctaagtttaaccggttctttataggtaatagaataaccga caccatttaattcgaatatatcaaattcaatcggtgcagtgtattctttg agaccattaggaacattcatgacttcagacatttgtatcagcataagtaa ctcctgtctaaaaaaacgcataatgtaagcgcccaaaaattcacactgtt agtattataaacgcttaaaataggtttgtcaactcctaactgttaaaaat gtcagaattgtgtaaccatattttctcttcattatcgttcttcccttatt aaataatttatagctattgaaaagagataagaattgttcaaagctaatat tgtttaaatcgtcaattcctgcatgttttaaggaattgttaaattgattt tttgtaaatattttcttgtattctttgttaacccatttcataacgaaata attatacttctgtttatctttgtgtgatattcttgatttttttctattta atctgataagtgagctattcactttaggtttaggatgaaaatattctctt ggaaccatacttaatatagaaatatcaacttctgccattaaaaataatgc caatgagcgttttgtatttaataatcttttagcaaacccgtattccacga ttaaataaatctcatcagctatactatcaaaaacaattttgcgtattata tccgtacttatgttataaggtatattaccaaatattttataggattggtt tttaggaaatttaaactgcaatatatccttgtttaaaacttggaaattat cgtgatcaacaagtttattttctgtagttttgcataatttatggtctatt tcaatggcagttacgaaattacacctctgtactaattcaagggtaaaatg cccttttcctgagccgatttcaaagatattatcatgttcatttaatctta tatttgtcattattttatctatattatgttttgaagtaataaagttttga ctgtgttttatatttttctcgttcattgtatttctccttataatgttctt aaattcatttatcacggggcaacttaatatatccgaaatatagttcttct atatcgttcccccagtataatgattattatactatttaatcttcaactta acaattggagtttccagttaagaaataataatttaatgccaaagcggata ttcgcaatccgcttacgctacttgctcataacctcaacaggcaatgaagc 
taagttaattatttactctgtgcctgaacagcagtgattgcaacaacacc aacgatatcatcagaagaacaacctcttgataaatcatttactggagctg caataccctgagttaatggtccataagcttctgcctttgcaagacgctgt gttaacttatatccaatgttaccagcatcaaggtctgggaagattaatac gttagcttttccagcaatatcactaccaggagcttttgaagcacctacac taggaacgattgctgcatctaactggaactcgccgtcgatcttatattct gggtataattcatttgcaatcttagttgcttctacaaccttatcaacatc tgcatgctttgcgcttccctttgttgaatgagaaagcatagctacgatag gttcagagccaactaattgttcaaaactcttcgctgtggaaccagcgatt gctgctaactcttcagcatttggattctgatttaaaccagcatcagagaa aaggaaagttccatttgcgcccatatcacaattaggtactaccattacga agaaagcagaaactaacttagtatttggagcagtttttaaaatctgaaga catggtcttaaggtatctgctgtagagtgacaagcaccagatactaaacc atctgcatcgcccatcttaaccatcattacaccgtatgtaatgtagtctg ttgttaaaagctcttttgctttttcaggggtcatgccttttgcctgtcta agttctacaagcttgttaatgtaagc
For Lactate Dehydrogenase (Genes 1389 and 2971 of C. phytofermentans Published Genome)
[0196] Single Crossover
[0197] The ldh genes of C. phytofermentans are 978 bp (for gene 1389) (SEQ ID NO:27) and 960 bp (for gene 2971) (SEQ ID NO:28) in length. A ~500 bp internal fragment near the 5' end of each gene will be amplified by PCR and cloned into suicide vectors and replicating vectors that have different selectable markers. Selectable markers to use will include those that provide drug resistance. These plasmids will be used to disrupt the ldh 1389 and ldh 2971 genes. As an example, a map of the ldh 1389 gene and the region amplified by PCR for gene disruption are shown in FIG. 24.
[0198] Double Crossover
[0199] To construct a double crossover knockout vector for the ldh gene(s) of C. phytofermentans, ~1 kb of DNA flanking each side of the ldh gene(s) will be cloned. A selectable marker will be inserted between the flanking DNA. Selectable markers to use will include those that provide drug resistance to this strain. An example of a putative double crossover knockout construct with the mLs gene as a putative selectable marker is shown in FIG. 25.
[0200] The sequence that corresponds to the fragment depicted in FIG. 25 is set forth below as SEQ ID NO:29. The mLs gene (selectable marker) is underlined and the remaining portion of the sequence corresponds to the flanking DNA. During primer design, restriction sites will be engineered at the 5' and 3' ends of this fragment so that it can be cloned into a number of replicative and non-replicative vectors. The same strategy will be used to create a vector to delete ldh 2971.
TABLE-US-00007 tggaatctcactatgcaccaatgtggtactaaattatatctttatctatg gaaaattaggttttccgcgaatggagatagagggagctgccattgctact ttaatttgtagaattcttgagagtattttagttgttatttatatgtataa gggtgagaaggtacttaagatgagactttcttatatttttaagagatcta aacagtattttcgctctttggctcgttatagtgcgccagtgcttatgagt gaggttaactgggggcttgggattgctgttcagtctgcaatcattgggcg tatgggtgttagttttcttacagccgccagcttcattaatgtagtacaac agttagccggaatcattctgattggtattggtgtgggttcgagcattata atagggaatttgattggtgagggaaaagagcatgaggcgagaatgctagc caataagttaatacgtatcagtatgatactcggaggaattgttgcttttg cagtaatcttactacgtccaatcgctcctaactttattgaggcgtctaag gaaacagcggatttaattcgtcagatgctatttgtttcggcttacctctt attcttccaagccttatctgtattaactatggccggaatattacgtggtg caggggataccctttactgtgcaacctttgatgttttgaccttatgggta ctaaaacttggaggaggtttgcttgcaaccatagtacttcatcttccacc tgtatgggtttactttatcttaagtagcgatgagtgtgttaaagcgctat ttacggtaccgcgggtcttaaagggacgttggattcatgatacaacactg cattaagatttcatatgtccagatatttttgcacagtagcataattacta gagcttattcctataatattcataggttttgatggtccattttacgttac gatagcatatattacatcaaaaccaattctatataagatgaggttatagt atgaacgagaaaaatataaaacacagtcaaaactttattacttcaaaaca taatatagataaaataatgacaaatataagattaaatgaacatgataata tctttgaaatcggctcaggaaaagggcattttacccttgaattagtacag aggtgtaatttcgtaactgccattgaaatagaccataaattatgcaaaac tacagaaaataaacttgttgatcacgataatttccaagttttaaacaagg atatattgcagtttaaatttcctaaaaaccaatcctataaaatatttggt aatataccttataacataagtacggatataatacgcaaaattgtttttga tagtatagctgatgagatttatttaatcgtggaatacgggtttgctaaaa gattattaaatacaaaacgctcattggcattatttttaatggcagaagtt gatatttctatattaagtatggttccaagagaatattttcatcctaaacc taaagtgaatagctcacttatcagattaaatagaaaaaaatcaagaatat cacacaaagataaacagaagtataattatttcgttatgaaatgggttaac aaagaatacaagaaaatatttacaaaaaatcaatttaacaattccttaaa acatgcaggaattgacgatttaaacaatattagctttgaacaattcttat ctcttttcaatagctataaattatttaataagaagtaataggaaataata ctcgaattattctgcaatctgttctaaaaaataaaattaagaaattacta tagcaagccaggttaaaattactagcttgctatttttgtgcatttagtac agttttgattattaaagaataaatttaataactattttgcaataagttat 
tgactatttcacaagttagtgttactatacaagtatgaaataaagataca taaaaaaataaataatatgaaacataaattcatgacatgcggaatagaat gaaagaatattatgtcggttcctaatactaaatggatataacaatctatt gaaacacttatggggtgtaagtgtggagagaatttctaaagcgccaaaag actctacatatgaaattctaaagcttcacacgggaataatctaatttatg tatcttattatcataattcaggaaggtagtgtgaaaatataaaaattagt tttcctgtttcattcaggcagtagcatttcttaaacaaatttgctatgca ttgggtgttatctgaaaaacaaaaagcaattttctcacaacttatttctg aacaacaatggtattaaaaatttggaggaggattttactatgaaaaaaac ggtaacattactgttggttctgaccatggtggtaagcttatttgcagcat gtggtaagaaaaatggatcaagcgaaaccggcacaaaagatcctgtggca acaagcggtgcaaaagaacctgacaaacaagatccaggcaataaagagcc tgaaaaacaagaccctgttaaaatcaagatttattactctgataatgcaa ccttaccatttaaagaagattggttagttataaaggaagctgagaagaga tttaatgttgatttcgatttcgaagtaattccaattgcagattatcaaac aaaagtttctttaacattaaatacaggaaataacgctccagatgtcatcc tttatcagtcaacgcagggagagaatgcatct
Cald. kristjanssonii and C. stercorarium subsp. leptospartum
[0201] To the best of our knowledge, genome sequencing of the above organisms has not occurred and if it has, it has not been made available to the public. Based on our experimental results these organisms are cellulolytic and xylanolytic. The DNA sequences of genes encoding key metabolic enzymes are needed from these organisms in order to genetically engineer them and divert carbon flow to ethanol. These include such enzymes as acetate kinase and lactate dehydrogenase. In order to obtain the sequences of these genes, the genomes of these organisms will be sequenced.
[0202] With access to genome sequences, the conserved nature of the above enzymes may be used to find the encoding genes and flanking DNA. These sequences will be used to design constructs for targeted mutagenesis employing both single and double crossover strategies. These strategies will be identical to those described above. We will also determine which antibiotics can be used as selectable markers in these organisms and which protocols for transformation work best.
Example 3
Transformation of C. cellulolyticum
[0203] Cells were grown anaerobically at 34 degrees C. in 50 mL of GS media with 4 g/L cellobiose to an OD of 0.8. After harvesting, they were washed 3 times in equal volumes of a wash buffer containing 500 mM sucrose and 5 mM MOPS, with the pH adjusted to 7. After the final wash, the cell pellet was resuspended in an equal volume of wash buffer. 10 ul aliquots of the cell suspension were placed in a standard electroporation cuvette with a 1 mm electrode spacing, and 1 ul of plasmid DNA was added. The concentration of the plasmid DNA was adjusted to ensure between a 1:1 and 10:1 molar ratio of plasmid to cells. A 5 ms pulse was applied with a field strength of 7 kV/cm (measured) across the sample. A custom pulse generator was used. The sample was immediately diluted 1000:1 with the same media used in the initial culturing and allowed to recover until growth resumed, as determined by an increase in the OD (24-48 h). The recovered sample was diluted 50:1 and placed in selective media with either 15 ug/mL erythromycin or 15 ug/mL chloramphenicol and allowed to grow for 5-6 days. Samples exhibiting growth in selective media were tested to confirm that they were in fact C. cellulolyticum and that they carried the plasmid.
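The pulse and dilution figures in the protocol above can be checked with a short calculation. The following is a minimal sketch only: the function names are hypothetical, and the computed voltage is a nominal value that assumes a uniform field across the cuvette gap (the specification reports the field strength as measured, not the set voltage).

```python
def required_voltage_kv(field_kv_per_cm: float, gap_mm: float) -> float:
    """Nominal voltage (kV) giving the target field across the electrode gap."""
    gap_cm = gap_mm / 10.0  # convert mm to cm
    return field_kv_per_cm * gap_cm


def cumulative_dilution(*folds: float) -> float:
    """Overall fold-dilution after a series of sequential dilutions."""
    total = 1.0
    for fold in folds:
        total *= fold
    return total


# 7 kV/cm across a 1 mm cuvette, as stated in the protocol.
voltage_kv = required_voltage_kv(7.0, 1.0)

# 1000:1 recovery dilution followed by 50:1 dilution into selective media.
overall_fold = cumulative_dilution(1000, 50)

print(f"Nominal pulse voltage: {voltage_kv:.2f} kV")
print(f"Overall dilution before selection: {overall_fold:,.0f}-fold")
```

Under these assumptions the nominal pulse voltage is about 0.7 kV, and the cells are diluted 50,000-fold overall between electroporation and selection.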
Example 4
Constructs for Engineering Cellulolytic Strains
[0204] Cellulose is one of the main components of biomass and can potentially be used as a substrate for generation of fuel ethanol by fermentation with Clostridium thermocellum. However, in this process a substantial share of the energy and carbon is diverted to the by-products acetate and lactate. Engineering of the metabolic pathways of cellulose utilization in Clostridium thermocellum is necessary to minimize lactate and acetate production and to direct energy and carbon flow toward ethanol formation.
[0205] Acetate kinase, which is encoded by the ack gene, is an important enzyme in the metabolic pathway by which Clostridium thermocellum forms acetate during cellulose utilization. Inactivation of the ack gene may eliminate acetate kinase activity, leading to reduction or elimination of acetate production.
[0206] Lactate dehydrogenase, which is encoded by the ldh gene, is an important enzyme in the metabolic pathway by which Clostridium thermocellum forms lactate during cellulose utilization. Inactivation of the ldh gene may eliminate lactate dehydrogenase activity, leading to reduction or elimination of lactate production.
Inactivation of the ack Gene in C. thermocellum Based on the Plasmid pIKM1
[0207] To knock out the ack gene, a vector is constructed on the multiple cloning sites (MCS) of the plasmid pIKM1, in which the cat gene, encoding chloramphenicol acetyltransferase, is inserted into a DNA fragment of 3055 bp, involving the ack and the pta genes (encoding phosphotransacetylase), leading to knockout of 476 bp of the ack gene and 399 bp of the pta gene, and forming 1025 bp and 1048 bp flanking regions on both sides of the mLs gene respectively (FIG. 7). pNW33N contains pBC1 replicon, which is isolated from Bacillus coagulans and Staphylococcus aureus, and is anticipated to be stably replicated in Gram positive strains of bacteria, including Clostridium thermocellum. The sequence of the ack knockout vector constructed on plasmid pIKM1 is set forth as SEQ ID NO:1.
Inactivation of the ack Gene in C. thermocellum Based on the Replicative Plasmid pNW33N
[0208] To knock out the ack gene, a vector is constructed on the multiple cloning sites (MCS) of the replicative plasmid pNW33N, in which the macrolide, lincosamide, and streptogramin B (MLSB) resistant gene mLs is inserted into a DNA fragment of 3345 bp, which includes the ack gene, the pta gene (encoding phosphotransacetylase) and an unknown upstream gene, leading to knockout of 855 bp of the ack gene and formation of flanking regions of 1195 bp and 1301 bp on either side of the mLs gene (FIG. 8). pNW33N contains pBC1 replicon, which is isolated from Bacillus coagulans and Staphylococcus aureus, and is anticipated to be stably replicated in Gram positive strains of bacteria, including Clostridium thermocellum. The sequence of the ack knockout vector constructed on plasmid pNW33N is set forth as SEQ ID NO:2.
Inactivation of the ldh Gene in C. thermocellum Based on the Plasmid pIKM1
[0209] To knock out the ldh gene, a vector is constructed on the multiple cloning sites (MCS) of the plasmid pIKM1, in which the cat gene, encoding chloramphenicol acetyltransferase, is inserted into a DNA fragment of 3188 bp, involving the ldh and the mdh gene (encoding malate dehydrogenase), leading to knockout of a DNA fragment of 1171 bp, including part of the ldh and mdh genes, and forming 894 bp and 1123 bp flanking regions on both sides of the mLs gene, respectively (FIG. 9). The sequence of the ldh knockout vector constructed on plasmid pIKM1 is set forth as SEQ ID NO:3.
Inactivation of the ldh Gene in C. thermocellum Based on Plasmid pNW33N
[0210] To knock out the ldh gene, a vector is constructed on the multiple cloning sites (MCS) of the replicative plasmid pNW33N, in which the macrolide, lincosamide, and streptogramin B (MLSB) resistant gene mLs is inserted into a DNA fragment of 2523 bp, which includes the ldh gene and the mdh gene (encoding malate dehydrogenase), leading to knocking out of a fragment of 489 bp of the ldh gene and formation of flanking regions of 1034 bp and 1000 bp on either side of the mLs gene (FIG. 10). pNW33N contains pBC1 replicon, which is isolated from Bacillus coagulans and Staphylococcus aureus, and is anticipated to be stably replicated in other Gram positive strains of bacteria, including Clostridium thermocellum. The sequence of the ldh knockout vector constructed on plasmid pNW33N is set forth as SEQ ID NO:4.
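The fragment, deletion, and flank lengths stated for the ldh knockout constructs can be cross-checked arithmetically. The following is a minimal sketch: the class and field names are hypothetical, the assignment of the two flank lengths to "left" and "right" is an assumption made only for illustration, and the numbers are taken from paragraphs [0209] and [0210].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnockoutDesign:
    """Stated lengths (bp) for one knockout construct."""
    name: str
    fragment_bp: int     # total cloned genomic fragment
    deleted_bp: int      # region replaced by the marker
    left_flank_bp: int   # flank on one side of the marker (assumed orientation)
    right_flank_bp: int  # flank on the other side of the marker

    def accounted_bp(self) -> int:
        """Sum of the deleted region and both flanks."""
        return self.deleted_bp + self.left_flank_bp + self.right_flank_bp

    def unaccounted_bp(self) -> int:
        """Fragment length not explained by deletion plus flanks."""
        return self.fragment_bp - self.accounted_bp()


designs = [
    KnockoutDesign("ldh knockout on pIKM1", 3188, 1171, 894, 1123),
    KnockoutDesign("ldh knockout on pNW33N", 2523, 489, 1034, 1000),
]

for design in designs:
    print(f"{design.name}: {design.accounted_bp()} of {design.fragment_bp} bp "
          f"accounted for ({design.unaccounted_bp()} bp remaining)")
```

For both ldh constructs the deleted region plus the two flanks equals the stated fragment length, which is the relationship expected when the marker replaces an internal segment of the cloned fragment.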
Inactivation of the ldh Gene in Clostridium thermocellum Based on Plasmid pUC19
[0211] To knock out the ldh gene, a vector is constructed on the multiple cloning sites (MCS) of the pUC19 plasmid, in which a gene encoding chloramphenicol acetyltransferase (the cat gene) is inserted into an ldh gene fragment of 717 bp, leading to flanking regions of 245 bp and 255 bp on either side of the cat gene (FIG. 11). pUC19 is an E. coli plasmid vector containing the pMB1 origin, which cannot be amplified in Gram positive strains of bacteria, including Clostridium thermocellum. A similar vector may be constructed, in which the mLs gene is flanked by the ldh gene fragments. The sequence of the ldh knockout vector constructed on plasmid pUC19 is set forth as SEQ ID NO:5.
Expression of Xylose Isomerase and Xylulose Kinase in C. thermocellum and C. straminisolvens (Prophetic Example)
[0212] For expression of xylose isomerase and xylulose kinase in C. thermocellum, the xylose isomerase and xylulose kinase genes were cloned from T. saccharolyticum and placed under control of the C. thermocellum gapDH promoter. This cassette is harbored in a C. thermocellum replicative plasmid based on the pNW33N backbone, resulting in pMU340 (FIG. 35), SEQ ID NO:74. Upon transfer into C. thermocellum, the resulting transformants can be assayed for the ability to grow on xylose. Analogous constructs can be created using the C. kristjanssonii xylose isomerase and xylulose kinase genes. These constructs can be tested for functionality in C. straminisolvens as well.
Expression of Pyruvate Decarboxylase and Alcohol Dehydrogenase in C. thermocellum and C. straminisolvens (Prophetic Example)
[0213] For expression of pyruvate decarboxylase and alcohol dehydrogenase in C. thermocellum, the pyruvate decarboxylase genes are cloned from Z. mobilis and Z. palmae and the alcohol dehydrogenase gene is cloned from Z. mobilis. These genes (pdc and adh) will be expressed as an operon from the C. thermocellum pta-ack promoter. This cassette is harbored in a C. thermocellum replicative plasmid based on the pNW33N backbone (FIGS. 36 and 37), SEQ ID NOS:75 and 76. Upon transfer into C. thermocellum, the resulting transformants can be screened for enhanced ethanol production and/or aldehyde production to measure the functionality of the expressed enzymes. These constructs will be tested for functionality in C. straminisolvens as well.
Example 5
Fermentation of Avicel® Using C. straminisolvens
[0214] C. straminisolvens was used to ferment 1% Avicel® in serum bottles containing CTFUD medium. FIG. 27 shows the product concentration profiles and product ratios for this fermentation. About 2 g/L of total products was generated in 3 d, with ethanol constituting about 50% of the total products. The ethanol to acetate ratio is depicted as E/A and the ratio of ethanol to total products is depicted as E/T.
Example 6
Engineered Group II Introns for Mesophilic and Thermophilic Cellulolytic, Xylanolytic Organisms
[0215] Mobile group II introns, found in many bacterial genomes, are both catalytic RNAs and retrotransposable elements. They use a mobility mechanism known as retrotransposition in which the excised intron RNA reverse splices directly into a DNA target site and is then reverse transcribed by an intron-encoded protein. The mobile Lactococcus lactis Ll.LtrB group II intron has been developed into genetic tools known as Targetron® vectors, which are commercially available from Sigma-Aldrich (Catalog # TA0100). This product and its use are the subject of one or more of U.S. Pat. Nos. 5,698,421, 5,804,418, 5,869,634, 6,027,895, 6,001,608, and 6,306,596 and/or other pending U.S. and foreign patent applications controlled by InGex, LLC.
[0216] Targetron cassettes (FIGS. 28 and 29), which contain all the necessary sequences for retrotransposition, may be sub-cloned into vectors capable of replication in mesophilic or thermophilic cellulolytic organisms. The Targetron cassette may be modified by replacing the lac promoter with any host- or species-specific constitutive or inducible promoter. The cassettes may be further modified through site-directed mutagenesis of the native recognition sequences such that the group II intron is retargeted to insert into genes of interest, creating genetic knockouts. For example, the group II intron could be redesigned to knock out lactate dehydrogenase or acetate kinase in any mesophilic or thermophilic cellulolytic organism. Table 4 depicts an example of an insertion location and primers to retarget the intron to C. cellulolyticum acetate kinase (SEQ ID NO:21). Table 5 depicts an example of an insertion location and primers to retarget the intron to C. cellulolyticum lactate dehydrogenase (SEQ ID NO:21).
[0217] An example of a vector for retargeting the Ll.LtrB intron to insert in the C. cellulolyticum ack gene (SEQ ID NO:21) is depicted in FIG. 28. The vector sequence of pMU367 (C. cellulolyticum acetate kinase KO vector) is set forth as SEQ ID NO:30.
[0218] An example of a vector for retargeting the Ll.LtrB intron to insert in the C. cellulolyticum LDH2744 gene (SEQ ID NO:23) is depicted in FIG. 29. The vector sequence of pMU367 (C. cellulolyticum lactate dehydrogenase KO vector) is set forth as SEQ ID NO:31.
TABLE-US-00008 TABLE 4
Predicted insertion location (SEQ ID NO: 62): ATTTACCTGGCTGGGAATACTGAGACATAT-intron-GTCATTGAGGCCGTA
IBS1 mutagenic primer (SEQ ID NO: 63): AAAAAAGCTTATAATTATCCTTAATTTCCTACTACGTGCGCCCAGATAGGGTG
EBS1d mutagenic primer (SEQ ID NO: 64): CAGATTGTACAAATGTGGTGATAACAGATAAGTCTACTACTGTAACTTACCTTTCTTTGT
EBS2 mutagenic primer (SEQ ID NO: 65): TGAACGCAAGTTTCTAATTTCGGTTGAAATCCGATAGAGGAAAGTGTCT
TABLE-US-00009 TABLE 5
Predicted insertion location (SEQ ID NO: 66): TTAAATGTTGATAAGGAAGCTCTTTTCAAT-intron-GAAGTTAAGGTAGCA
IBS1 mutagenic primer (SEQ ID NO: 67): AAAAAAGCTTATAATTATCCTTAGCTCTCTTCAATGTGCGCCCAGATAGGGTG
EBS1d mutagenic primer (SEQ ID NO: 68): CAGATTGTACAAATGTGGTGATAACAGATAAGTCTTCAATGATAACTTACCTTTCTTTGT
EBS2 mutagenic primer (SEQ ID NO: 69): TGAACGCAAGTTTCTAATTTCGATTAGAGCTCGATAGAGGAAAGTGTCT
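The IBS1 mutagenic primers of Tables 4 and 5 can be compared to see which portion is constant and which portion changes with the target site. The following is a minimal sketch: the helper name is hypothetical, the primer strings are transcribed from the two tables, and interpreting the differing middle segment as the target-specific region is an assumption rather than a statement from the specification.

```python
def split_shared_ends(a: str, b: str) -> tuple[str, str, str, str]:
    """Return (shared_prefix, middle_of_a, middle_of_b, shared_suffix)."""
    limit = min(len(a), len(b))
    # Length of the common prefix.
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1
    # Length of the common suffix, not overlapping the prefix.
    s = 0
    while s < limit - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return a[:p], a[p:len(a) - s], b[p:len(b) - s], a[len(a) - s:]


# IBS1 primers transcribed from Table 4 (ack) and Table 5 (ldh).
ibs1_ack = "AAAAAAGCTTATAATTATCCTTAATTTCCTACTACGTGCGCCCAGATAGGGTG"
ibs1_ldh = "AAAAAAGCTTATAATTATCCTTAGCTCTCTTCAATGTGCGCCCAGATAGGGTG"

prefix, mid_ack, mid_ldh, suffix = split_shared_ends(ibs1_ack, ibs1_ldh)

print(f"Shared 5' portion ({len(prefix)} nt): {prefix}")
print(f"ack-specific segment ({len(mid_ack)} nt): {mid_ack}")
print(f"ldh-specific segment ({len(mid_ldh)} nt): {mid_ldh}")
print(f"Shared 3' portion ({len(suffix)} nt): {suffix}")
```

The same comparison can be applied to the EBS1d and EBS2 primer pairs by substituting the corresponding sequences.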
Example 7
Transformation of Thermoanaerobacter and Thermoanaerobacterium Strains (Prophetic Example)
[0219] Thermoanaerobacter pseudoethanolicus 39E, Thermoanaerobacterium saccharolyticum JW/SL-YS485, Thermoanaerobacterium saccharolyticum B6A-RI, and Thermoanaerobacter sp. strain 59 will be transformed with the following protocol. Cells are grown at 55° C. in 40 mL of DSMZ M122 media with the following modifications: 5 g/L cellobiose instead of cellulose, 1.8 g/L K2HPO4, no glutathione, and 0.5 g/L L-cysteine-HCl, to an optical density of 0.6 to 0.8. Cells are then harvested and washed twice with 40 mL of 0.2 M cellobiose at room temperature. Cells are re-suspended in 0.2 M cellobiose in aliquots of 100 uL, and 0.1 to 1 ug of plasmid DNA is added to the sample in a 1 mm gap-width electroporation cuvette. An exponential pulse (Bio-Rad Instruments) of 1.8 kV, 25 μF, 200Ω, ~3-6 ms is applied to the cuvette, and cells are diluted 100-200 fold in fresh M122 and incubated for 12-16 hours at 55° C. The recovered cells are then diluted 25-100 fold in Petri plates with fresh agar-containing media containing a selective agent, such as 200 μg/mL kanamycin. Once the media has solidified, plates are incubated at 55° C. for 24-72 hours for colony formation. Colonies can be tested by PCR for evidence of site-specific recombination.
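The stated pulse settings can be related to the stated pulse duration with a simple calculation. The following is a minimal sketch: the function names are hypothetical, and the time constant is the nominal product of the stated resistance and capacitance, which assumes that the sample resistance is much larger than the parallel resistor and that the field is uniform across the gap.

```python
def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant of an exponential-decay pulse, in ms."""
    seconds = resistance_ohm * capacitance_uf * 1e-6  # uF -> F
    return seconds * 1e3                               # s -> ms


def field_strength_kv_per_cm(voltage_kv: float, gap_mm: float) -> float:
    """Nominal field strength across the cuvette gap, in kV/cm."""
    return voltage_kv / (gap_mm / 10.0)  # mm -> cm


tau_ms = time_constant_ms(200, 25)            # 200 ohm, 25 uF
field = field_strength_kv_per_cm(1.8, 1.0)    # 1.8 kV across a 1 mm gap

print(f"Nominal time constant: {tau_ms:.1f} ms")
print(f"Nominal field strength: {field:.1f} kV/cm")
```

Under these assumptions the nominal time constant is about 5 ms, which falls within the ~3-6 ms range given in the protocol, and the nominal field strength is about 18 kV/cm.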
Example 8
Fermentation Performance of Engineered Thermoanaerobacter and Thermoanaerobacterium Strains
[0220] Table 6 depicts the fermentation performance of engineered Thermoanaerobacter and Thermoanaerobacterium strains. Cultures were grown for 24 hours in M122 at 55° C. without shaking. The following abbreviations are used in Table 6: Cellobiose (CB), glucose (G), lactic acid (LA), acetic acid (AA), and ethanol (Etoh). Values are in grams per liter. YS485--Thermoanaerobacterium saccharolyticum JW/SL-YS485, B6A-RI--Thermoanaerobacterium saccharolyticum B6A-RI, 39E--Thermoanaerobacter pseudoethanolicus 39E.
TABLE-US-00010 TABLE 6
Fermentation sample                  CB     G      LA     AA     Etoh
YS485 wildtype                       0      0      0.77   1.04   1.40
YS485 ΔL-ldh                         0      0      0      0.92   1.73
YS485 Δpta/ack                       2.51   0      0.75   0.06   0.62
YS485 ΔL-ldh, Δpta/ack               0      0      0      0      2.69
B6A-RI wildtype                      0      0      0      1.0    1.76
B6A-RI ΔL-ldh, Δpta/ack strain #1    0      0      0      0      2.72
B6A-RI ΔL-ldh, Δpta/ack strain #2    0.45   0      0      0      2.49
39E wildtype                         0.51   0      1.51   0.15   1.87
Media                                5.10   0.25   0      0      0
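The ethanol share of the measured end products, analogous to the E/T ratio used in Example 5, can be computed from the Table 6 values. The following is a minimal sketch: the function and variable names are hypothetical, the ratio is on a mass (g/L) basis, and only the three products reported in Table 6 (lactic acid, acetic acid, and ethanol) are counted as total products.

```python
# (lactic acid, acetic acid, ethanol) in g/L, taken from Table 6.
table6 = {
    "YS485 wildtype": (0.77, 1.04, 1.40),
    "YS485 ΔL-ldh": (0.0, 0.92, 1.73),
    "YS485 Δpta/ack": (0.75, 0.06, 0.62),
    "YS485 ΔL-ldh, Δpta/ack": (0.0, 0.0, 2.69),
    "B6A-RI wildtype": (0.0, 1.0, 1.76),
    "B6A-RI ΔL-ldh, Δpta/ack strain #1": (0.0, 0.0, 2.72),
    "B6A-RI ΔL-ldh, Δpta/ack strain #2": (0.0, 0.0, 2.49),
    "39E wildtype": (1.51, 0.15, 1.87),
}


def ethanol_fraction(lactic: float, acetic: float, ethanol: float) -> float:
    """Ethanol as a fraction of the measured products (mass basis)."""
    total = lactic + acetic + ethanol
    if total == 0:
        raise ValueError("no measured products")
    return ethanol / total


for sample, (la, aa, etoh) in table6.items():
    print(f"{sample}: E/T = {ethanol_fraction(la, aa, etoh):.2f}")
```

On this basis ethanol is roughly 44% of the measured products for wild-type YS485 and the only measured product for the ΔL-ldh, Δpta/ack strains.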
Example 9
Construct for Engineering Cellulolytic and Xylanolytic Strains--Antisense RNA Technology Example
[0221] A replicative plasmid (FIG. 38) carrying an antisense RNA cassette targeting a C. thermocellum gene coding for lactate dehydrogenase (Cthe_1053) was transferred to C. thermocellum 1313 by electroporation and thiamphenicol selection. The transformation efficiency observed for this plasmid was equal to that of the parent vector, pMU102. The sequence of the plasmid is shown in SEQ ID NO: 61. The asRNA cassette is depicted in FIG. 38 and is organized as follows: (i) the entire 1827 bp cassette is cloned into the multicloning site of pMU102 in the orientation shown in FIG. 38, (ii) the native promoter region is contained within the first 600 bp of the cassette, (iii) the first 877 bp of the ldh open reading frame are fused to the native promoter in the antisense orientation, (iv) approximately 300 additional bp are included downstream of the asRNA ldh region.
[0222] The resulting thiamphenicol resistant colonies were screened for altered end product formation by growing standing cultures on M122C media in the presence of 6 ug/mL thiamphenicol (to maintain the plasmid), as shown in FIG. 39. A preliminary screen of 9 randomly selected thiamphenicol-resistant transformants showed that 4 cultures exhibited low levels of lactate production relative to wild type. Additionally, a construct carrying antisense RNA directed to both ldh genes is to be constructed in order to partially, substantially, or completely delete, silence, inactivate, or down-regulate both genes simultaneously.
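Placing a segment of an open reading frame in the antisense orientation amounts to inserting its reverse complement downstream of the promoter. The following is a minimal sketch: the function names and the toy sequence are hypothetical and are used only for illustration, and the actual cassette sequence is the one set forth in SEQ ID NO: 61.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_insert(orf: str, n_bp: int) -> str:
    """Reverse complement of the first n_bp of an open reading frame."""
    if n_bp > len(orf):
        raise ValueError("requested segment is longer than the ORF")
    return reverse_complement(orf[:n_bp])


# Toy ORF, not the ldh sequence; in the cassette described above the
# segment would be the first 877 bp of the ldh open reading frame.
toy_orf = "ATGAAACCCGGG"
print(antisense_insert(toy_orf, 6))
```

For the toy input the first six bases, ATGAAA, are returned as TTTCAT, so that transcription from the upstream promoter yields RNA complementary to the sense transcript.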
Example 10
[0223] SEQ ID NOS:44, 45, and 46 are the pyruvate-formate-lyase (also known as formate acetyltransferase, EC 2.3.1.54, pfl) genes from Thermoanaerobacterium saccharolyticum YS485, Clostridium thermocellum ATCC 27405, and Clostridium phytofermentans. Pfl catalyzes the conversion of pyruvate to acetyl-CoA and formate (FIG. 34A-B). Deletion of pfl will result in the elimination of formate production, and could result in a decrease in acetic acid yield in some thermophilic strains, with a resulting increase in ethanol yield.
[0224] SEQ ID NOS:47-52, depicted in FIGS. 40-45, show pfl knockout plasmids, two each for the three organisms listed above. Each organism has a single crossover plasmid and a double crossover plasmid designed to partially, substantially, or completely delete, silence, inactivate, or down-regulate the pfl enzyme. Single crossover plasmids are designed with a single DNA sequence (400 bp to 1000 bp) homologous to an internal section of the pfl gene; double crossover plasmids are designed with two DNA sequences (400 bp to 1000 bp) homologous to regions upstream (5') and downstream (3') of the pfl gene. All plasmids are designed to use the best available antibiotic markers for selection in the given organism. Plasmids can be maintained in E. coli and constructed through a DNA synthesis contract company, such as Codon Devices or DNA 2.0.
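The selection of homology regions for the two plasmid types can be sketched as follows. This is a minimal illustration only: the function names, the coordinate convention (0-based, end-exclusive), the choice to center the internal fragment in the gene, and the toy genome are all assumptions, and the actual plasmids are those of SEQ ID NOS:47-52.

```python
MIN_ARM_BP, MAX_ARM_BP = 400, 1000  # homology length range stated above


def _check_arm(arm_len: int) -> None:
    if not MIN_ARM_BP <= arm_len <= MAX_ARM_BP:
        raise ValueError("arm length outside the 400-1000 bp range")


def single_crossover_arm(genome: str, start: int, end: int, arm_len: int) -> str:
    """Internal gene fragment, centered in the gene (assumed placement)."""
    _check_arm(arm_len)
    gene_len = end - start
    if arm_len > gene_len:
        raise ValueError("arm is longer than the gene")
    offset = (gene_len - arm_len) // 2
    return genome[start + offset:start + offset + arm_len]


def double_crossover_arms(genome: str, start: int, end: int,
                          arm_len: int) -> tuple[str, str]:
    """Regions immediately upstream (5') and downstream (3') of the gene."""
    _check_arm(arm_len)
    if start < arm_len or end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence")
    return genome[start - arm_len:start], genome[end:end + arm_len]


# Toy 4000 bp genome with a hypothetical gene at positions 1500-2700.
toy_genome = "ACGT" * 1000
internal = single_crossover_arm(toy_genome, 1500, 2700, 500)
upstream, downstream = double_crossover_arms(toy_genome, 1500, 2700, 500)
print(len(internal), len(upstream), len(downstream))
```

In the single crossover case the whole plasmid integrates within the gene and disrupts it, whereas in the double crossover case the sequence between the two flanking regions is replaced by whatever lies between the arms on the plasmid, typically the selectable marker.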
INCORPORATION BY REFERENCE
[0225] All of the U.S. patents and U.S. published patent applications cited herein are hereby incorporated by reference.
EQUIVALENTS
[0226] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.
Sequence CWU
1
1
92 1 9000 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
1 agctttggct aacacacacg ccattccaac caatagtttt
ctcggcataa agccatgctc 60tgacgcttaa atgcactaat gccttaaaaa aacattaaag
tctaacacac tagacttatt 120tacttcgtaa ttaagtcgtt aaaccgtgtg ctctacgacc
aaaagtataa aacctttaag 180aactttcttt tttcttgtaa aaaaagaaac tagataaatc
tctcatatct tttattcaat 240aatcgcatca gattgcagta taaatttaac gatcactcat
catgttcata tttatcagag 300ctcgtgctat aattatacta attttataag gaggaaaaaa
taaagagggt tataatgaac 360gagaaaaata taaaacacag tcaaaacttt attacttcaa
aacataatat agataaaata 420atgacaaata taagattaaa tgaacatgat aatatctttg
aaatcggctc aggaaaaggg 480cattttaccc ttgaattagt acagaggtgt aatttcgtaa
ctgccattga aatagaccat 540aaattatgca aaactacaga aaataaactt gttgatcacg
ataatttcca agttttaaac 600aaggatatat tgcagtttaa atttcctaaa aaccaatcct
ataaaatatt tggtaatata 660ccttataaca taagtacgga tataatacgc aaaattgttt
ttgatagtat agctgatgag 720atttatttaa tcgtggaata cgggtttgct aaaagattat
taaatacaaa acgctcattg 780gcattatttt taatggcaga agttgatatt tctatattaa
gtatggttcc aagagaatat 840tttcatccta aacctaaagt gaatagctca cttatcagat
taaatagaaa aaaatcaaga 900atatcacaca aagataaaca gaagtataat tatttcgtta
tgaaatgggt taacaaagaa 960tacaagaaaa tatttacaaa aaatcaattt aacaattcct
taaaacatgc aggaattgac 1020gatttaaaca atattagctt tgaacaattc ttatctcttt
tcaatagcta taaattattt 1080aataagtaag ttaagggatg cataaactgc atcccttaac
ttgtttttcg tgtacctatt 1140ttttgtgaat cgattatgtc ttttgcgcat tcacttcttt
tctatataaa tatgagcgaa 1200gcgaataagc gtcggaaaag cagcaaaaag tttccttttt
gctgttggag catgggggtt 1260cagggggtgc agtatctgac gtcaatgccg agcgaaagcg
agccgaaggg tagcatttac 1320gttagataac cccctgatat gctccgacgc tttatataga
aaagaagatt caactaggta 1380aaatcttaat ataggttgag atgataaggt ttataaggaa
tttgtttgtt ctaatttttc 1440actcattttg ttctaatttc ttttaacaaa tgttcttttt
tttttagaac agttatgata 1500tagttagaat agtttaaaat aaggagtgag aaaaagatga
aagaaagata tggaacagtc 1560tataaaggct ctcagaggct catagacgaa gaaagtggag
aagtcataga ggtagacaag 1620ttataccgta aacaaacgtc tggtaacttc gtaaaggcat
atatagtgca attaataagt 1680atgttagata tgattggcgg aaaaaaactt aaaatcgtta
actatatcct agataatgtc 1740cacttaagta acaatacaat gatagctaca acaagagaaa
tagcaaaagc tacaggaaca 1800agtctacaaa cagtaataac aacacttaaa atcttagaag
aaggaaatat tataaaaaga 1860aaaactggag tattaatgtt aaaccctgaa ctactaatga
gaggcgacga ccaaaaacaa 1920aaatacctct tactcgaatt tgggaacttt gagcaagagg
caaatgaaat agattgacct 1980cccaataaca ccacgtagtt attgggaggt caatctatga
aatgcgatta agcttggctg 2040caggtcgata aacccagcga accatttgag gtgataggta
agattatacc gaggtatgaa 2100aacgagaatt ggacctttac agaattactc tatgaagcgc
catatttaaa aagctaccaa 2160gacgaagagg atgaagagga tgaggaggca gattgccttg
aatatattga caatactgat 2220aagataatat atcttttata tagaagatat cgccgtatgt
aaggatttca gggggcaagg 2280cataggcagc gcgcttatca atatatctat agaatgggca
aagcataaaa acttgcatgg 2340actaatgctt gaaacccagg acaataacct tatagcttgt
aaattctatc ataattgtgg 2400tttcaaaatc ggctccgtcg atactatgtt atacgccaac
tttcaaaaca actttgaaaa 2460agctgttttc tggtatttaa ggttttagaa tgcaaggaac
agtgaattgg agttcgtctt 2520gttataatta gcttcttggg gtatctttaa atactgtaga
aaagaggaag gaaataataa 2580atggctaaaa tgagaatatc accggaattg aaaaaactga
tcgaaaaata ccgctgcgta 2640aaagatacgg aaggaatgtc tcctgctaag gtatataagc
tggtgggaga aaatgaaaac 2700ctatatttaa aaatgacgga cagccggtat aaagggacca
cctatgatgt ggaacgggaa 2760aaggacatga tgctatggct ggaaggaaag ctgcctgttc
caaaggtcct gcactttgaa 2820cggcatgatg gctggagcaa tctgctcatg agtgaggccg
atggcgtcct ttgctcggaa 2880gagtatgaag atgaacaaag ccctgaaaag attatcgagc
tgtatgcgga gtgcatcagg 2940ctctttcact ccatcgacat atcggattgt ccctatacga
atagcttaga cagccgctta 3000gccgaattgg attacttact gaataacgat ctggccgatg
tggattgcga aaactgggaa 3060gaagacactc catttaaaga tccgcgcgag ctgtatgatt
ttttaaagac ggaaaagccc 3120gaagaggaac ttgtcttttc ccacggcgac ctgggagaca
gcaacatctt tgtgaaagat 3180ggcaaagtaa gtggctttat tgatcttggg agaagcggca
gggcggacaa gtggtatgac 3240attgccttct gcgtccggtc gatcagggag gatatcgggg
aagaacagta tgtcgagcta 3300ttttttgact tactggggat caagcctgat tgggagaaaa
taaaatatta tattttactg 3360gatgaattgt tttagtacct agatttagat gtctaaaaag
ctttttagac atctaatctt 3420ttctgaagta catccgcaac tgtccatact ctgatgtttt
atatcttttc taaaagttcg 3480ctagataggg gtcccgagcg cctacgagga atttgtatcg
actctagagg atccctcagc 3540gaagctccac tatgtttcaa aatgtcagat atatcaattt
tcatcaaagt cacctcttaa 3600aaccgacaag gactattata ctaactaata accctcatgt
caagaattat atgacagatt 3660ggcttaaata acaaaaataa ttttgtttag ttaaattcgg
aatttcttct taatattatt 3720aacatattcc acatattaat acaagaaaaa acccggcaaa
aaaataaaaa aattttataa 3780gcccgtttcc taaaaaaaca ggcttgtaaa attataacgc
atcttttata agttttttac 3840aagtcttaaa gtctcccttg caatctcaag ctcctcattt
gtcgggataa ccaaagtctt 3900tactttcgca tcgggagcac tgatatccgc ttctttgcct
ttcacttcat ttttatccaa 3960atctatttta attccgaaaa agtccatatc cttcaaaact
tctcttctta tataagcatt 4020gttttcgccg atacctgcag tgaataccac cgcatcaacg
ccgttcagca ctgcaatata 4080ttttccaata tatttcctaa caccatagca gaaaatatcc
aatgccagct gcgccctgtc 4140atctcccttt tctgcggcat cctgaacatc tctgaaatca
ctgcttacac ctgaaattcc 4200aagcacacct gatttcttgt taaggaaatt gtttatatcg
ttaatattca ttttttcctt 4260ttccatcaaa taagttataa ccgcagggtc aacattgccg
cttctggtac ccatgcacaa 4320cccctgcaga ggagtaaatc ccattgaggt gtcaacggat
tttccgcctt ttaccgcaca 4380aatacttgct ccgtttccaa gatggcaggt tatcagcttc
aggctctcaa taggtttgcc 4440cagcatctga gccgccctgt gggccacata tttgtgggaa
gttccgtgga atccgtattt 4500tctcaattta tacttctcat atatctcata agggagggca
taaatatatg catgctagtt 4560caacaaacgg gattgacttt taaaaaagga ttgattctaa
tgaagaaagc agacaagtaa 4620gcctcctaaa ttcactttag ataaaaattt aggaggcata
tcaaatgaac tttaataaaa 4680ttgatttaga caattggaag agaaaagaga tatttaatca
ttatttgaac caacaaacga 4740cttttagtat aaccacagaa attgatatta gtgttttata
ccgaaacata aaacaagaag 4800gatataaatt ttaccctgca tttattttct tagtgacaag
ggtgataaac tcaaatacag 4860cttttagaac tggttacaat agcgacggag agttaggtta
ttgggataag ttagagccac 4920tttatacaat ttttgatggt gtatctaaaa cattctctgg
tatttggact cctgtaaaga 4980atgacttcaa agagttttat gatttatacc tttctgatgt
agagaaatat aatggttcgg 5040ggaaattgtt tcccaaaaca cctatacctg aaaatgcttt
ttctctttct attattccat 5100ggacttcatt tactgggttt aacttaaata tcaataataa
tagtaattac cttctaccca 5160ttattacagc aggaaaattc attaataaag gtaattcaat
atatttaccg ctatctttac 5220aggtacatca ttctgtttgt gatggttatc atgcaggatt
gtttatgaac tctattcagg 5280aattgtcaga taggcctaat gactggcttt tataacctga
ggttttgctc caaccagcat 5340ctcaaaagat ttggatgcag atattgcaat ttcagaaagc
tggtctgcat ccggattttc 5400caccaagccg caatcggcat atacaaaggt tccgttatga
ccatattcac agttgggtac 5460aaccataaca aaaaaggatg atacgagttt tgtccccggg
gccgtcttta atatctgcaa 5520agccggtctc aaagtatttg cagtggaatt gacagcaccc
gccaccatac catccgcttc 5580accttttttt accatcataa ctccataata aagagggtct
ttgatcgttt cccttgcggc 5640ttctatagtc atacccttcg attttctaag ctcatacagt
gtatttgcat aatcctccaa 5700tttttcggaa tttaaggaat cctctatcat cactccttca
agatcaatat cccccgccag 5760actcttaatc tccttttcat tgcctatcag tacaaccttt
gcaattccct ttttcattat 5820catggatgcg gctttaataa ccctcagatc cgtactttcc
ggcaaaacta tggtttttac 5880gtctgatttc gccctttcaa ttatttgttc caaaaaactc
ataaattctt ctcctttcat 5940aatcccaaaa ctgttatcat aaaaactgta tttgtaatac
ttataactat atattatcac 6000caggtaataa tacctactca ctataaacag ctattttact
gggttccaag caactctaat 6060tatatacaaa atgttttttg tatacaacac cctccttatc
tttttttcgg ctttagccat 6120aaataacggc aagtaactcc aaaatacagg atatttcatg
cttttagaaa ctttttatta 6180gtcttcttaa ttattcagat tttgtggcaa ttaaactttg
cagctcctcc aaatagttgt 6240ccagctcctc ttctttaaga ttgctgagat atgacaatct
gtaattttta gcctttttgg 6300ccatctctag cgcactctcc gtcattccca aatctttcaa
aacacagcta tagttatagt 6360atgcgaattc actggccgtc gttttacaac gtcgtgactg
ggaaaaccct ggcgttaccc 6420aacttaatcg ccttgcagca catccccctt tcgccagctg
gcgtaatagc gaagaggccc 6480gcaccgatcg cccttcccaa cagttgcgca gcctgaatgg
cgaatggcgc ctgatgcggt 6540attttctcct tacgcatctg tgcggtattt cacaccgcat
atggtgcact ctcagtacaa 6600tctgctctga tgccgcatag ttaagccagc cccgacaccc
gccaacaccc gctgacgcgc 6660cctgacgggc ttgtctgctc ccggcatccg cttacagaca
agctgtgacc gtctccggga 6720gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg
cgcgagacga aagggcctcg 6780tgatacgcct atttttatag gttaatgtca tgataataat
ggtttcttag acgtcaggtg 6840gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt
atttttctaa atacattcaa 6900atatgtatcc gctcatgaga caataaccct gataaatgct
tcaataatat tgaaaaagga 6960agagtatgag tattcaacat ttccgtgtcg cccttattcc
cttttttgcg gcattttgcc 7020ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa
agatgctgaa gatcagttgg 7080gtgcacgagt gggttacatc gaactggatc tcaacagcgg
taagatcctt gagagttttc 7140gccccgaaga acgttttcca atgatgagca cttttaaagt
tctgctatgt ggcgcggtat 7200tatcccgtat tgacgccggg caagagcaac tcggtcgccg
catacactat tctcagaatg 7260acttggttga gtactcacca gtcacagaaa agcatcttac
ggatggcatg acagtaagag 7320aattatgcag tgctgccata accatgagtg ataacactgc
ggccaactta cttctgacaa 7380cgatcggagg accgaaggag ctaaccgctt ttttgcacaa
catgggggat catgtaactc 7440gccttgatcg ttgggaaccg gagctgaatg aagccatacc
aaacgacgag cgtgacacca 7500cgatgcctgt agcaatggca acaacgttgc gcaaactatt
aactggcgaa ctacttactc 7560tagcttcccg gcaacaatta atagactgga tggaggcgga
taaagttgca ggaccacttc 7620tgcgctcggc ccttccggct ggctggttta ttgctgataa
atctggagcc ggtgagcgtg 7680ggtctcgcgg tatcattgca gcactggggc cagatggtaa
gccctcccgt atcgtagtta 7740tctacacgac ggggagtcag gcaactatgg atgaacgaaa
tagacagatc gctgagatag 7800gtgcctcact gattaagcat tggtaactgt cagaccaagt
ttactcatat atactttaga 7860ttgatttaaa acttcatttt taatttaaaa ggatctaggt
gaagatcctt tttgataatc 7920tcatgaccaa aatcccttaa cgtgagtttt cgttccactg
agcgtcagac cccgtagaaa 7980agatcaaagg atcttcttga gatccttttt ttctgcgcgt
aatctgctgc ttgcaaacaa 8040aaaaaccacc gctaccagcg gtggtttgtt tgccggatca
agagctacca actctttttc 8100cgaaggtaac tggcttcagc agagcgcaga taccaaatac
tgtccttcta gtgtagccgt 8160agttaggcca ccacttcaag aactctgtag caccgcctac
atacctcgct ctgctaatcc 8220tgttaccagt ggctgctgcc agtggcgata agtcgtgtct
taccgggttg gactcaagac 8280gatagttacc ggataaggcg cagcggtcgg gctgaacggg
gggttcgtgc acacagccca 8340gcttggagcg aacgacctac accgaactga gatacctaca
gcgtgagcta tgagaaagcg 8400ccacgcttcc cgaagggaga aaggcggaca ggtatccggt
aagcggcagg gtcggaacag 8460gagagcgcac gagggagctt ccagggggaa acgcctggta
tctttatagt cctgtcgggt 8520ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc
gtcagggggg cggagcctat 8580ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc
cttttgctgg ccttttgctc 8640acatgttctt tcctgcgtta tcccctgatt ctgtggataa
ccgtattacc gcctttgagt 8700gagctgatac cgctcgccgc agccgaacga ccgagcgcag
cgagtcagtg agcgaggaag 8760cggaagagcg cccaatacgc aaaccgcctc tccccgcgcg
ttggccgatt cattaatgca 8820gctggcacga caggtttccc gactggaaag cgggcagtga
gcgcaacgca attaatgtga 8880gttagctcac tcattaggca ccccaggctt tacactttat
gcttccggct cgtatgttgt 8940gtggaattgt gagcggataa caatttcaca caggaaacag
ctatgaccat gattacgcca 9000
2 7582 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
2 aattctgttc tggctttaga
ccatttacgc tttgggtttg ccatgcatca cacctcctcg 60tcaccataca acaaaccttg
tgtataaata aaatcttgta cacatggaag gtactcgatt 120ttgtttttac ttaaaacaaa
gctacactaa ttgtctgtaa aaaagttttt gaaaacttcc 180agccgcggat ctatatcctc
atttacgcat ttgcactcat cgttgttaag atccgctcca 240cacttcgggc aatatccctt
gcaggcctcg tcacaaacct gctttgccgg aagattcagt 300atgatattgt ctatcattac
cttttccagc tcgaggaact taccatggta cgtataatat 360tcctcgtcgg ttttgttact
gccctcttct acaaagtttt cctttacatc aatatgcatc 420tttgattcaa tatccttgag
gcaccttgag cattttgccc tgtaatccgc ccagagttca 480ccgtcaagtt ttataatccc
tccggcattt accaaagtgc ccttaaaagt taccggtttc 540gcaaagtcaa aatcctcagc
tataaaatca ttaattttaa ttgactcact aaagtccagt 600ctcagcgaag ctccactatg
tttcaaaatg tcagatatat caattttcat caaagtcacc 660tcttaaaacc gacaaggact
attatactaa ctaataaccc tcatgtcaag aattatatga 720cagattggct taaataacaa
aaataatttt gtttagttaa attcggaatt tcttcttaat 780attattaaca tattccacat
attaatacaa gaaaaaaccc ggcaaaaaaa taaaaaaatt 840ttataagccc gtttcctaaa
aaaacaggct tgtaaaatta taacgcatct tttataagtt 900ttttacaagt cttaaagtct
cccttgcaat ctcaagctcc tcatttgtcg ggataaccaa 960agtctttact ttcgcatcgg
gagcactgat atccgcttct ttgcctttca cttcattttt 1020atccaaatct attttaattc
cgaaaaagtc catatccttc aaaacttctc ttcttatata 1080agcattgttt tcgccgatac
ctgcagtgaa taccaccgca tcaacgccgt tcagcactgc 1140aatatatttt ccaatatatt
tcctaacacc atagcagaaa atatccaatg ccagctgctg 1200cagtaatcgc atcagattgc
agtataaatt taacgatcac tcatcatgtt catatttatc 1260agagctcgtg ctataattat
actaatttta taaggaggaa aaaataaaga gggttataat 1320gaacgagaaa aatataaaac
acagtcaaaa ctttattact tcaaaacata atatagataa 1380aataatgaca aatataagat
taaatgaaca tgataatatc tttgaaatcg gctcaggaaa 1440agggcatttt acccttgaat
tagtacagag gtgtaatttc gtaactgcca ttgaaataga 1500ccataaatta tgcaaaacta
cagaaaataa acttgttgat cacgataatt tccaagtttt 1560aaacaaggat atattgcagt
ttaaatttcc taaaaaccaa tcctataaaa tatttggtaa 1620tataccttat aacataagta
cggatataat acgcaaaatt gtttttgata gtatagctga 1680tgagatttat ttaatcgtgg
aatacgggtt tgctaaaaga ttattaaata caaaacgctc 1740attggcatta tttttaatgg
cagaagttga tatttctata ttaagtatgg ttccaagaga 1800atattttcat cctaaaccta
aagtgaatag ctcacttatc agattaaata gaaaaaaatc 1860aagaatatca cacaaagata
aacagaagta taattatttc gttatgaaat gggttaacaa 1920agaatacaag aaaatattta
caaaaaatca atttaacaat tccttaaaac atgcaggaat 1980tgacgattta aacaatatta
gctttgaaca attcttatct cttttcaata gctataaatt 2040atttaataag taagttaagg
gatgcataaa ctgcatccct tacagctgat actttagtga 2100tgagcttccg gtattaataa
ccaaaatatt catttcaaaa actcactccc gtcttgtttt 2160ttttaatttt cctattccta
aacttcgata aacagatgtt tttattaaac gctgcgcaac 2220accttcttca atgtccggtt
ttaacagaat ttatgccttg acatattgag cctgaaccgc 2280agtaattgcc gcaaccccga
ctatatcctc ggcactgcag cctcgtgaca gatcatttac 2340cggtcttgcc aaaccttgtg
ttatcgggcc gtaagcttca gcttttgcca atctctgtgt 2400aagcttgtat gcaatatttc
cggcatcaag atccgggaaa ataagaacat tggcctttcc 2460tgcaacactg cttccctttg
ccttcgattt tgccacttcc ggaacaatgg cggcatccac 2520ctgaagttct ccgtcaattg
caaggtgggg agctttttcc tttgcaagct gtgttgcctt 2580gattaccttt tcggtcagct
cacttttggc actgccgtaa gaagaataag aaagcattgc 2640cacctgaggt tttgctccaa
ccagcatctc aaaagatttg gatgcagata ttgcaatttc 2700agaaagctgg tctgcatccg
gattttccac caagccgcaa tcggcatata caaaggttcc 2760gttatgacca tattcacagt
tgggtacaac cataacaaaa aaggatgata cgagttttgt 2820ccccggggcc gtctttaata
tctgcaaagc cggtctcaaa gtatttgcag tggaattgac 2880agcacccgcc accataccat
ccgcttcacc tttttttacc atcataactc cataataaag 2940agggtctttg atcgtttccc
ttgcggcttc tatagtcata cccttcgatt ttctaagctc 3000atacagtgta tttgcataat
cctccaattt ttcggaattt aaggaatcct ctatcatcac 3060tccttcaaga tcaatatccc
ccgccagact cttaatctcc ttttcattgc ctatcagtac 3120aacctttgca attccctttt
tcattatcat ggatgcggct ttaataaccc tcagatccgt 3180actttccggc aaaactatgg
tttttacgtc tgatttcgcc ctttcaatta tttgttccaa 3240aaaactcata aattcttctc
ctttcataat cccaaaactg ttatcataaa aactgtattt 3300gtaatactta taactatata
ttatcaccag gtaataatac ctactcacta taaacagcta 3360ttttactggg ttccaagcaa
ctctaggatc ctctagagtc gacctgcagg catgcaagct 3420tggcgtaatc atggtcatag
ctgtttcctg tgtgaaattg ttatccgctc acaattccac 3480acaacatacg agccggaagc
ataaagtgta aagcctgggg tgcctaatga gtgagctaac 3540tcacattaat tgcgttgcgc
tcactgcccg ctttccagtc gggaaacctg tcgtgccagc 3600ccttcaaact tcccaaaggc
gagccctagt gacattagaa aaccgactgt aaaaagtaca 3660gtcggcatta tctcatatta
taaaagccag tcattaggcc tatctgacaa ttcctgaata 3720gagttcataa acaatcctgc
atgataacca tcacaaacag aatgatgtac ctgtaaagat 3780agcggtaaat atattgaatt
acctttatta atgaattttc ctgctgtaat aatgggtaga 3840aggtaattac tattattatt
gatatttaag ttaaacccag taaatgaagt ccatggaata 3900atagaaagag aaaaagcatt
ttcaggtata ggtgttttgg gaaacaattt ccccgaacca 3960ttatatttct ctacatcaga
aaggtataaa tcataaaact ctttgaagtc attctttaca 4020ggagtccaaa taccagagaa
tgttttagat acaccatcaa aaattgtata aagtggctct 4080aacttatccc aataacctaa
ctctccgtcg ctattgtaac cagttctaaa agctgtattt 4140gagtttatca cccttgtcac
taagaaaata aatgcagggt aaaatttata tccttcttgt 4200tttatgtttc ggtataaaac
actaatatca atttctgtgg ttatactaaa agtcgtttgt 4260tggttcaaat aatgattaaa
tatctctttt ctcttccaat tgtctaaatc aattttatta 4320aagttcattt gatatgcctc
ctaaattttt atctaaagtg aatttaggag gcttacttgt 4380ctgctttctt cattagaatc
aatccttttt taaaagtcaa tcccgtttgt tgaactactc 4440tttaataaaa taatttttcc
gttcccaatt ccacattgca ataatagaaa atccatcttc 4500atcggctttt tcgtcatcat
ctgtatgaat caaatcgcct tcttctgtgt catcaaggtt 4560taatttttta tgtatttctt
ttaacaaacc accataggag attaaccttt tacggtgtaa 4620accttcctcc aaatcagaca
aacgtttcaa attcttttct tcatcatcgg tcataaaatc 4680cgtatccttt acaggatatt
ttgcagtttc gtcaattgcc gattgtatat ccgatttata 4740tttatttttc ggtcgaatca
tttgaacttt tacatttgga tcatagtcta atttcattgc 4800ctttttccaa aattgaatcc
attgtttttg attcacgtag ttttctgtat tcttaaaata 4860agttggttcc acacatacca
atacatgcat gtgctgatta taagaattat ctttattatt 4920tattgtcact tccgttgcac
gcataaaacc aacaagattt ttattaattt ttttatattg 4980catcattcgg cgaaatcctt
gagccatatc tgacaaactc ttatttaatt cttcgccatc 5040ataaacattt ttaactgtta
atgtgagaaa caaccaacga actgttggct tttgtttaat 5100aacttcagca acaacctttt
gtgactgaat gccatgtttc attgctctcc tccagttgca 5160cattggacaa agcctggatt
tacaaaacca cactcgatac aactttcttt cgcctgtttc 5220acgattttgt ttatactcta
atatttcagc acaatctttt actctttcag cctttttaaa 5280ttcaagaata tgcagaagtt
caaagtaatc aacattagcg attttctttt ctctccatgg 5340tctcactttt ccactttttg
tcttgtccac taaaaccctt gatttttcat ctgaataaat 5400gctactatta ggacacataa
tattaaaaga aacccccatc tatttagtta tttgtttggt 5460cacttataac tttaacagat
ggggtttttc tgtgcaacca attttaaggg ttttcaatac 5520tttaaaacac atacatacca
acacttcaac gcacctttca gcaactaaaa taaaaatgac 5580gttatttcta tatgtatcaa
gaatagaaag aactcgtttt tcgctacgct caaaacgcaa 5640aaaaagcact cattcgagtg
ctttttctta tcgctccaaa tcatgcgatt ttttcctctt 5700tgcttttctt tgctcacgaa
gttctcgatc acgctgcaaa acatcttgaa gcgaaaaagt 5760attcttcttt tcttccgatc
gctcatgctg acgcacgaaa agccctctag gcgcatagga 5820acaactccta aatgcatgtg
aggggttttc tcgtccatgt gaacagtcgc atacgcaata 5880ttttgtttcc catactgcat
taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta 5940ttgggcgctc ttccgcttcc
tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc 6000gagcggtatc agctcactca
aaggcggtaa tacggttatc cacagaatca ggggataacg 6060caggaaagaa catgtgagca
aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt 6120tgctggcgtt tttccatagg
ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa 6180gtcagaggtg gcgaaacccg
acaggactat aaagatacca ggcgtttccc cctggaagct 6240ccctcgtgcg ctctcctgtt
ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc 6300cttcgggaag cgtggcgctt
tctcatagct cacgctgtag gtatctcagt tcggtgtagg 6360tcgttcgctc caagctgggc
tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct 6420tatccggtaa ctatcgtctt
gagtccaacc cggtaagaca cgacttatcg ccactggcag 6480cagccactgg taacaggatt
agcagagcga ggtatgtagg cggtgctaca gagttcttga 6540agtggtggcc taactacggc
tacactagaa gaacagtatt tggtatctgc gctctgctga 6600agccagttac cttcggaaaa
agagttggta gctcttgatc cggcaaacaa accaccgctg 6660gtagcggtgg tttttttgtt
tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag 6720aagatccttt gatcttttct
acggggtctg acgctcagtg gaacgaaaac tcacgttaag 6780ggattttggt catgagatta
tcaaaaagga tcttcaccta gatcctttta aattaaaaat 6840gaagttttaa atcaatctaa
agtatatatg agtaaacttg gtctgacagt taccaatgct 6900taatcagtga ggcacctatc
tcagcgatct gtctatttcg ttcatccata gttgcctgac 6960tccccgtcgt gtagataact
acgatacggg agggcttacc atctggcccc agtgctgcaa 7020tgataccgcg agacccacgc
tcaccggctc cagatttatc agcaataaac cagccagccc 7080gatatgggaa acaaaatatt
gcgtatgcga ctgttcacat ggacgagaaa acccctcaca 7140tgcatttagg agttgttcct
atgcgcctag agggcttttc gtgcgtcagc atgagcgatc 7200ggaagaaaag aagaatactt
tttcgcttca agatgttttg cagcgtgatc gagaacttcg 7260tgagcaaaga aaagcaaaga
ggaaaaaatc gcatgatttg gagcgataag aaaaagcact 7320cgaatgagtg ctttttttgc
gttttgagcg tagcgaaaaa cgagttcttt ctattcttga 7380tacatataga aataacgtca
tttttatttt agttgctgaa aggtgcgttg aagtgttggt 7440atgtatgtga ttcaataatt
tcttttactc gctcgttata gtcgatcggt tcatcattca 7500ccaaatcata attttcatgt
gaccgttctt tatcaatatc gggattcgtt ttactttccc 7560gttctctctg attgtgaaat
tg 7582
SEQ ID NO:3; Length: 8927; Type: DNA; Organism: Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
agctttggct aacacacacg ccattccaac caatagtttt ctcggcataa agccatgctc
60tgacgcttaa atgcactaat gccttaaaaa aacattaaag tctaacacac tagacttatt
120tacttcgtaa ttaagtcgtt aaaccgtgtg ctctacgacc aaaagtataa aacctttaag
180aactttcttt tttcttgtaa aaaaagaaac tagataaatc tctcatatct tttattcaat
240aatcgcatca gattgcagta taaatttaac gatcactcat catgttcata tttatcagag
300ctcgtgctat aattatacta attttataag gaggaaaaaa taaagagggt tataatgaac
360gagaaaaata taaaacacag tcaaaacttt attacttcaa aacataatat agataaaata
420atgacaaata taagattaaa tgaacatgat aatatctttg aaatcggctc aggaaaaggg
480cattttaccc ttgaattagt acagaggtgt aatttcgtaa ctgccattga aatagaccat
540aaattatgca aaactacaga aaataaactt gttgatcacg ataatttcca agttttaaac
600aaggatatat tgcagtttaa atttcctaaa aaccaatcct ataaaatatt tggtaatata
660ccttataaca taagtacgga tataatacgc aaaattgttt ttgatagtat agctgatgag
720atttatttaa tcgtggaata cgggtttgct aaaagattat taaatacaaa acgctcattg
780gcattatttt taatggcaga agttgatatt tctatattaa gtatggttcc aagagaatat
840tttcatccta aacctaaagt gaatagctca cttatcagat taaatagaaa aaaatcaaga
900atatcacaca aagataaaca gaagtataat tatttcgtta tgaaatgggt taacaaagaa
960tacaagaaaa tatttacaaa aaatcaattt aacaattcct taaaacatgc aggaattgac
1020gatttaaaca atattagctt tgaacaattc ttatctcttt tcaatagcta taaattattt
1080aataagtaag ttaagggatg cataaactgc atcccttaac ttgtttttcg tgtacctatt
1140ttttgtgaat cgattatgtc ttttgcgcat tcacttcttt tctatataaa tatgagcgaa
1200gcgaataagc gtcggaaaag cagcaaaaag tttccttttt gctgttggag catgggggtt
1260cagggggtgc agtatctgac gtcaatgccg agcgaaagcg agccgaaggg tagcatttac
1320gttagataac cccctgatat gctccgacgc tttatataga aaagaagatt caactaggta
1380aaatcttaat ataggttgag atgataaggt ttataaggaa tttgtttgtt ctaatttttc
1440actcattttg ttctaatttc ttttaacaaa tgttcttttt tttttagaac agttatgata
1500tagttagaat agtttaaaat aaggagtgag aaaaagatga aagaaagata tggaacagtc
1560tataaaggct ctcagaggct catagacgaa gaaagtggag aagtcataga ggtagacaag
1620ttataccgta aacaaacgtc tggtaacttc gtaaaggcat atatagtgca attaataagt
1680atgttagata tgattggcgg aaaaaaactt aaaatcgtta actatatcct agataatgtc
1740cacttaagta acaatacaat gatagctaca acaagagaaa tagcaaaagc tacaggaaca
1800agtctacaaa cagtaataac aacacttaaa atcttagaag aaggaaatat tataaaaaga
1860aaaactggag tattaatgtt aaaccctgaa ctactaatga gaggcgacga ccaaaaacaa
1920aaatacctct tactcgaatt tgggaacttt gagcaagagg caaatgaaat agattgacct
1980cccaataaca ccacgtagtt attgggaggt caatctatga aatgcgatta agcttggctg
2040caggtcgata aacccagcga accatttgag gtgataggta agattatacc gaggtatgaa
2100aacgagaatt ggacctttac agaattactc tatgaagcgc catatttaaa aagctaccaa
2160gacgaagagg atgaagagga tgaggaggca gattgccttg aatatattga caatactgat
2220aagataatat atcttttata tagaagatat cgccgtatgt aaggatttca gggggcaagg
2280cataggcagc gcgcttatca atatatctat agaatgggca aagcataaaa acttgcatgg
2340actaatgctt gaaacccagg acaataacct tatagcttgt aaattctatc ataattgtgg
2400tttcaaaatc ggctccgtcg atactatgtt atacgccaac tttcaaaaca actttgaaaa
2460agctgttttc tggtatttaa ggttttagaa tgcaaggaac agtgaattgg agttcgtctt
2520gttataatta gcttcttggg gtatctttaa atactgtaga aaagaggaag gaaataataa
2580atggctaaaa tgagaatatc accggaattg aaaaaactga tcgaaaaata ccgctgcgta
2640aaagatacgg aaggaatgtc tcctgctaag gtatataagc tggtgggaga aaatgaaaac
2700ctatatttaa aaatgacgga cagccggtat aaagggacca cctatgatgt ggaacgggaa
2760aaggacatga tgctatggct ggaaggaaag ctgcctgttc caaaggtcct gcactttgaa
2820cggcatgatg gctggagcaa tctgctcatg agtgaggccg atggcgtcct ttgctcggaa
2880gagtatgaag atgaacaaag ccctgaaaag attatcgagc tgtatgcgga gtgcatcagg
2940ctctttcact ccatcgacat atcggattgt ccctatacga atagcttaga cagccgctta
3000gccgaattgg attacttact gaataacgat ctggccgatg tggattgcga aaactgggaa
3060gaagacactc catttaaaga tccgcgcgag ctgtatgatt ttttaaagac ggaaaagccc
3120gaagaggaac ttgtcttttc ccacggcgac ctgggagaca gcaacatctt tgtgaaagat
3180ggcaaagtaa gtggctttat tgatcttggg agaagcggca gggcggacaa gtggtatgac
3240attgccttct gcgtccggtc gatcagggag gatatcgggg aagaacagta tgtcgagcta
3300ttttttgact tactggggat caagcctgat tgggagaaaa taaaatatta tattttactg
3360gatgaattgt tttagtacct agatttagat gtctaaaaag ctttttagac atctaatctt
3420ttctgaagta catccgcaac tgtccatact ctgatgtttt atatcttttc taaaagttcg
3480ctagataggg gtcccgagag ccccatactc atgagcagtc ttgttacagc tatgccggca
3540gcacctgaac cgtttacaac aacttctata tcctcgattt tcttgttgac aagctttaat
3600gcattgatca ttgctgcaac agtaacaacg gctgtaccgt gctggtcatc atggaatatt
3660ggaatgtcac attcctcttt gagtcttctt tctatttcaa agcatctcgg agcggatata
3720tcttcgaggt ttataccgcc aaagcttccg gagatgagct tgattgtctt tacaatttca
3780tctacgtctt ttgatttgat acagagcgga aatgcgtcca catcaccaaa cttcttgaag
3840agtacgcatt taccttccat aacaggcatt ccggcttcag gtcctatgtc tccgagccct
3900aaaaccgccg taccgtcggt aataaccgct accaggttcc aacgtcttgt atattcataa
3960gaaagattaa catctttctg aattgcaaga catggttctg caacacccgg tgtataagca
4020agcgacaact cttccttggt tgaaacaggt accttgtgta taacctcaat tttacccttc
4080cactcaccgt gaagccttag tgattctttt ctgtaatcca tttgattcta cctccaaatt
4140atattattaa atatctgcga tattaatgca caattataaa ttcttaactt cgttcaatac
4200ttttttaacc tgctccgctg agaatcttaa agcttcttct tcttcaggag tcagattaaa
4260ttggagaact tcctgaacac cttcggaatt tacgatggat ggaaggctta ttgcaacatc
4320ttctattcca tacatgccgt ttataacggt tcctacggtt cttattgtat tctgattctt
4380aaggagtgtt tcaactattg tgttgattga aactgcaata ccatagtatg ttgcaccttt
4440gttcttgata atggttgcac ccgcagtttt aacatcttca gcgatttttt tcttgtcttc
4500ttctgtgaaa ttgcatttcg gatcatcgat atattcgttg atatttttac cggcgatatg
4560tgtgcagctc cacaacggaa gctgtgaatc accgtgttcg cctattatgt agccgtgtac
4620atagttcaac aaacgggatt gacttttaaa aaaggattga ttctaatgaa gaaagcagac
4680aagtaagcct cctaaattca ctttagataa aaatttagga ggcatatcaa atgaacttta
4740ataaaattga tttagacaat tggaagagaa aagagatatt taatcattat ttgaaccaac
4800aaacgacttt tagtataacc acagaaattg atattagtgt tttataccga aacataaaac
4860aagaaggata taaattttac cctgcattta ttttcttagt gacaagggtg ataaactcaa
4920atacagcttt tagaactggt tacaatagcg acggagagtt aggttattgg gataagttag
4980agccacttta tacaattttt gatggtgtat ctaaaacatt ctctggtatt tggactcctg
5040taaagaatga cttcaaagag ttttatgatt tatacctttc tgatgtagag aaatataatg
5100gttcggggaa attgtttccc aaaacaccta tacctgaaaa tgctttttct ctttctatta
5160ttccatggac ttcatttact gggtttaact taaatatcaa taataatagt aattaccttc
5220tacccattat tacagcagga aaattcatta ataaaggtaa ttcaatatat ttaccgctat
5280ctttacaggt acatcattct gtttgtgatg gttatcatgc aggattgttt atgaactcta
5340ttcaggaatt gtcagatagg cctaatgact ggcttttata atgtacattt attggtaaca
5400ttgtcttttg ggtttttctt tcttatatcc gttcttgccg ccgcggtttc ggaaaaattt
5460gaaatattgc ttgtttccct tcttttgttg gtacttttga taccttatat tgcccattat
5520tacaaactgg agaacggagt tcagaggctt tatgagcttt ataacaaaat tgatgaaaaa
5580tgtgtaagga aaaacaagac cgcctgagtt ctcacccaga cggtcggtat tggcagtttc
5640actttcgtta gtcgatgttt ttcatgccgg caaagaaatt attttcttgc aagaaccttt
5700ttcagttttg caaatcttgg aagaccatct tcgataggag gtcttgattc tccctgaatt
5760aacggaagag catacttaat aaagtcttct gtaaggcctg ctccgtcagg tttaatccat
5820tccaacggaa ctttcttctc agtatttgca acttcactga ggttcagaag cttgatattg
5880cacttgtatt caggaccttc cgctctttca aaagcaacca tgtagtctgt tttcccttca
5940acggcatatt gtacggctgc ctgtcctgca agataagctt catttacgtc ggtaagagaa
6000gctacgtgag ctgcgcatct ttggagaagg ctgaattcaa tgccgcgaac ctttgcgccg
6060gtcttctctt taacaatgtt agccagtgtt gaagcaagac cgcccaactg tgcatgtcca
6120aaggagtctt ttgttttcgc aaggtctgaa ccgtattcgg aaatatattt tccgtttttg
6180tctttgatac cttcagatac ggctacaata acctttccgt tttctttgta gattcttgtc
6240acatcttcaa caaatttgtc tatgtcaaag gaaagctcgg gtaccgagct cgaattcact
6300ggccgtcgtt ttacaacgtc gtgactggga aaaccctggc gttacccaac ttaatcgcct
6360tgcagcacat ccccctttcg ccagctggcg taatagcgaa gaggcccgca ccgatcgccc
6420ttcccaacag ttgcgcagcc tgaatggcga atggcgcctg atgcggtatt ttctccttac
6480gcatctgtgc ggtatttcac accgcatatg gtgcactctc agtacaatct gctctgatgc
6540cgcatagtta agccagcccc gacacccgcc aacacccgct gacgcgccct gacgggcttg
6600tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct gcatgtgtca
6660gaggttttca ccgtcatcac cgaaacgcgc gagacgaaag ggcctcgtga tacgcctatt
6720tttataggtt aatgtcatga taataatggt ttcttagacg tcaggtggca cttttcgggg
6780aaatgtgcgc ggaaccccta tttgtttatt tttctaaata cattcaaata tgtatccgct
6840catgagacaa taaccctgat aaatgcttca ataatattga aaaaggaaga gtatgagtat
6900tcaacatttc cgtgtcgccc ttattccctt ttttgcggca ttttgccttc ctgtttttgc
6960tcacccagaa acgctggtga aagtaaaaga tgctgaagat cagttgggtg cacgagtggg
7020ttacatcgaa ctggatctca acagcggtaa gatccttgag agttttcgcc ccgaagaacg
7080ttttccaatg atgagcactt ttaaagttct gctatgtggc gcggtattat cccgtattga
7140cgccgggcaa gagcaactcg gtcgccgcat acactattct cagaatgact tggttgagta
7200ctcaccagtc acagaaaagc atcttacgga tggcatgaca gtaagagaat tatgcagtgc
7260tgccataacc atgagtgata acactgcggc caacttactt ctgacaacga tcggaggacc
7320gaaggagcta accgcttttt tgcacaacat gggggatcat gtaactcgcc ttgatcgttg
7380ggaaccggag ctgaatgaag ccataccaaa cgacgagcgt gacaccacga tgcctgtagc
7440aatggcaaca acgttgcgca aactattaac tggcgaacta cttactctag cttcccggca
7500acaattaata gactggatgg aggcggataa agttgcagga ccacttctgc gctcggccct
7560tccggctggc tggtttattg ctgataaatc tggagccggt gagcgtgggt ctcgcggtat
7620cattgcagca ctggggccag atggtaagcc ctcccgtatc gtagttatct acacgacggg
7680gagtcaggca actatggatg aacgaaatag acagatcgct gagataggtg cctcactgat
7740taagcattgg taactgtcag accaagttta ctcatatata ctttagattg atttaaaact
7800tcatttttaa tttaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat
7860cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc
7920ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct
7980accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg
8040cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt taggccacca
8100cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc
8160tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga
8220taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac
8280gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga
8340agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag
8400ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg
8460acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag
8520caacgcggcc tttttacggt tcctggcctt ttgctggcct tttgctcaca tgttctttcc
8580tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag ctgataccgc
8640tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg aagagcgccc
8700aatacgcaaa ccgcctctcc ccgcgcgttg gccgattcat taatgcagct ggcacgacag
8760gtttcccgac tggaaagcgg gcagtgagcg caacgcaatt aatgtgagtt agctcactca
8820ttaggcaccc caggctttac actttatgct tccggctcgt atgttgtgtg gaattgtgag
8880cggataacaa tttcacacag gaaacagcta tgaccatgat tacgcca 8927
SEQ ID NO:4; Length: 7114; Type: DNA; Organism: Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
aattcagccc catactcatg agcagtcttg
ttacagctat gccggcagca cctgaaccgt 60ttacaacaac ttctatatcc tcgattttct
tgttgacaag ctttaatgca ttgatcattg 120ctgcaacagt aacaacggct gtaccgtgct
ggtcatcatg gaatattgga atgtcacatt 180cctctttgag tcttctttct atttcaaagc
atctcggagc ggatatatct tcgaggttta 240taccgccaaa gcttccggag atgagcttga
ttgtctttac aatttcatct acgtcttttg 300atttgataca gagcggaaat gcgtccacat
caccaaactt cttgaagagt acgcatttac 360cttccataac aggcattccg gcttcaggtc
ctatgtctcc gagccctaaa accgccgtac 420cgtcggtaat aaccgctacc aggttccaac
gtcttgtata ttcataagaa agattaacat 480ctttctgaat tgcaagacat ggttctgcaa
cacccggtgt ataagcaagc gacaactctt 540ccttggttga aacaggtacc ttgtgtataa
cctcaatttt acccttccac tcaccgtgaa 600gccttagtga ttcttttctg taatccattt
gattctacct ccaaattata ttattaaata 660tctgcgatat taatgcacaa ttataaattc
ttaacttcgt tcaatacttt tttaacctgc 720tccgctgaga atcttaaagc ttcttcttct
tcaggagtca gattaaattg gagaacttcc 780tgaacacctt cggaatttac gatggatgga
aggcttattg caacatcttc tattccatac 840atgccgttta taacggttcc tacggttctt
attgtattct gattcttaag gagtgtttca 900actattgtgt tgattgaaac tgcaatacca
tagtatgttg cacctttgtt cttgataatg 960gttgcacccg cagttttaac atcttcagcg
atttttttct tgtcttcttc tgtgaaattg 1020catttcggat catcgattaa tcgcatcaga
ttgcagtata aatttaacga tcactcatca 1080tgttcatatt tatcagagct cgtgctataa
ttatactaat tttataagga ggaaaaaata 1140aagagggtta taatgaacga gaaaaatata
aaacacagtc aaaactttat tacttcaaaa 1200cataatatag ataaaataat gacaaatata
agattaaatg aacatgataa tatctttgaa 1260atcggctcag gaaaagggca ttttaccctt
gaattagtac agaggtgtaa tttcgtaact 1320gccattgaaa tagaccataa attatgcaaa
actacagaaa ataaacttgt tgatcacgat 1380aatttccaag ttttaaacaa ggatatattg
cagtttaaat ttcctaaaaa ccaatcctat 1440aaaatatttg gtaatatacc ttataacata
agtacggata taatacgcaa aattgttttt 1500gatagtatag ctgatgagat ttatttaatc
gtggaatacg ggtttgctaa aagattatta 1560aatacaaaac gctcattggc attattttta
atggcagaag ttgatatttc tatattaagt 1620atggttccaa gagaatattt tcatcctaaa
cctaaagtga atagctcact tatcagatta 1680aatagaaaaa aatcaagaat atcacacaaa
gataaacaga agtataatta tttcgttatg 1740aaatgggtta acaaagaata caagaaaata
tttacaaaaa atcaatttaa caattcctta 1800aaacatgcag gaattgacga tttaaacaat
attagctttg aacaattctt atctcttttc 1860aatagctata aattatttaa taagtaagtt
aagggatgca taaactgcat cccttaatcg 1920atgagaacaa gttcatttgc ggtttgccgc
aaagccattg tgaaggctgc agacgcacct 1980acaaaaccag caccaatgat tgcaactttt
gacctacttt ttaccatttc cataccattc 2040ctttcaatta cccagtatat ttaacggtta
gttcgtttat aaatttgaga ttaattcttt 2100aaattttaac tgtgaacccg gttcacaggt
attatcatta atttcagtat atgtgtttaa 2160taaaaattag tgaaaatttg caactgcaag
catttaaaat tgtaaacgat aaataaatcc 2220aggcaacaaa tttcccccat tttaaatagc
ccagttaaac acattgataa cattttaaca 2280ttattttata tctgcgtcca taactgaaaa
agggaaatcc attactttat gaaatcaaat 2340tttgaagtta tcaagaaatt atgacgattt
tctccgtggc atgcaagatt tcgcgatatt 2400tcattcgttt atattaattt tttatgaaaa
ctgcggtttg ggctgacaat tgcgatggaa 2460gtttcaatta gactttttgt caaatattat
gtataataat attatctata ataatgtatg 2520aaaaaattgt cctaagatgg aagacggggg
tggtttcata tatggttaaa tatctgaaga 2580ggcaggaaga gttggtagaa gaagctctct
caaaggataa ctgttctgac tgggaaagtt 2640tgagaaatta tcataagtcc caaattgaat
ttttgcagca tgagagactt gtacatttat 2700tggtaacatt gtcttttggg tttttctttc
ttatatccgt tcttgccgcc gcggtttcgg 2760aaaaatttga aatattgctt gtttcccttc
ttttgttggt acttttgata ccttatattg 2820cccattatta caaactggag aacggagttc
agaggcttta tgagctttat aacaaaattg 2880atgaaaaatg tgtaaggaaa aacaagaccg
cctgagtgga tcctctagag tcgacctgca 2940ggcatgcaag cttggcgtaa tcatggtcat
agctgtttcc tgtgtgaaat tgttatccgc 3000tcacaattcc acacaacata cgagccggaa
gcataaagtg taaagcctgg ggtgcctaat 3060gagtgagcta actcacatta attgcgttgc
gctcactgcc cgctttccag tcgggaaacc 3120tgtcgtgcca gcccttcaaa cttcccaaag
gcgagcccta gtgacattag aaaaccgact 3180gtaaaaagta cagtcggcat tatctcatat
tataaaagcc agtcattagg cctatctgac 3240aattcctgaa tagagttcat aaacaatcct
gcatgataac catcacaaac agaatgatgt 3300acctgtaaag atagcggtaa atatattgaa
ttacctttat taatgaattt tcctgctgta 3360ataatgggta gaaggtaatt actattatta
ttgatattta agttaaaccc agtaaatgaa 3420gtccatggaa taatagaaag agaaaaagca
ttttcaggta taggtgtttt gggaaacaat 3480ttccccgaac cattatattt ctctacatca
gaaaggtata aatcataaaa ctctttgaag 3540tcattcttta caggagtcca aataccagag
aatgttttag atacaccatc aaaaattgta 3600taaagtggct ctaacttatc ccaataacct
aactctccgt cgctattgta accagttcta 3660aaagctgtat ttgagtttat cacccttgtc
actaagaaaa taaatgcagg gtaaaattta 3720tatccttctt gttttatgtt tcggtataaa
acactaatat caatttctgt ggttatacta 3780aaagtcgttt gttggttcaa ataatgatta
aatatctctt ttctcttcca attgtctaaa 3840tcaattttat taaagttcat ttgatatgcc
tcctaaattt ttatctaaag tgaatttagg 3900aggcttactt gtctgctttc ttcattagaa
tcaatccttt tttaaaagtc aatcccgttt 3960gttgaactac tctttaataa aataattttt
ccgttcccaa ttccacattg caataataga 4020aaatccatct tcatcggctt tttcgtcatc
atctgtatga atcaaatcgc cttcttctgt 4080gtcatcaagg tttaattttt tatgtatttc
ttttaacaaa ccaccatagg agattaacct 4140tttacggtgt aaaccttcct ccaaatcaga
caaacgtttc aaattctttt cttcatcatc 4200ggtcataaaa tccgtatcct ttacaggata
ttttgcagtt tcgtcaattg ccgattgtat 4260atccgattta tatttatttt tcggtcgaat
catttgaact tttacatttg gatcatagtc 4320taatttcatt gcctttttcc aaaattgaat
ccattgtttt tgattcacgt agttttctgt 4380attcttaaaa taagttggtt ccacacatac
caatacatgc atgtgctgat tataagaatt 4440atctttatta tttattgtca cttccgttgc
acgcataaaa ccaacaagat ttttattaat 4500ttttttatat tgcatcattc ggcgaaatcc
ttgagccata tctgacaaac tcttatttaa 4560ttcttcgcca tcataaacat ttttaactgt
taatgtgaga aacaaccaac gaactgttgg 4620cttttgttta ataacttcag caacaacctt
ttgtgactga atgccatgtt tcattgctct 4680cctccagttg cacattggac aaagcctgga
tttacaaaac cacactcgat acaactttct 4740ttcgcctgtt tcacgatttt gtttatactc
taatatttca gcacaatctt ttactctttc 4800agccttttta aattcaagaa tatgcagaag
ttcaaagtaa tcaacattag cgattttctt 4860ttctctccat ggtctcactt ttccactttt
tgtcttgtcc actaaaaccc ttgatttttc 4920atctgaataa atgctactat taggacacat
aatattaaaa gaaaccccca tctatttagt 4980tatttgtttg gtcacttata actttaacag
atggggtttt tctgtgcaac caattttaag 5040ggttttcaat actttaaaac acatacatac
caacacttca acgcaccttt cagcaactaa 5100aataaaaatg acgttatttc tatatgtatc
aagaatagaa agaactcgtt tttcgctacg 5160ctcaaaacgc aaaaaaagca ctcattcgag
tgctttttct tatcgctcca aatcatgcga 5220ttttttcctc tttgcttttc tttgctcacg
aagttctcga tcacgctgca aaacatcttg 5280aagcgaaaaa gtattcttct tttcttccga
tcgctcatgc tgacgcacga aaagccctct 5340aggcgcatag gaacaactcc taaatgcatg
tgaggggttt tctcgtccat gtgaacagtc 5400gcatacgcaa tattttgttt cccatactgc
attaatgaat cggccaacgc gcggggagag 5460gcggtttgcg tattgggcgc tcttccgctt
cctcgctcac tgactcgctg cgctcggtcg 5520ttcggctgcg gcgagcggta tcagctcact
caaaggcggt aatacggtta tccacagaat 5580caggggataa cgcaggaaag aacatgtgag
caaaaggcca gcaaaaggcc aggaaccgta 5640aaaaggccgc gttgctggcg tttttccata
ggctccgccc ccctgacgag catcacaaaa 5700atcgacgctc aagtcagagg tggcgaaacc
cgacaggact ataaagatac caggcgtttc 5760cccctggaag ctccctcgtg cgctctcctg
ttccgaccct gccgcttacc ggatacctgt 5820ccgcctttct cccttcggga agcgtggcgc
tttctcatag ctcacgctgt aggtatctca 5880gttcggtgta ggtcgttcgc tccaagctgg
gctgtgtgca cgaacccccc gttcagcccg 5940accgctgcgc cttatccggt aactatcgtc
ttgagtccaa cccggtaaga cacgacttat 6000cgccactggc agcagccact ggtaacagga
ttagcagagc gaggtatgta ggcggtgcta 6060cagagttctt gaagtggtgg cctaactacg
gctacactag aagaacagta tttggtatct 6120gcgctctgct gaagccagtt accttcggaa
aaagagttgg tagctcttga tccggcaaac 6180aaaccaccgc tggtagcggt ggtttttttg
tttgcaagca gcagattacg cgcagaaaaa 6240aaggatctca agaagatcct ttgatctttt
ctacggggtc tgacgctcag tggaacgaaa 6300actcacgtta agggattttg gtcatgagat
tatcaaaaag gatcttcacc tagatccttt 6360taaattaaaa atgaagtttt aaatcaatct
aaagtatata tgagtaaact tggtctgaca 6420gttaccaatg cttaatcagt gaggcaccta
tctcagcgat ctgtctattt cgttcatcca 6480tagttgcctg actccccgtc gtgtagataa
ctacgatacg ggagggctta ccatctggcc 6540ccagtgctgc aatgataccg cgagacccac
gctcaccggc tccagattta tcagcaataa 6600accagccagc ccgatatggg aaacaaaata
ttgcgtatgc gactgttcac atggacgaga 6660aaacccctca catgcattta ggagttgttc
ctatgcgcct agagggcttt tcgtgcgtca 6720gcatgagcga tcggaagaaa agaagaatac
tttttcgctt caagatgttt tgcagcgtga 6780tcgagaactt cgtgagcaaa gaaaagcaaa
gaggaaaaaa tcgcatgatt tggagcgata 6840agaaaaagca ctcgaatgag tgcttttttt
gcgttttgag cgtagcgaaa aacgagttct 6900ttctattctt gatacatata gaaataacgt
catttttatt ttagttgctg aaaggtgcgt 6960tgaagtgttg gtatgtatgt gattcaataa
tttcttttac tcgctcgtta tagtcgatcg 7020gttcatcatt caccaaatca taattttcat
gtgaccgttc tttatcaata tcgggattcg 7080ttttactttc ccgttctctc tgattgtgaa
attg 7114
SEQ ID NO:5; Length: 3932; Type: DNA; Organism: Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
gacgaaaggg cctcgtgata cgcctatttt tataggttaa tgtcatgata ataatggttt
60cttagacgtc aggtggcact tttcggggaa atgtgcgcgg aacccctatt tgtttatttt
120tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataa atgcttcaat
180aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgccctt attccctttt
240ttgcggcatt ttgccttcct gtttttgctc acccagaaac gctggtgaaa gtaaaagatg
300ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaac agcggtaaga
360tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcactttt aaagttctgc
420tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggt cgccgcatac
480actattctca gaatgacttg gttgagtact caccagtcac agaaaagcat cttacggatg
540gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataac actgcggcca
600acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttg cacaacatgg
660gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagcc ataccaaacg
720acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg
780gcgaactact tactctagct tcccggcaac aattaataga ctggatggag gcggataaag
840ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgct gataaatctg
900gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat ggtaagccct
960cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaa cgaaatagac
1020agatcgctga gataggtgcc tcactgatta agcattggta actgtcagac caagtttact
1080catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga
1140tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt
1200cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct
1260gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc
1320taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc
1380ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc
1440tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg
1500ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt
1560cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg
1620agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg
1680gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt
1740atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag
1800gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt
1860gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta
1920ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt
1980cagtgagcga ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc gcgcgttggc
2040cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca
2100acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac tttatgcttc
2160cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga aacagctatg
2220accatgatta cgccaagctt gtagttggtg caggctttgt aggttccacc acagcttata
2280cattgatgct cagcggactt atatctgaaa ttgtactgat agacataaat gcaaaaaaag
2340ccgacggaga agtcatggac ttaaatcacg gcatgccttt tgtaaggccc gttgaaattt
2400atcgtggtga ctacaaagac tgtgccggat ccgacatagt aatcattacc gccggtgcca
2460accaaaaaga aggcgaaacg agaatagatc ttagttcaac aaacgggatt gacttttaaa
2520aaaggattga ttctaatgaa gaaagcagac aagtaagcct cctaaattca ctttagataa
2580aaatttagga ggcatatcaa atgaacttta ataaaattga tttagacaat tggaagagaa
2640aagagatatt taatcattat ttgaaccaac aaacgacttt tagtataacc acagaaattg
2700atattagtgt tttataccga aacataaaac aagaaggata taaattttac cctgcattta
2760ttttcttagt gacaagggtg ataaactcaa atacagcttt tagaactggt tacaatagcg
2820acggagagtt aggttattgg gataagttag agccacttta tacaattttt gatggtgtat
2880ctaaaacatt ctctggtatt tggactcctg taaagaatga cttcaaagag ttttatgatt
2940tatacctttc tgatgtagag aaatataatg gttcggggaa attgtttccc aaaacaccta
3000tacctgaaaa tgctttttct ctttctatta ttccatggac ttcatttact gggtttaact
3060taaatatcaa taataatagt aattaccttc tacccattat tacagcagga aaattcatta
3120ataaaggtaa ttcaatatat ttaccgctat ctttacaggt acatcattct gtttgtgatg
3180gttatcatgc aggattgttt atgaactcta ttcaggaatt gtcagatagg cctaatgact
3240ggcttttata atgtacatgc ttatattatt ggcgaacacg gtgacaccga agttgcggcc
3300tggagtcttg caaatattgc gggaattccc atggatcgct actgtgacga atgccatcag
3360tgcgaggagc agatttcccg gaataaaata tatgaaagtg ttaaaaatgc agcttatgaa
3420atcatcagga acaaaggtgc aacctattat gccgtagccc ttgccgtaag aagaatcgtt
3480gaagccattg tactgcaggt cgactctaga ggatccccgg gtaccgagct cgaattcact
3540ggccgtcgtt ttacaacgtc gtgactggga aaaccctggc gttacccaac ttaatcgcct
3600tgcagcacat ccccctttcg ccagctggcg taatagcgaa gaggcccgca ccgatcgccc
3660ttcccaacag ttgcgcagcc tgaatggcga atggcgcctg atgcggtatt ttctccttac
3720gcatctgtgc ggtatttcac accgcatatg gtgcactctc agtacaatct gctctgatgc
3780cgcatagtta agccagcccc gacacccgcc aacacccgct gacgcgccct gacgggcttg
3840tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct gcatgtgtca
3900gaggttttca ccgtcatcac cgaaacgcgc ga 3932
SEQ ID NO:6; Length: 1320; Type: DNA; Organism: Thermoanaerobacterium saccharolyticum
atgaataaat attttgagaa
cgtatctaaa ataaaatatg aaggaccaaa atcaaataat 60ccttattcct ttaaatttta
caatccagag gaagtaatcg atggcaagac gatggaggag 120catctccgct tttctatagc
ttattggcac acttttactg ctgatggaac agatcaattt 180ggcaaggcta ctatgcaaag
accatggaac cactacacag atcctatgga tatagcgaaa 240cgaagggtag aagcagcatt
tgagtttttt gataagataa atgcaccttt cttctgcttc 300catgataggg atattgcccc
tgaaggagat actcttagag agacaaacaa aaacttagat 360acaatagttg ctatgataaa
ggattactta aagaccagca agacaaaagt tttgtggggt 420accgcaaatc ttttctccaa
tccgagattt gtacatggtg catcaacatc ctgcaatgct 480gacgtttttg catattctgc
agcgcaagtc aaaaaagccc ttgagattac taaggagctt 540ggccgcgaaa actacgtatt
ttggggtgga agagaagggt acgagacgct tctcaataca 600gatatggagt tagagcttga
taactttgca agatttttgc acatggctgt tgactatgca 660aaggaaatcg gctttgaagg
tcagttcttg attgagccga agccaaagga gcctacaaaa 720catcaatacg actttgacgt
ggcaaatgta ttggcattct tgagaaaata cgaccttgac 780aaatatttca aagtaaatat
cgaagcaaac catgcgacat tggcattcca cgacttccaa 840catgagctaa gatacgccag
aataaacggt gtattaggat caattgacgc aaatacaggc 900gacatgcttt tgggatggga
tacggaccag ttccctacag atatacgcat gacaacgctt 960gctatgtatg aagtcataaa
gatgggtgga tttgacaaag gtggccttaa ctttgatgca 1020aaagtaagac gtgcttcatt
tgagccagaa gatcttttct taggtcacat agcaggaatg 1080gatgcttttg caaaaggctt
taaagttgct tacaagcttg tgaaagatgg cgtatttgac 1140aagttcatcg aagaaagata
cgcaagctac aaagaaggca ttggcgctga tattgtaagc 1200ggtaaagctg acttcaagag
ccttgaaaag tatgcattag agcacagcca gattgtaaac 1260aaatcaggca gacaagagct
attagaatca atcctaaatc agtatttgtt tgcagaataa
1320

SEQ ID NO:7 (1566 bases, DNA, Thermoanaerobacterium saccharolyticum)
atgagggcgg cttcatgctt
cattaaagct gccctcaaca aaaatcatgg aggtaaatgt 60atgtattttt tagggataga
tttagggaca tcatcagtta agataatact gatgaatgaa 120agcggcaatg tggtatcaag
cgtttcaaaa gaatatcctg tgtactatcc agagccaggc 180tgggctgagc aaaatccaga
agattggtgg aatggcacaa gggatggaat aagagagatt 240attgcgaaaa gcggcgtaaa
tggcgatgaa ataaagggtg ttggcttaag cgggcagatg 300catggactgg tgcttttaga
caaagacaat aacgttttaa cgccagccat actttggtgt 360gaccagagga cacaggaaga
atgcgactac atcacagaga aaataggaaa agaaggcctt 420ttgaagtaca cagggaataa
agcattgaca ggttttactg caccaaagat attatgggta 480aagaagcacc ttaaagacgt
atatgaaaga atcgctcata tccttttgcc aaaagattat 540ataaggttta aattgacagg
tgagtacgct acagaagttt cagatgcatc aggtacactt 600cttttcgatg tggaaaatag
aagatggtca aaggaaatga tagacatatt tgaaataccg 660gaaaaagccc ttcctaagtg
ctacgaatca acagatgtca cagggtatgt caccaaagag 720gcagcagatt tgacagggct
tcatgaaggg actattgtcg taggcggtgg tggtgaccaa 780gccagcggcg ctgtaggcac
tggcacggtg aaaagcggca tagtgtccat cgcattagga 840acttcaggcg tcgtatttgc
atcacaggac aagtacgcag cagatgatga gcttaggctt 900cactcattct gccatgcaaa
cggcaaatgg catgtgatgg gtgtcatgct ttcggctgca 960tcatgtctta aatggtgggt
agatgatgta aataattaca agaccgatgt tatgacattt 1020gatggactct tagaagaagc
agagaaggtg aagccaggca gtgatggatt gatattcttg 1080ccatacctga tgggtgaaag
gaccccttac agcgatcctt atgcgagagg cagctttgta 1140ggtttaacaa ttacacacaa
tagaagccac atgacaagat ctatattaga aggcgtcgca 1200tttggactta gggattcgct
ggagcttata aaggctttaa atatacctgt aaatgaagcc 1260agggtaagtg gtggtggtgc
taaaagcagg ctttggaggc aaatacttgc cgatgtattc 1320aatgtaagga tagacatgat
aaatgctaca gaaggacctt catttggtgc agcaataatg 1380gcgtctgtgg gatatggcct
ttacaaaaat gtagatgatg catgcaatag tttaataaaa 1440gttacagaca gcgtatatcc
aatcaaagaa aacgtcgaaa agtacaacaa actgtatcca 1500atctacgtga gcttgtattc
aaggcttaaa ggcgcctttg aagaaattgg gaagttggat 1560ttgtaa
1566

SEQ ID NO:8 (1407 bases, DNA, Thermoanaerobacterium saccharolyticum)
atgattattg tgtacaaaga
tgaaaagccc aggataggtt ttttgggtat tatgcaggag 60ttatacgatg atatgttgcc
tggtattact gaaaggcaag agatgtatgc acaacaggtt 120ataggtagat taggtgatgt
tgctgatttt tatttcccag gtgctgcaaa aaacagaaat 180gatatagaaa ggatagttaa
ggaattcaac gataaagatc ttgacggaat aatgatcgtg 240atgctgacat acggaccagc
cacaaatctt gtgaatgctt taagaaacaa taggcttccg 300attatgctgg cgaatataca
gccagaaagc actgtgacag acgattggga tatgggggac 360ttgacctaca accaaggtgt
tcatggtgca caggatactt caaatattat tctgagaatg 420ggcataactt gtcctgttat
aacagaagat tggcattctg atgaatttaa agattttgtg 480aatgattggg caaaaactgt
aaagacagta aaagctttga ggaatatgaa gatagcacaa 540tttggaagaa tgcatggtat
gtatgacata atgggtgatg atgcagcttt tacaagaaaa 600ttggggccgc aaataaacca
ggagtacatt ggccaagttt ttagatatat ggaagaagct 660acaaatgaag aaattgacaa
agtgatagag gaaaacaaga agaactttta tatagatcct 720aaattaagtg atgagagcca
cagatatgct gcaaggcttc aaataggatt taagaaattg 780cttgaggaga aagggtactc
tggctttagt gctcactttg atgtgtttaa aggcgatgga 840agatttaagc agatacacat
gatggcagca tcaaacttga tggcagaagg atatggctat 900gcggcagagg gcgatgtagt
tacggcaagc ctggtggcag caggtcatgt tttgataggc 960aatgcacact ttaccgagat
gtatgcgatg gattttaaga gagattcaat tttgatgagc 1020cacatgggag agggcaattg
gaagatagcc agaaaagata gacctataaa attagtcgat 1080cgagagcttg gcataggaaa
gcttgataat cctccaacag tggtgtttat ggctcaacca 1140ggcattgcga cattggcatc
attagtgtct ttagaaggcg aaaaatatag acttgttgtt 1200tcaaagggag aaattttaga
tacagaagaa gcgaaaaata tagagatgcc gtatttccat 1260tttagacctg aaaacggagt
tagggcttgt ctaaatggct ggcttaaaaa tggtggtaca 1320catcatcagt gcttgacatt
aggtgatgct actaaaagat ggaagctttt atgcgaatta 1380ttagatatcg agtatgttga
agtgtaa
1407

SEQ ID NO:9 (642 bases, DNA, Thermoanaerobacterium saccharolyticum)
atgttagaga acctaaaaca
acgtgtatat aaaatgaaca tgatgcttcc taaaaacaat 60ttagtcacaa tgacaagcgg
caatgtcagc ggaagagatc ctgagacaaa tcttgtagtc 120ataaagccca gcggagtttt
gtacgatgaa atgacgccag atgatatggt agtcgtggat 180ttggatggca atgtggttga
gggtaagcta aaaccatctg tcgatactgc tacacatctt 240tacgtctaca ggcatagaaa
tgatgtaaac ggcattgtcc atacacactc accgtatgct 300acaagttttg ccgcacttgg
ccggtcaatt ccggtctatc ttacagctat tgcagacgag 360tttggatgcg caattcctgt
agggccttat gccaaaattg gcggggaaga gataggaaaa 420gccatcgtag attatatagg
tgagagtcct gcaatactta tgaaaaatca cggcgttttt 480accattggca attcacctga
agcagcctta aaagctgctg ttatggtaga agatacagct 540aagacggtgc acttatcact
gcttttaggc acacctgatg taataccaga tgaagaagta 600aaaagagccc atgaaagata
tcttacaaaa tacggtcaat ga 642

SEQ ID NO:10 (1320 bases, DNA, Clostridium cellulolyticum)
atgtcagaag tatttagcgg tatttcaaac attaaatttg aaggaagcgg
gtcagataat 60ccattagctt ttaagtacta tgaccctaag gcagttatcg gcggaaagac
aatggaagaa 120catctgagat tcgcagttgc ctactggcat acttttgcag caccaggtgc
tgacatgttc 180ggtgcaggat catatgtaag accttggaat acaatgtccg atcctctgga
aattgcaaaa 240tacaaagttg aagcaaactt tgaattcatt gaaaagctgg gagcaccttt
cttcgctttc 300catgacaggg atattgctcc tgaaggcgac acactcgctg aaacaaataa
aaaccttgat 360acaatagttt cagtaattaa agatagaatg aaatccagtc cggtaaagtt
attatgggga 420actacaaatg ctttcggaaa cccaagattt atgcatggtg catcaacttc
gccaaacgct 480gacatatttg cgtatgcagc agctcaggtt aaaaaggcaa tggaaatcac
aaaggaatta 540ggcggagaaa actatgtatt ctggggtggt agagaaggtt atgaaactct
cttgaataca 600gacatgaagc tggaacttga taatttagca agattcttga agatggctgt
tgactatgct 660aaggaaatcg gttttgacgg acaattccta atcgaaccaa agccaaaaga
accaactaag 720caccaatatg attttgatac agctacagtt atcggcttcc tgaagacata
tggattagac 780ccatacttca agatgaatat cgaagctaac catgctacat tagcaggaca
cacattccaa 840catgagcttg ctatgtgcag aatcaacgac atgcttggaa gtattgatgc
taaccaaggt 900gatgtaatgc tcggatggga tacagaccaa ttcccaacga acctatatga
tgcaacacta 960gcaatggtgg aagtattaaa ggccggcgga ttgaaaaagg gaggtttgaa
cttcgactca 1020aaagttagaa gaggatcatt cgaaccatca gacttgttct atggacatat
tgcaggtatg 1080gatacttttg caaagggtct tatcatagca aataagatcg ttgaggacgg
taagtttgat 1140gcatttgttg ctgacagata ctcaagctac acaaatggta tcggaaaaga
tattgttgaa 1200ggaaaagttg gctttaagga attggagcaa tatgcactta ctgcaaagat
tcagaacaag 1260tctggacgtc aggaaatgct ggaagctttg ttaaaccagt atatcctcga
aacaaaataa 1320

SEQ ID NO:11 (1608 bases, DNA, Clostridium cellulolyticum)
atgaagcatg
aactaaatga cgggagaaat gctattctaa atggaaagac agcaattggg 60attgaactcg
gatcaactag aataaaaacg gtattgatag gtgcagacaa tgcacctatc 120gcatccggta
gtcatgactg ggaaaacagc tatatcaata atatttggac ttacagcttg 180gaagatatct
ggaaaggcgt acagagcagc tatcaggaaa tggttaaaga tgttagggac 240aaattcggag
taagtctaaa gacaaccgga gcaataggtt ttagcggaat gatgcacggt 300tatatggttt
ttgataagga aggtaatctt ctgactcatt tcagaacatg gcgtaacact 360ataactgcac
aggcttccga ggaactaacc aagttgttta attatcctat tcctcaaagg 420tggagcattg
cccatcttta ccaagccata ctgaacaatg aagagcatgt atccaatatc 480gattttatga
ctacattggc cggatttata cactggaagt tgacaggaga aaaagttctt 540ggtgtcggag
aggcatcagg tgttttccca atagatttag atactaagga ttttaattca 600agtatgatta
atcagtttaa tgaggctacc accaatcgaa atttttcatg gaagcttcaa 660aatattcttc
caaaagtttt ggtttcgggt actgaagcag gtaggctgac agaagaaggt 720gcaaagcttc
ttgatgttac cggggagctt caggcgggta ttcctttttg tccccctgag 780ggagatgcgg
gaaccggtat ggttgcaact aacagcgttg ctgtccgtac aggcaatgtg 840tctgccggga
cttctgtttt tgctatggtt gttctcgaaa aggaattatc caaagtgtat 900tcggaaattg
acctggtgac tacacctgac gcaaatcttg tggctatggt tcattcaaat 960aattgtacat
cggactatga cgcatggatg ggtatatttg ctgaggcagt taagaccttg 1020ggctttgacg
tgaaaaaacc acagctatat gataccctgc tgggagccgc acttcaaggt 1080gaccctgatt
gcggagggtt gcttgcgtac ggttatattt caggtgagca tattacccat 1140tttgaagaag
gtcgcccgat ggttgttcgt tcatcaaaca gcaaattcaa cctggccaac 1200tttatcaggg
tcaatttgtt tacatctctt ggagccttga agaccggttt ggatattctt 1260tttcaaaagg
aagctgttaa agtggacggt attaccggac acggcggttt ctttaagacg 1320aaggaagtag
gacagaagat tatggcggct gcctttaatg tccctgtatc tgttatgaag 1380actgcgggtg
aaggcggtgc atggggtatt gccctacttg cttcgtatat gattaatagg 1440gaaagctcac
agtccttgga ggattttctt aaacaaaatg tgtttgggga aagccaaggt 1500gagactgtac
agccagattc gaaggatgtt gacggtttca acgagtttat gaaaaggtac 1560acaaagggac
tgggtattga aagggctgcg ataaacttct tgaactga
1608

SEQ ID NO:12 (1380 bases, DNA, Clostridium cellulolyticum)
atgataacca aacaaaaacc
aagaatcgga tttttgggcc taatgcaggg attgtatgac 60gaatcacagc cggaactgcc
gaaaatgcag gaggcatttg ccagagaagt ggttgaacaa 120ttaaaagatg tggcagatat
tgattttccc ggtccagcaa aagaaagaga agatatagaa 180agatatgtaa aatatttcaa
tgataaagag tacgatggaa taatgatagt aaatctgttg 240tacagtccgg gaaatcgttt
aatacaggct atgaagaata ataatctgcc aatattgctg 300gctaatattc aaccacttcc
cgatgttaca tcaaactggg attggatttt gtgcacaact 360aatcagggaa ttcatggaat
acaggataca agtaatgttc tcatgcgttg tggtattaaa 420ccggctatta taacagatga
ttggaaggct gaatccttta aagcctactt tgaagattgg 480gcattggctg ccaacacgca
taacagacta aaaaagacaa aggttgcgat tttcggccgt 540atgcacaata tgggtgacat
acttggtgat gatgcggcat tgtgcagaaa atttggtgta 600gaggcaaacc atgtaacaat
cggtccggtt tattacaaca tggaaggatt gtcagataaa 660gaagtagatg cccagattga
ggaagataaa aagaatttta aaattgatcc taatcttcct 720gaagaaagtc atcggtatgc
tgcacgtatg caattagcct ttgaaaaatt ccttaatgat 780aacggttatg aaggtttttc
acagttcttc aacatataca aggaagacgg caggttcaaa 840caaataccga tattggcagg
ctccagtctc cttgcaaaag gttatggtta ttcggcggaa 900ggtgatacaa atgtacttct
catgactgtg atcggtcaca tgatgatagg ggatcctcat 960tttactgaga tgtactccct
ggactttggt aaggattcag caatgctaag ccatatggga 1020gaaggcaact ggaaggttgc
aaggaaggat cgcggagtga cactgattga caggcctctt 1080gatattggtg gtcttggtaa
tcctccgaca ccaaagttca acgtagaacc aggaacagct 1140acccttgttt ccctcgttgc
agtagaagga gaaaaatacc aactaattgt atcaaagggt 1200actatccttg atactgagga
cttgccagat gttcctatga accatgcttt tttcagaccg 1260gattccggca tcaaaaaggc
tatggacgaa tggttagcta atggtggtac acatcacgaa 1320gtactattcc tgggtgattt
tagaagacgt tttgaattat tatgtaaatt cttgacataa 1380

SEQ ID NO:13 (690 bases, DNA, Clostridium cellulolyticum)
atgttggaac aactaaaaca agcggtgttg gaagccaatc tagagctgcc
tgaaaaagga 60cttgtaacat atacatgggg aaatgtaagc ggtatcgaca gagaaagcag
acttattgca 120attaaaccca gtggtgttga gtataatgtt atgacagctg atgatattgt
attaatcgac 180cttacaggta aagtggtgga aggaaaattg aagccgtctt ctgatgcacc
aacacatgta 240gctctgtata atgcatttcc tgatatagga ggtgtaacac acacccattc
caggtgggca 300actgcttttg cacaggctgg tatggggatt cctgcttacg ggactactca
tgcggattac 360ttttatggtg aaatcccatg tactcgggaa atgacaaagg atgagattga
gtccgattat 420gaagcaaata ccggaacggt gataatagag acttttaaag atttaaatcc
taactatatc 480cctgccgtac ttgtaaaaaa tcatgcacct tttacatggg gaaaaagtgc
agcggaatcg 540gttcataatt ctgttgtttt agaagaagta gctatgatgg ctattcagtg
cagacaactg 600aacccaaatg taactcccat gccgcaggtg ctgctagaca agcattttat
gaggaagcac 660ggcccgaaag cttattacgg acaaaaataa
690

SEQ ID NO:14 (1317 bases, DNA, Clostridium phytofermentans)
atgaaaaatt
actttccaaa tgttccagaa gtaaaatacg aaggcccaaa ttcaacgaat 60ccatttgctt
ttaaatatta tgacgcaaat aaagttgtag cgggtaaaac aatgaaagag 120cactgtcgtt
ttgcattatc ttggtggcat actctttgtg caggtggtgc tgatccattc 180ggtgtaacaa
ctatggatag aacctacgga aatatcacag atccaatgga acttgctaag 240gcaaaagttg
acgctggttt cgaattaatg actaaattag gaattgaatt cttctgtttc 300catgacgcag
atattgctcc agaaggtgat acttttgaag agtcaaagaa gaatcttttt 360gaaatcgttg
attacatcaa agagaagatg gatcagactg gtatcaagtt attatggggt 420actgctaata
actttagtca tccaagattt atgcatggtg cttccacatc ttgcaacgca 480gacgtatttg
catatgctgc tgctaagatt aagaatgcat tagatgcaac aattaaatta 540ggcggtaaag
gttatgtatt ctggggtggt cgtgaaggtt atgaaacact tcttaataca 600gatttaggac
ttgagcttga taatatggct agacttatga agatggctgt agagtatggc 660cgtgcaaatg
gttttgatgg cgacttctat attgagccaa agccaaagga accaaccaag 720catcaatatg
attttgatac agcaaccgta cttgctttcc ttcgcaaata tggcttagaa 780aaagatttca
agatgaacat tgaagcaaac catgctactc ttgcaggtca tacctttgaa 840catgaacttg
caatggctag agttaatggt gcatttggtt ctgtagatgc aaaccagggt 900gatccaaacc
ttggatggga tacggatcaa ttcccaactg atgttcatag tgcaactctt 960gcaatgcttg
aagtacttaa ggctggtgga ttcactaacg gcggacttaa ctttgatgca 1020aaggtaagac
gtggttcctt cgaatttgat gatattgcat acggttatat tgcaggaatg 1080gatacttttg
cacttggttt aattaaggct gctgagatta tcgacgatgg tagaatcgca 1140aaatttgtag
atgatcgtta tgcaagctat aaaacaggaa ttggtaaagc aattgtggat 1200ggaactacat
ctcttgaaga attagagcag tatgttttaa cacatagtga accagtaatg 1260cagagtggtc
gtcaggaagt tcttgaaaca atcgtaaata atattttatt tagataa
1317

SEQ ID NO:15 (1599 bases, DNA, Clostridium phytofermentans)
atgggcatgg agcattttaa
agatgcgatt cttacgggta aaacaacact tggaattgag 60cttggttcca ctagaataaa
agctgtttta gtaaatgaag aaaacgaacc aattgcgtca 120ggaagccatg attgggaaaa
tcaatatatt gataatgtat ggacttacaa tctggatgat 180atctggaggg gcgttcagaa
tagttatgga caaatgacaa gtgatgttaa gaataagtac 240ggagtagaac ttacaacaat
tggagccatt ggttttagtg gaatgatgca tggctatatg 300gcttttgacg aaagtggaga
gttacttgta ccatttcgta cctggagaaa tacaataaca 360ggaccagcat ccgagcagtt
gaccaatgta tttcagtatc aaattccaca acgttggagt 420attgcccatc tatatcaagc
tatcttaaat ggggaatccc acgtgaaaaa tattagattc 480ctgacaacat tggcaggata
tattcactgg aagctaacag gagaaaaagt attaggagtc 540ggagaagcat ctggaatgtt
tccaatcgat ataaatacga aagattttaa taaatcaatg 600ttagctcagt ttaatgaact
ggttgcttcg aatgactatt catggaaaat agaagatatt 660ctaccgaaag tactagttgc
aggagagtct gctggagtat taaccgaaga aggagtaaaa 720cttcttgatg tttcaggtaa
attaaaagca ggaattcctc tttgtccgcc ggaaggagat 780gctggaactg gtatggtagc
aaccaacagc gtagcaaaga gaactggtaa tgtatctgct 840ggtacttctg tatttgcaat
ggctgtatta gaaaaagagc tttcaaaagt ttacgaagaa 900attgaccttg tgacgactcc
aagcggagat cttgtggcta tggtgcactg caataactgt 960acttctgatt tgaatgcgtg
ggtttctatc tttaaagaat ttgcttcggc aatgggcatg 1020gaagctgata tgtcaaagat
attctcaacg ctatacaata aggcgttaga aggcaatgca 1080gagtgtggag gcttactcgc
atacaattat ttttccggtg aacatataac acactttgaa 1140gaaggccgcc cattgtttgt
aagaactcca gagagtaagt ttaaccttgc gaatttcatg 1200agagttcatc tattcacagc
acttggtgct ttaaagatag gtcttgatat cctattaaaa 1260caagaatcag tacaattgga
tgagattttt ggtcatggtg gattatttaa gacgaaagat 1320gtcggacaaa aaattatggc
tggtgcaatc aatgttcctg tttctgtgat ggagactgcc 1380ggagaaggcg gagcatgggg
aatcgcaatc ttggcttctt atatgatttc taaggaagaa 1440ggtcagtcct tagatgagta
tctttctaaa catgtattcc aaggaaagac aggtagcaag 1500atgcagccgg atccaaggga
tgtagaaggt tttgaacagt ttatcaaacg atatattgac 1560ggacttgaaa ttgagcgtaa
agcagtggag atattataa 1599

SEQ ID NO:16 (240 bases, DNA, Clostridium thermocellum)
actttgccaa cggtacaagg gaagttgcaa gagcggttgc cgagtccgga
gcaatttcaa 60taatcggagg cggagattct gccgcagcta tagaacagct tggttttgcc
gataagatta 120cccacatttc aaccggaggc ggcgcgtctt tggagtttct tgaaggaaaa
gtattgccgg 180gaattgatgt attaatggat aaataaggag agaagaggtc atgagtagaa
aagttattgc 240

SEQ ID NO:17 (420 bases, DNA, Clostridium thermocellum)
aattactgta tctctctggc
attgccaggt tttaataaag attaaaatta ttgactagaa 60ataaaaaaat tgtccataat
attaatggac aaaaaaacaa agaattacat caaaggaaga 120taaaaatact ttgttaaaaa
attaattatt ttttatctaa actattgaaa atgaaaataa 180aataatataa aatgaatcat
agtgcaagag atacttgcca gaggatgaat attttactgc 240attcatgctt tatggcagct
aatagaggca ttaaattaaa ttttaattta caataggagg 300cgatattaat ggcagtaaaa
attggtatca acggttttgg acgtatcggt cgtcttgtgt 360tcagggccag tctcaacaac
ccgaacgttg aggttgtagg tataaacgac ccatttattg 420

SEQ ID NO:18 (300 bases, DNA, Clostridium thermocellum)
ctgccccatt aaaagctcgg ttccaaccgc taatatctcc gcattcatat
tgaaagaccc 60cttaaattta aactttttgt aacttattat atcaattagt gttataaaat
aaaagggaaa 120aagaattaaa atcaaaggtt tcaagagcag ccgtatcacc cgtaaaagtt
tcagccgatt 180caaccttttt acacataaaa ctttcaaaaa ttgatgactt acaattatca
agtaggatat 240aatattacta atgctaaaca gttattgata aaggaggaag gaatatgaac
aataacaaag 300

SEQ ID NO:19 (30 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic primer)
ggcggaattc cttggtctga caatcgatgc 30

SEQ ID NO:20 (33 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic primer)
ggcggaattc tatcagttat tacccacttt tcg 33

SEQ ID NO:21 (674 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
gaattctgcg acagaatagg gattgacaat tcctttataa agcaatcaag gggttcagaa
60gaggctgtta ttttgaataa agagctaaag aatcacaaag atgcaataga ggctgttatt
120tctgcactga ctgacgataa tatgggcgtt ataaaaaaca tgtccgaaat atcagcagtg
180ggacacagaa tagtacacgg cggtgaaaaa ttcaacagtt ctgtagttat agatgaaaac
240gttatgaatg cagtaagaga gtgtatagac gttgcaccgc ttcataatcc gccgaatatt
300ataggtatag aggcttgcca gcagattatg cccaatatac ctatggtagc tgtatttgat
360accactttcc acagctccat gcctgattat gcataccttt acgcattgcc atatgaactt
420tatgaaaagt acggtataag aaaatatggt ttccacggaa catcacacaa atatgttgca
480gaaagagctt ctgcaatgct tgataagtct ttgaacgaat taaagataat tacatgccat
540cttgggaacg gttcaagtat ttgtgctgtt aacaagggta aatcaattga tacttccatg
600ggctttacac ctttgcaggg acttgcaatg ggtacaagaa gcggtacaat agaccctgaa
660gttgttacga attc
674

SEQ ID NO:22 (951 bases, DNA, Clostridium thermocellum)
atgaaaaata aatctataaa taaaatagta
attgtaggta cgggttttgt cggttcaaca 60actgcctata ctttaatggt cagcggacta
gtttccgaga ttgtacttat tgaccgtaac 120acaagcaaag ccgaaggaga ggcaatggat
atgaatcacg gtatgccctt tgtaagacct 180gtcagaatat acaaaggtga ttatcctgat
tgcaaaggtg ctgatattgt tgtaataaca 240ggtggagcaa accagaagcc cggtgaaacc
agaattgacc ttgtaaataa aaatactgaa 300gtttttaaag acattgttgg aaatatcatt
aaatacaata cagactgtat tttacttgtt 360gttacaaacc cggttgatat cttaacctat
gtaacataca aattatccgg atttcccaaa 420aacagagtta taggctccgg aacagttctt
gatactgcac gtttcaaata tatgcttggt 480gaacacatgg gagttgaccc aagaaacgtt
catgcttata taatcggtga acatggagat 540acagaggtac ctacatggag tctggcatcc
atagccggga taccgatgga tgcttattgc 600aaggaatgta aatcctgtga tgctgaaaac
tttaagagtg aaacttttga caaagtaaaa 660aatgcagctt atgaaattat tgatagaaaa
aatgcaacct actacgccgt tgctcttgca 720gtaagaagaa ttgtagaggc tatcgttcgt
aatgaaaact ccatattgac ggtatcaagc 780ctattcgaag gagaatacgg cctcaatgac
atatgtctca gtattcccag ccaggtaaat 840tcggagggtg tttcaaggat tttgaatatt
cctctgagca gtgaggaaac aggtttactt 900aataaatctg cccaggcctt gaaacaggtt
atcagtgggc tgaatttata a 951

SEQ ID NO:23 (933 bases, DNA, Clostridium thermocellum)
atgggtttta aagttgcgat cataggagca ggatttgttg gagcatcagc tgcgtatgcg
60atgtctataa acaacttggt ttctgaattg gtattaattg atgtaaataa agagaaggct
120tatggtgaag cacttgatat cagccatggc ttatcattct caggaaatat gacagtttat
180tccggcgact attctgatgt taaggattgt gatgttatag ttgtaactgc aggggcagca
240agaaaaccgg gagaaactcg tttggacctt gctaaaaaga atactatgat catgaagagc
300atagttactg atataatgaa gtactacaat aagggtgtta ttgtaagtgt atcaaatcct
360gttgatgtat tggcatatat gacacaaaag tggtcaggat tgcctgcaaa taaagttata
420ggatcaggaa cagttcttga cagtgcaaga ctgagaactc atatcagtca ggcattggat
480gtagacattg ctaacgttca cggttatatt gttggtgaac atggtgattc tcagttgcca
540ttatggagtg caacacatat agcaggagta caatttgacg actatgtaaa agctactggc
600ttaaatgttg ataaggaagc tcttttcaat gaagttaagg tagcaggtgc aactattatt
660aagaacaagg gagcaactta ctacggtata gctctttcaa ttaacagaat agttgaatca
720atcctgaagg acttcaatac tattatgcct gttggtacag ttcttgacgg acagtacgga
780ttaaaggatg ttttattaaa cgttcctacg atagttggcg gaaacggagc tgaaaaagtt
840cttgaagtga acattacaga tgcagaatta caacttttga agcattcagc tgaacaggtt
900agggcagtta ttaacgaagt taaagacata taa
933

SEQ ID NO:24 (2830 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
gacgcataca ggttgtaaca cccatttccc
ttagcttttc gggagatgaa taaaacaaac 60tttccgggtc ctttaccaca ccgcccacat
aaagagctat gccgcatgaa agaaacgata 120tgttatcatt tttttcgtaa actgttattt
ccgaacccgg ataaagcttt accatattat 180taactgctgc cgtccctgca tgtgtacacc
ctataaccac tattttcata tacatcctcc 240tttgtttgct tgtaaatata tcccatatat
accacctaaa tatattttat aaacaaattc 300ggtatatcat tcttttggta aataaaaagt
acatccgata ttagaatgta cctaaaaaaa 360attattattt tattgtatat gctttatctg
ttttcattat atggtttgct atccattcta 420cggtaaaatc aagtaattcc attaagtact
gatcctgatc cttgtctatc ctgctataat 480ccgtattact gattttctca ataaaatcat
ggtgttcaac tttgtgggag agaagcttgc 540gatatcctat gctatgcatg tattcttctt
cataggtaaa atgaaagaca gtgtaatctt 600ttagttccgt aattagccgt acaatttcat
catatttgtc tgtaataagc tgatttttcg 660tggcctcata aatttccgaa gcaatctgga
atagtttctt atgctgttcg tcgattttct 720caattccaag aataaattcg tctctccatt
ctatcatatg gaccctccta aattgtaatg 780tataccaaga ttatacatac ttcctagaat
ataaacaata caaggataaa attttaatat 840cgtataccta cataaatgac taacttaaag
ctctctaaaa cttctttttt attatttcta 900tactactaaa atcaaaaata ttctctaaag
tatttctaca aatgttgttt ttgcaacaaa 960gtagtatact tttgcaccca gaatgttttg
ttataactta caaattaggg gtatatttat 1020agtaaatact aaatggaaga gtaggatatt
gattatgaac gagaaaaata taaaacacag 1080tcaaaacttt attacttcaa aacataatat
agataaaata atgacaaata taagattaaa 1140tgaacatgat aatatctttg aaatcggctc
aggaaaaggg cattttaccc ttgaattagt 1200acagaggtgt aatttcgtaa ctgccattga
aatagaccat aaattatgca aaactacaga 1260aaataaactt gttgatcacg ataatttcca
agttttaaac aaggatatat tgcagtttaa 1320atttcctaaa aaccaatcct ataaaatatt
tggtaatata ccttataaca taagtacgga 1380tataatacgc aaaattgttt ttgatagtat
agctgatgag atttatttaa tcgtggaata 1440cgggtttgct aaaagattat taaatacaaa
acgctcattg gcattatttt taatggcaga 1500agttgatatt tctatattaa gtatggttcc
aagagaatat tttcatccta aacctaaagt 1560gaatagctca cttatcagat taaatagaaa
aaaatcaaga atatcacaca aagataaaca 1620gaagtataat tatttcgtta tgaaatgggt
taacaaagaa tacaagaaaa tatttacaaa 1680aaatcaattt aacaattcct taaaacatgc
aggaattgac gatttaaaca atattagctt 1740tgaacaattc ttatctcttt tcaatagcta
taaattattt aataagatcc cctttacttc 1800ggatgcatgc cgcaggcagg catccgaagt
agtttctcca ttatacaagt attctcttga 1860gtacgtcgtc gcttctcagc agctgctttg
ctttttccct gttttccggc acatggagat 1920aagtgtatct gttaggctta atagtgtgtg
ccatgtcaat tgccttttcg aagtcatctg 1980ccttcatttt taaggtttcc acaaaattga
taaaacccgt atcagtcaga aattttacta 2040cccgctgata tctgtgttct tgaaccctgc
tcataagata ggttgcaatc ccaacctgaa 2100ttccatgaag ctgaggtgtc tccagcagct
tatctaaagc atgagatatt agatgctcac 2160taccgctggc tggagcactg ctgtctgcta
tctgcatggc aattccgctc attgtcagag 2220agtctaccat ttcctttaaa aagaagtttt
ctgtaacctg tgtgtagggc atccttacaa 2280tactgtttac tgacttttta gcaatcattg
cagcaaaatc gtcaaccttt gccgcattgt 2340tcctttcttc aaaataccag tcatacacag
ccgtaatttt ggatattatg tctccgagac 2400ctgaataaat aaatttcata ggtgcatttt
ttaatacatc taaatccact aatattccaa 2460atggcatcga ggcatgtacg gaagtacgcc
tgccatttat aatcaaagag cagcctgagc 2520tggaaaaacc atcgtttgag gttgatgtag
gtatactgat aaaaggaagc ttgtttaaaa 2580aagctatata tttggctgca tcaagcacct
ttcctcctcc tactccgacc actgcatcgg 2640ttttggaggg aatagtaaaa gccttgagca
taagattttc aagctttatg tcatcatagt 2700cgtaagtttc aagtactgca agagattttc
ttgactttat ggaatccaga atcttttcac 2760caaataagtc acgtattccc tctccaaaaa
gtactacaac attactaatt cctgcccttt 2820caatatgtgc
2830

SEQ ID NO:25 (572 bases, DNA, Clostridium thermocellum)
ccaaggtgac aaacgataac ttttgagtta tttacatcta agccagcaag cgtggttgct
60cttttagaaa catagctgtg acttgttccg tggaaaccat atcttcttac cttatattta
120tcatagtact caaatggaat accataaagg taagcttctt ttggcattgt ctgatggaat
180gcagtatcaa aaacagctac cattggtaca tttggcataa ttgatttaca agcgttgata
240ccaataaggt ttgctgggtt gtgtaaaggt gcaagatcat tacactcttc aattgcattt
300aagacttcat cattgattac tacggaatga gcaaatttct caccaccatg tactactcta
360tgtccaacag cgttgatttc atctaaggac ttaatcacac cataattttc attcataaga
420gcagcgatta catttttaat agcaacctca tggtttggaa gtgcatcctc aagaactacc
480ttctcaccgt cagctgactt gtgagtaaga cggccatcaa taccgattct ttcacaaaga
540cctactgcta atgcttgctc tgtcacagag tc
572

SEQ ID NO:26 (2676 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
ctgagtgcaa tgtaaaaaag gatgcctcaa
gtattcttga aacatcctta tattatacta 60caaaatcata aagtaaatta ctcagctgta
gcaatgatct cttttttgtt gtaagatcca 120caagctttac aaactctatg aggcatcata
agtgcaccac acttgctgca tttcactaag 180tttggagcag tcatcttcca gtttgcacga
cgactatctc ttctagcttt ggaatgttta 240ttctttggac aaatagctcc cattgattac
acctccttaa acttgttaaa aatatctcgg 300atagcagaca ttcttgggtc tagttctgta
cggtcacacc cgcactctcc ttcatttagg 360ttagcaccgc agaccttgca gattccttta
cagtcttctt tgcacagaac cttcattggg 420aaaccaatca agacttcttc atagataagt
ttatctacgt ctaaatcata tccggaaaca 480aaatttgttt catctaaatc ctcggtacgc
tgttcctctg ttttcgatac atcaatctct 540gtagccacgt cgatgtcttg ttggatggtt
tcttccttca aacaacgatc gcaaggaacg 600gctaacgcta atttcgtttt tgcttccacc
agaatttttc ggccacctag attagttaat 660ctaagtttaa ccggttcttt ataggtaata
gaataaccga caccatttaa ttcgaatata 720tcaaattcaa tcggtgcagt gtattctttg
agaccattag gaacattcat gacttcagac 780atttgtatca gcataagtaa ctcctgtcta
aaaaaacgca taatgtaagc gcccaaaaat 840tcacactgtt agtattataa acgcttaaaa
taggtttgtc aactcctaac tgttaaaaat 900gtcagaattg tgtaaccata ttttctcttc
attatcgttc ttcccttatt aaataattta 960tagctattga aaagagataa gaattgttca
aagctaatat tgtttaaatc gtcaattcct 1020gcatgtttta aggaattgtt aaattgattt
tttgtaaata ttttcttgta ttctttgtta 1080acccatttca taacgaaata attatacttc
tgtttatctt tgtgtgatat tcttgatttt 1140tttctattta atctgataag tgagctattc
actttaggtt taggatgaaa atattctctt 1200ggaaccatac ttaatataga aatatcaact
tctgccatta aaaataatgc caatgagcgt 1260tttgtattta ataatctttt agcaaacccg
tattccacga ttaaataaat ctcatcagct 1320atactatcaa aaacaatttt gcgtattata
tccgtactta tgttataagg tatattacca 1380aatattttat aggattggtt tttaggaaat
ttaaactgca atatatcctt gtttaaaact 1440tggaaattat cgtgatcaac aagtttattt
tctgtagttt tgcataattt atggtctatt 1500tcaatggcag ttacgaaatt acacctctgt
actaattcaa gggtaaaatg cccttttcct 1560gagccgattt caaagatatt atcatgttca
tttaatctta tatttgtcat tattttatct 1620atattatgtt ttgaagtaat aaagttttga
ctgtgtttta tatttttctc gttcattgta 1680tttctcctta taatgttctt aaattcattt
atcacggggc aacttaatat atccgaaata 1740tagttcttct atatcgttcc cccagtataa
tgattattat actatttaat cttcaactta 1800acaattggag tttccagtta agaaataata
atttaatgcc aaagcggata ttcgcaatcc 1860gcttacgcta cttgctcata acctcaacag
gcaatgaagc taagttaatt atttactctg 1920tgcctgaaca gcagtgattg caacaacacc
aacgatatca tcagaagaac aacctcttga 1980taaatcattt actggagctg caataccctg
agttaatggt ccataagctt ctgcctttgc 2040aagacgctgt gttaacttat atccaatgtt
accagcatca aggtctggga agattaatac 2100gttagctttt ccagcaatat cactaccagg
agcttttgaa gcacctacac taggaacgat 2160tgctgcatct aactggaact cgccgtcgat
cttatattct gggtataatt catttgcaat 2220cttagttgct tctacaacct tatcaacatc
tgcatgcttt gcgcttccct ttgttgaatg 2280agaaagcata gctacgatag gttcagagcc
aactaattgt tcaaaactct tcgctgtgga 2340accagcgatt gctgctaact cttcagcatt
tggattctga tttaaaccag catcagagaa 2400aaggaaagtt ccatttgcgc ccatatcaca
attaggtact accattacga agaaagcaga 2460aactaactta gtatttggag cagtttttaa
aatctgaaga catggtctta aggtatctgc 2520tgtagagtga caagcaccag atactaaacc
atctgcatcg cccatcttaa ccatcattac 2580accgtatgta atgtagtctg ttgttaaaag
ctcttttgct ttttcagggg tcatgccttt 2640tgcctgtcta agttctacaa gcttgttaat
gtaagc 2676

SEQ ID NO:27 (978 bases, DNA, Clostridium thermocellum)
atggcgatta caataaaccg aagtaaagtt attgttgtgg gtgcaggttt agttggtact
60tcaacggcgt ttagtctaat tacgcaaagt gtttgtgatg aggttatgtt gatagatatc
120aatcgtgcta aggcgcatgg ggaagtaatg gatttgtgtc atagtatcga gtatttaaat
180cgaaatgttt tggtaacgga aggagattat acagactgta aggacgctga tattgttgta
240ataactgcag ggcctccgcc aaaaccagga cagtcgcggc ttgatactct tgggttatcc
300gcagatattg tgagcacgat tgtggaacct gtcatgaaga gtgggttcaa tggaatattc
360ttagtcgtga cgaatccggt ggattcgatt gctcaatatg tttatcaatt atcggggctt
420ccaaagcaac aagttcttgg aactggaaca gcgattgact ctgcaagatt aaaacacttt
480attggagata ttttacatgt agatcctaga agcatacagg cttatacgat gggagagcat
540ggagattctc aaatgtgtcc ttggtcgctt gttacggttg gcggtaaaaa tattatggac
600atcgtacggg ataacaaaga gtattccgat attgacttta atgaaatctt atataaggtt
660accagggtag gttttgatat tttatcagtg aagggtacta cttgttatgg aatagcgtca
720gcagctgtgg ggattataaa agcaattctt tatgatgaga attccatcct tccggtctct
780accttattgg agggggaata tggtgagttt gatgtatatg caggggtacc atgcattcta
840aatcgtttcg gcgtgaagga tgtagtggaa gtaaatatga cagaagtaga gttaaatcaa
900ttccgagcct ctgttcacgt tgtgagggaa gctattgaaa acttaaaaga cagagataaa
960aaggcattat ttttataa
978

SEQ ID NO:28 (960 bases, DNA, Clostridium thermocellum)
ttatgatagc gtcaatgcat actgaaaatt
ttctttcatt gttctacaag aagcatcgaa 60ttttcccttt tcttcaggtg tcaaatttag
ctcaatgatt tcttctacac catgaattcc 120aagtaccgta ggaacagatg catagacatc
atgctggcca tactcaccat ttaagagagt 180agatactggt aataccttct tctcatctga
gaaaatggct cgtgtaacct cagctagtga 240tgcaccaata ccaaattccg ttgagccttt
tccagttagg atatgccatc cccctgctct 300agcttcatca gaaagcttag aaagatcaat
ctgcccatat ttttcaggtt tttccttgat 360tagttccaaa attggttttc cagctataga
taccgttgac catgcaacca tctggctttc 420tccgtgttct ccaagaacaa atccatagat
tgatttttga tcaatttcaa cagcatctgc 480aattgctctt ctaagtctgg cagagtctag
taccgtactt gttgaaataa ttttattgga 540tgagtactga agtaaatgct gtaaataatg
tgttattaca tctgctggat ttgaaatgct 600aacaatcata ccatcaaaac ctgaattttt
gatatgccaa gctacctctt taataattag 660agcagtattc gtaagggtac tcattcttgt
ttcaccctta tttttatctg gattggttcc 720tactgcaatc accatgagat ctgcatcagc
tgcatcacta taatcacccg attttacctt 780aactctgtgt ggtaggtata ctgtagcatc
gtagatatcc agtgcttgtg ctttcgcttt 840ttctctatca atatcaataa agataatttc
ttctgcaagc ccctgctctg ccagtgcata 900tccagcatga gatcctacgt gacctgctcc
gataataatg acttttcttg gttttgccat 960

SEQ ID NO:29 (2732 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
tggaatctca ctatgcacca atgtggtact aaattatatc tttatctatg gaaaattagg
60ttttccgcga atggagatag agggagctgc cattgctact ttaatttgta gaattcttga
120gagtatttta gttgttattt atatgtataa gggtgagaag gtacttaaga tgagactttc
180ttatattttt aagagatcta aacagtattt tcgctctttg gctcgttata gtgcgccagt
240gcttatgagt gaggttaact gggggcttgg gattgctgtt cagtctgcaa tcattgggcg
300tatgggtgtt agttttctta cagccgccag cttcattaat gtagtacaac agttagccgg
360aatcattctg attggtattg gtgtgggttc gagcattata atagggaatt tgattggtga
420gggaaaagag catgaggcga gaatgctagc caataagtta atacgtatca gtatgatact
480cggaggaatt gttgcttttg cagtaatctt actacgtcca atcgctccta actttattga
540ggcgtctaag gaaacagcgg atttaattcg tcagatgcta tttgtttcgg cttacctctt
600attcttccaa gccttatctg tattaactat ggccggaata ttacgtggtg caggggatac
660cctttactgt gcaacctttg atgttttgac cttatgggta ctaaaacttg gaggaggttt
720gcttgcaacc atagtacttc atcttccacc tgtatgggtt tactttatct taagtagcga
780tgagtgtgtt aaagcgctat ttacggtacc gcgggtctta aagggacgtt ggattcatga
840tacaacactg cattaagatt tcatatgtcc agatattttt gcacagtagc ataattacta
900gagcttattc ctataatatt cataggtttt gatggtccat tttacgttac gatagcatat
960attacatcaa aaccaattct atataagatg aggttatagt atgaacgaga aaaatataaa
1020acacagtcaa aactttatta cttcaaaaca taatatagat aaaataatga caaatataag
1080attaaatgaa catgataata tctttgaaat cggctcagga aaagggcatt ttacccttga
1140attagtacag aggtgtaatt tcgtaactgc cattgaaata gaccataaat tatgcaaaac
1200tacagaaaat aaacttgttg atcacgataa tttccaagtt ttaaacaagg atatattgca
1260gtttaaattt cctaaaaacc aatcctataa aatatttggt aatatacctt ataacataag
1320tacggatata atacgcaaaa ttgtttttga tagtatagct gatgagattt atttaatcgt
1380ggaatacggg tttgctaaaa gattattaaa tacaaaacgc tcattggcat tatttttaat
1440ggcagaagtt gatatttcta tattaagtat ggttccaaga gaatattttc atcctaaacc
1500taaagtgaat agctcactta tcagattaaa tagaaaaaaa tcaagaatat cacacaaaga
1560taaacagaag tataattatt tcgttatgaa atgggttaac aaagaataca agaaaatatt
1620tacaaaaaat caatttaaca attccttaaa acatgcagga attgacgatt taaacaatat
1680tagctttgaa caattcttat ctcttttcaa tagctataaa ttatttaata agaagtaata
1740ggaaataata ctcgaattat tctgcaatct gttctaaaaa ataaaattaa gaaattacta
1800tagcaagcca ggttaaaatt actagcttgc tatttttgtg catttagtac agttttgatt
1860attaaagaat aaatttaata actattttgc aataagttat tgactatttc acaagttagt
1920gttactatac aagtatgaaa taaagataca taaaaaaata aataatatga aacataaatt
1980catgacatgc ggaatagaat gaaagaatat tatgtcggtt cctaatacta aatggatata
2040acaatctatt gaaacactta tggggtgtaa gtgtggagag aatttctaaa gcgccaaaag
2100actctacata tgaaattcta aagcttcaca cgggaataat ctaatttatg tatcttatta
2160tcataattca ggaaggtagt gtgaaaatat aaaaattagt tttcctgttt cattcaggca
2220gtagcatttc ttaaacaaat ttgctatgca ttgggtgtta tctgaaaaac aaaaagcaat
2280tttctcacaa cttatttctg aacaacaatg gtattaaaaa tttggaggag gattttacta
2340tgaaaaaaac ggtaacatta ctgttggttc tgaccatggt ggtaagctta tttgcagcat
2400gtggtaagaa aaatggatca agcgaaaccg gcacaaaaga tcctgtggca acaagcggtg
2460caaaagaacc tgacaaacaa gatccaggca ataaagagcc tgaaaaacaa gaccctgtta
2520aaatcaagat ttattactct gataatgcaa ccttaccatt taaagaagat tggttagtta
2580taaaggaagc tgagaagaga tttaatgttg atttcgattt cgaagtaatt ccaattgcag
2640attatcaaac aaaagtttct ttaacattaa atacaggaaa taacgctcca gatgtcatcc
2700tttatcagtc aacgcaggga gagaatgcat ct
2732

SEQ ID NO:30 (10665 bases, DNA, Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
gaattcgagc tcggtacccg gggatcctct
agagtcgacc tgcaggcatg caagcttggc 60actggccgtc gttttacaac gtcgtgactg
ggaaaaccct ggcgttaccc aacttaatcg 120ccttgcagca catccccctt tcgccagctg
gcgtaatagc gaagaggccc gcaccgcgta 180gaggatcgag atctcgatcc cgcgaaatta
atacgactca ctatagggga attgtgagcg 240gataacaatt cccctctagg ctcataactt
cacgctcctg tatatatttt tatttattta 300aaaatgagtc aaaatttagg aaatgattgc
aatatgtata atatccaaat ttccattcaa 360ataaccaaag taattttacc tctttttatg
agctatttca atactttgtt agtaaattaa 420catatatgag ctgtatatgg tttaatgaaa
aaagttattt tgaagggata ttgtaaaaaa 480cataatatat tatatggata aattttacat
ttgacttatc atatgttaat atatgtaata 540tgaatagcta atctaagcag gctactgcct
agaaaaaagc ttataattat ccttaatttc 600ctactacgtg cgcccagata gggtgttaag
tcaagtagtt taaggtacta ctctgtaaga 660taacacagaa aacagccaac ctaaccgaaa
agcgaaagct gatacgggaa cagagcacgg 720ttggaaagcg atgagttacc taaagacaat
cgggtacgac tgagtcgcaa tgttaatcag 780atataaggta taagttgtgt ttactgaacg
caagtttcta atttcggttg aaatccgata 840gaggaaagtg tctgaaacct ctagtacaaa
gaaaggtaag ttacagtagt agacttatct 900gttatcacca catttgtaca atctgtagga
gaacctatgg gaacgaaacg aaagcgatgc 960cgagaatctg aatttaccaa gacttaacac
taactgggga taccctaaac aagaatgcct 1020aatagaaagg aggaaaaagg ctatagcact
agagcttgaa aatcttgcaa gggtacggag 1080tactcgtagt agtctgagaa gggtaacgcc
ctttacatgg caaaggggta cagttattgt 1140gtactaaaat taaaaattga ttagggagga
aaacctcaaa atgaaaccaa caatggcaat 1200tttagaaaga atcagtaaaa attcacaaga
aaatatagac gaagttttta caagacttta 1260tcgttatctt ttacgtccag atatttatta
cgtggcgacg cgttgggaaa tggcaatgat 1320agcgaaacaa cgtaaaactc ttgttgtatg
ctttcattgt catcgtcacg tgattcataa 1380acacaagtga atgtcgacag tgaattttta
cgaacgaaca ataacagagc cgtatactcc 1440gagaggggta cgtacggttc ccgaagaggg
tggtgcaaac cagtcacagt aatgtgaaca 1500aggcggtacc tccctacttc ancatatcat
tttctgcagc cccctagaaa taattttgtt 1560taactttaag aaggagatat acatatatgg
ctagatcgtc cattccgaca gcatcgccag 1620tcactatggc gtgctgctag cgctatatgc
gttgatgcaa tttctatgca ctcgtagtag 1680tctgagaagg gtaacgccct ttacatggca
aaggggtaca gttattgtgt actaaaatta 1740aaaattgatt agggaggaaa acctcaaaat
gaaaccaaca atggcaattt tagaaagaat 1800cagtaaaaat tcacaagaaa atatagacga
agtttttaca agactttatc gttatctttt 1860acgtccagat atttattacg tggcgtatca
aaatttatat tccaataaag gagcttccac 1920aaaaggaata ttagatgata cagcggatgg
ctttagtgaa gaaaaaataa aaaagattat 1980tcaatcttta aaagacggaa cttactatcc
tcaacctgta cgaagaatgt atattgcaaa 2040aaagaattct aaaaagatga gacctttagg
aattccaact ttcacagata aattgatcca 2100agaagctgtg agaataattc ttgaatctat
ctatgaaccg gtattcgaag atgtgtctca 2160cggttttaga cctcaacgaa gctgtcacac
agctttgaaa acaatcaaaa gagagtttgg 2220cggcgcaaga tggtttgtgg agggagatat
aaaaggctgc ttcgataata tagaccacgt 2280tacactcatt ggactcatca atcttaaaat
caaagatatg aaaatgagcc aattgattta 2340taaatttcta aaagcaggtt atctggaaaa
ctggcagtat cacaaaactt acagcggaac 2400acctcaaggt ggaattctat ctcctctttt
ggccaacatc tatcttcatg aattggataa 2460gtttgtttta caactcaaaa tgaagtttga
ccgagaaagt ccagaaagaa taacacctga 2520atatcgggag ctccacaatg agataaaaag
aatttctcac cgtctcaaga agttggaggg 2580tgaagaaaaa gctaaagttc ttttagaata
tcaagaaaaa cgtaaaagat tacccacact 2640cccctgtacc tcacagacaa ataaagtatt
gaaatacgtc cggtatgcgg acgacttcat 2700tatctctgtt aaaggaagca aagaggactg
tcaatggata aaagaacaat taaaactttt 2760tattcataac aagctaaaaa tggaattgag
tgaagaaaaa acactcatca cacatagcag 2820tcaacccgct cgttttctgg gatatgatat
acgagtaagg agatctggaa cgataaaacg 2880atctggtaaa gtcaaaaaga gaacactcaa
tgggagtgta gaactcctta ttcctcttca 2940agacaaaatt cgtcaattta tttttgacaa
gaaaatagct atccaaaaga aagatagctc 3000atggtttcca gttcacagga aatatcttat
tcgttcaaca gacttagaaa tcatcacaat 3060ttataattct gaactccgcg ggatttgtaa
ttactacggt ctagcaagta attttaacca 3120gctcaattat tttgcttatc ttatggaata
cagctgtcta aaaacgatag cctccaaaca 3180taagggaaca ctttcaaaaa ccatttccat
gtttaaagat ggaagtggtt cgtgggggat 3240cccgtatgag ataaagcaag gtaagcagcg
ccgttatttt gcaaatttta gtgaatgtaa 3300atccccttat caatttacgg atgagataag
tcaagctcct gtattgtatg gctatgcccg 3360gaatactctt gaaaacaggt taaaagctaa
atgttgtgaa ttatgtggga cgtctgatga 3420aaatacttcc tatgaaattc accatgtcaa
taaggtcaaa aatcttaaag gcaaagaaaa 3480atgggaaatg gcaatgatag cgaaacaacg
taaaactctt gttgtatgct ttcattgtca 3540tcgtcacgtg attcataaac acaagtgaga
tatctcgagc acccgttctc ggagcactgt 3600ccgaccgctt tggccgccgc ccagtcctgc
tcgcttcgct acttggagcc actatcgact 3660acgcgatcat ggcgaccaca cccgtcctgt
ggatcgccaa gctcgccgat ggtagtgtgg 3720ggtctcccca tgcgagagta gggaactgcc
aggcatcaaa taaaacgaaa ggctcagtcg 3780aaagactggg cctttcgttt tatctgttgt
ttgtcggtga acgctctcct gagtaggaca 3840aatccgccgg gagcggattt gaacgttgcg
aagcaacggc ccggagggtg gcgggcagga 3900cgcccgccat aaactgccag gcatcaaatt
aagcagaagg ccatcctgac ggatggcctt 3960tttgcgtttc tacaaactct tcctgtcgtc
atatctacaa gccatcccgc ccttcccaac 4020agttgcgcag cctgaatggc gaatggcgcc
tgatgcggta ttttctcctt acgcatctgt 4080gcggtatttc acaccgcata tggtgcactc
tcagtacaat ctgctctgat gccgcatagt 4140taagccagcc ccgacacccg ccaacacccg
ctgacgcgcc ctgacgggct tgtctgctcc 4200cggcatccgc ttacagacaa gctgtgaccg
tctccgggag ctgcatgtgt cagaggtttt 4260caccgtcatc accgaaacgc gcgagacgaa
agggcctcgt gatacgccta tttttatagg 4320ttaatgtcat gataataatg gtttcttaga
cgtcaggtgg cacttttcgg ggaaatgtgc 4380gcggaacccc tatttgttta tttttctaaa
tacattcaaa tatgtatccg ctcatgagac 4440aataaccctg ataaatgctt caataatatt
gaaaaaggaa gagtatgagt attcaacatt 4500tccgtgtcgc ccttattccc ttttttgcgg
cattttgcct tcctgttttt gctcacccag 4560aaacgctggt gaaagtaaaa gatgctgaag
atcagttggg tgcacgagtg ggttacatcg 4620aactggatct caacagcggt aagatccttg
agagttttcg ccccgaagaa cgttttccaa 4680tgatgagcac ttttaaatta aaaatgaagt
tttaaaactt catttttaat ttaaattaaa 4740aatgaagttt tatcaaaaaa atttccaata
atcccactct aagccacaaa cacgccctat 4800aaaatcccgc tttaatccca ctttgagaca
catgtaatat tactttacgc cctagtatag 4860tgataatttt ttacattcaa tgccacgcaa
aaaaataaag gggcactata ataaaagttc 4920cttcggaact aactaaagta aaaaattatc
tttacaacct ccccaaaaaa aagaacaggt 4980acaaagtacc ctataataca agcgtaaaaa
aatgagggta aaaataaaaa aataaaaaaa 5040taaaaaaata aaaaaataaa aaaaataaaa
aaataaaaaa ataaaaaaat aaaaaaataa 5100aaaaataaaa aaataaaaaa ataaaaaaat
ataaaaataa aaaaatataa aaataaaaaa 5160atataaaaat aaaaaaatat aaaaataaaa
aaataaaaaa atataaaaat aaaaaaataa 5220aaaaatataa aaatattttt tatttaaagt
ttgaaaaaaa tttttttata ttatataatc 5280tttgaagaaa agaatataaa aaatgagcct
ttataaaagc ccattttttt tcatatacgt 5340aatatgacgt tctaatgttt ttattggtac
ttctaacatt agagtaattt ctttattttt 5400aaagcctttt tctttaaggg cttttatttt
ttttcttaat acatttaatt cctctttttt 5460tgttgctttt cctttagctt ttaattgctc
ttgataattt tttttacctc taatattttc 5520tcttctctta tattcctttt tagaaattat
tattgtcata tatttttgtt cttcttctgt 5580aatttctaat aactctataa gagtttcatt
cttatactta tattgcttat ttttatctaa 5640ataacatctt tcagcacttc tagttgctct
tataacttct ctttcactta aatgttgtct 5700aaacatacta ttaagttcta aaacatcatt
taatgccttc tcaatgtctt ctgtaaagct 5760acaaagataa tatctatata aaaataatat
aagctctctg tgtcctttta aatcatattc 5820tcttagttca caaagtttta ttatgtcttg
tattcttcca taatataaac ttctttctct 5880ataaatataa tttattttgc ttggtctacc
ctttttcctt tcatatggtt ttaattcagg 5940taaaaatcca ttttgtattt ctcttaagtc
ataaatatat tcgtactcat ctaatatatt 6000gactactgtt tttgatttag agtttatact
tcctggaact cttaatattc tggttgcatc 6060taaggcttgt ctatctgctc caaagtattt
taattgatta tataaatatt cttgaaccgc 6120tttccataat ggtaatgctt tactaggtac
tgcatttatt atccatatta aatacattcc 6180tcttccacta tctattacat agtttggtat
aggaatactt tgattaaaat aattcttttc 6240taagtccatt aatacctggt ctttagtttt
gccagtttta taataatcca agtctataaa 6300cagtgtattt aactctttta tattttctaa
tcgcctacac ggcttataaa aggtatttag 6360agttatatag atattttcat cactcatatc
taaatctttt aattcagcgt atttatagtg 6420ccattggcta tatccttttt tatctataac
gctcctggtt atccaccctt tacttctact 6480atgaatatta tctatatagt tctttttatt
cagctttaat gcgtttctca cttattcacc 6540tccccttctg taaaactaag aaaattatat
catattttca ataattatta actattctta 6600aactcttaat aaaaaataga gtaagtcccc
aattgaaact taatctattt tttatgtttt 6660aatttattat ttttattaaa atattttaaa
ctaaattaaa tgattctttt taatttttta 6720ctatttcatt ccataatata ttactataat
tatttacaaa taatatttct tcatttgtaa 6780tatttagatg atttactaat tttagttttt
atatattaaa taattaatgt ataatttata 6840taaaaaatca aaggagctta taaattatga
ttatttccaa agatactaaa gatttaattt 6900tttcaatttt aacaatactt tttgtaatat
tatgtttaaa tttaattgta tttttttcat 6960ataataaagc cgttgaagta aaccaatcca
ttttccttat gatgttatta ttaaatttaa 7020gttttataat aatatcttta ttatatttat
tgtttttaaa aaaactagtg aaatttccgg 7080ctttattaaa cttattttta ggaattttat
tttcattttc atctttacag gatttgatta 7140tatctttaaa tatgttttat caaatattat
ctttttctaa atttatatat atttttatta 7200tatttattat tatatatatt ttatttttaa
gtttctttct aacagctatt aaaaagaaac 7260ttaaaaataa aaacacgtac tctaaaccaa
taaataaaac tatttttatt attgctgcct 7320tgattggaat agtttttagt aaaattaatt
tcaatattcc acaatattat attataagct 7380agctttgcat tgtacttttc aatcgcttca
cgaatgcggt tatctccgaa agataaagtc 7440ttttcatctt ccttgatgaa gataagattt
tctccgtctc cgccggcaga attgaagcgg 7500ggtactacgg tatcgtctgc gtcatcttcc
gttgtctgat agatgatagt cataggctca 7560ttttcttccg tttcggtaaa ggggataggt
tcgccctttg agagcagggc ggcgatggaa 7620agcattaact tgcttttccc atcgcccgga
tctccctgca atagcgtaac tttgccaaac 7680ggaatatacg gataccacag ccactttact
tctttcggct cgatttcact tgccttgatg 7740atttcaagag gtacgctgaa attcatttcg
ttttcattta gtttcatttt ttcttgttct 7800ccttttctct gaaaatataa aaaccacaga
ttgatactaa aaccttggtt gtgttgcttt 7860tcggggctta aatcaaggaa aaatccttgt
tttaagcctt tcaaaaagaa acacaaggtc 7920tttgtactaa cctgtggtta tgtataaaat
tgtagatttt agggtaacaa aaaacaccgt 7980atttctacga tgtttttgct taaatacttg
tttttagtta cagacaaacc tgaagttgaa 8040ttcatattta ttaaattaag cgtatatact
attgaaaatg tttttgaaat attataaaat 8100taactttggt ttaggaaaag taaccagttc
ttttgtcgat aagcattaat ttgcttgact 8160aattaataaa aaacttagga ggtaacacta
atggtattcg agaaaattga caagaacagt 8220tggaacagaa aagaatactt tgatcactat
tttgctagtg taccttgcac atacagtatg 8280actgtaaagg ttgatataac acagattaaa
gagaagggaa tgaaattgta ccctgcaatg 8340ctttattaca tagcaatgat agtaaacaga
catagtgaat ttaggaccgc tatcaatcag 8400gatggtgaac ttggaattta tgatgaaatg
attccatcat atactatatt ccataatgac 8460accgagacat tctcaagtct ttggactgaa
tgcaagtcag attttaagtc atttcttgca 8520gattatgaat ctgatactca aagatacggt
aataatcacc gtatggaagg aaaacctaat 8580gcacctgaga atattttcaa tgtttccatg
ataccttggt caacatttga cggatttaat 8640ctgaatctgc aaaaaggcta cgattactta
atccctatct ttacaatggg caagtattat 8700aaggaagaca ataaaatcat ccttcccctt
gcaatccagg tacatcatgc agtatgtgat 8760ggatttcata tttgtcgttt tgtaaatgaa
ctgcaagaat taataaattc ctaactcgag 8820ggcagtagcg cggtggtccc acctgacccc
atgccgaact cagaagtgaa acgataaaac 8880gaaaggctca gtcgaaagac tgggcctttc
gttttatctg ttgtttgtcg gtgaacgctc 8940tcctgagtag gacaaatccg ccgggagcgg
atttgaacgt tgcgaagcaa cggcccggag 9000ggtggcgggc aggacgcccg ccataaactg
ccaggcatca aattaagcag aaggccatcc 9060tgacggatgg ccttttttat tgtaaattcc
ggtaaccctt gtagcttagt gggaatttgt 9120accccttatc gatacaaatt ccccgtaggc
gctagggaca ctttttcact cgttaaaaag 9180ttttgagaat attttatatt tttgttcatg
taatcactcc ttcttaatta caaattttta 9240gcatctaatt taacttcaat tcctattata
caaaatttta agatactgca ctatcaacac 9300actcttaagt ttgcttctaa gtcttatttc
cataacttct tttacgtttc cgggtacaat 9360tcgtaatcat gtcatagctg tttcctgtgt
gaaattctta tccgctcaca attccacaca 9420acatacgagc cggaagcata aagtgtaaag
cctggggtgc ctaatgagtg agctaactca 9480cattaattgc gttgcgctca ctgcccgctt
tccagtcggg aaacctgtcg tgccagaaaa 9540cttcattttt aatttaaaag gatctaggtg
aagatccttt ttgataatct catgaccaaa 9600atcccttaac gtgagttttc gttccactga
gcgtcagacc ccgtagaaaa gatcaaagga 9660tcttcttgag atcctttttt tctgcgcgta
atctgctgct tgcaaacaaa aaaaccaccg 9720ctaccagcgg tggtttgttt gccggatcaa
gagctaccaa ctctttttcc gaaggtaact 9780ggcttcagca gagcgcagat accaaatact
gtccttctag tgtagccgta gttaggccac 9840cacttcaaga actctgtagc accgcctaca
tacctcgctc tgctaatcct gttaccagtg 9900gctgctgcca gtggcgataa gtcgtgtctt
accgggttgg actcaagacg atagttaccg 9960gataaggcgc agcggtcggg ctgaacgggg
ggttcgtgca cacagcccag cttggagcga 10020acgacctaca ccgaactgag atacctacag
cgtgagctat gagaaagcgc cacgcttccc 10080gaagggagaa aggcggacag gtatccggta
agcggcaggg tcggaacagg agagcgcacg 10140agggagcttc cagggggaaa cgcctggtat
ctttatagtc ctgtcgggtt tcgccacctc 10200tgacttgagc gtcgattttt gtgatgctcg
tcaggggggc ggagcctatg gaaaaacgcc 10260agcaacgcgg cctttttacg gttcctggcc
ttttgctggc cttttgctca catgttcttt 10320cctgcgttat cccctgattc tgtggataac
cgtattaccg cctttgagtg agctgatacc 10380gctcgccgca gccgaacgac cgagcgcagc
gagtcagtga gcgaggaagc ggaagagcgc 10440ccaatacgca aaccgcctct ccccgcgcgt
tggccgattc attaatgcag ctggcacgac 10500aggtttcccg actggaaagc gggcagtgag
cgcaacgcaa ttaatgtgag ttagctcact 10560cattaggcac cccaggcttt acactttatg
cttccggctc gtatgttgtg tggaattgtg 10620agcggataac aatttcacac aggaaacagc
tatgaccatg attac 10665
31 10665 DNA Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
31
gaattcgagc tcggtacccg gggatcctct agagtcgacc tgcaggcatg caagcttggc
60actggccgtc gttttacaac gtcgtgactg ggaaaaccct ggcgttaccc aacttaatcg
120ccttgcagca catccccctt tcgccagctg gcgtaatagc gaagaggccc gcaccgcgta
180gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg
240gataacaatt cccctctagg ctcataactt cacgctcctg tatatatttt tatttattta
300aaaatgagtc aaaatttagg aaatgattgc aatatgtata atatccaaat ttccattcaa
360ataaccaaag taattttacc tctttttatg agctatttca atactttgtt agtaaattaa
420catatatgag ctgtatatgg tttaatgaaa aaagttattt tgaagggata ttgtaaaaaa
480cataatatat tatatggata aattttacat ttgacttatc atatgttaat atatgtaata
540tgaatagcta atctaagcag gctactgcct agaaaaaagc ttataattat ccttagctct
600cttcaatgtg cgcccagata gggtgttaag tcaagtagtt taaggtacta ctctgtaaga
660taacacagaa aacagccaac ctaaccgaaa agcgaaagct gatacgggaa cagagcacgg
720ttggaaagcg atgagttacc taaagacaat cgggtacgac tgagtcgcaa tgttaatcag
780atataaggta taagttgtgt ttactgaacg caagtttcta atttcgatta gagctcgata
840gaggaaagtg tctgaaacct ctagtacaaa gaaaggtaag ttatcattga agacttatct
900gttatcacca catttgtaca atctgtagga gaacctatgg gaacgaaacg aaagcgatgc
960cgagaatctg aatttaccaa gacttaacac taactgggga taccctaaac aagaatgcct
1020aatagaaagg aggaaaaagg ctatagcact agagcttgaa aatcttgcaa gggtacggag
1080tactcgtagt agtctgagaa gggtaacgcc ctttacatgg caaaggggta cagttattgt
1140gtactaaaat taaaaattga ttagggagga aaacctcaaa atgaaaccaa caatggcaat
1200tttagaaaga atcagtaaaa attcacaaga aaatatagac gaagttttta caagacttta
1260tcgttatctt ttacgtccag atatttatta cgtggcgacg cgttgggaaa tggcaatgat
1320agcgaaacaa cgtaaaactc ttgttgtatg ctttcattgt catcgtcacg tgattcataa
1380acacaagtga atgtcgacag tgaattttta cgaacgaaca ataacagagc cgtatactcc
1440gagaggggta cgtacggttc ccgaagaggg tggtgcaaac cagtcacagt aatgtgaaca
1500aggcggtacc tccctacttc ancatatcat tttctgcagc cccctagaaa taattttgtt
1560taactttaag aaggagatat acatatatgg ctagatcgtc cattccgaca gcatcgccag
1620tcactatggc gtgctgctag cgctatatgc gttgatgcaa tttctatgca ctcgtagtag
1680tctgagaagg gtaacgccct ttacatggca aaggggtaca gttattgtgt actaaaatta
1740aaaattgatt agggaggaaa acctcaaaat gaaaccaaca atggcaattt tagaaagaat
1800cagtaaaaat tcacaagaaa atatagacga agtttttaca agactttatc gttatctttt
1860acgtccagat atttattacg tggcgtatca aaatttatat tccaataaag gagcttccac
1920aaaaggaata ttagatgata cagcggatgg ctttagtgaa gaaaaaataa aaaagattat
1980tcaatcttta aaagacggaa cttactatcc tcaacctgta cgaagaatgt atattgcaaa
2040aaagaattct aaaaagatga gacctttagg aattccaact ttcacagata aattgatcca
2100agaagctgtg agaataattc ttgaatctat ctatgaaccg gtattcgaag atgtgtctca
2160cggttttaga cctcaacgaa gctgtcacac agctttgaaa acaatcaaaa gagagtttgg
2220cggcgcaaga tggtttgtgg agggagatat aaaaggctgc ttcgataata tagaccacgt
2280tacactcatt ggactcatca atcttaaaat caaagatatg aaaatgagcc aattgattta
2340taaatttcta aaagcaggtt atctggaaaa ctggcagtat cacaaaactt acagcggaac
2400acctcaaggt ggaattctat ctcctctttt ggccaacatc tatcttcatg aattggataa
2460gtttgtttta caactcaaaa tgaagtttga ccgagaaagt ccagaaagaa taacacctga
2520atatcgggag ctccacaatg agataaaaag aatttctcac cgtctcaaga agttggaggg
2580tgaagaaaaa gctaaagttc ttttagaata tcaagaaaaa cgtaaaagat tacccacact
2640cccctgtacc tcacagacaa ataaagtatt gaaatacgtc cggtatgcgg acgacttcat
2700tatctctgtt aaaggaagca aagaggactg tcaatggata aaagaacaat taaaactttt
2760tattcataac aagctaaaaa tggaattgag tgaagaaaaa acactcatca cacatagcag
2820tcaacccgct cgttttctgg gatatgatat acgagtaagg agatctggaa cgataaaacg
2880atctggtaaa gtcaaaaaga gaacactcaa tgggagtgta gaactcctta ttcctcttca
2940agacaaaatt cgtcaattta tttttgacaa gaaaatagct atccaaaaga aagatagctc
3000atggtttcca gttcacagga aatatcttat tcgttcaaca gacttagaaa tcatcacaat
3060ttataattct gaactccgcg ggatttgtaa ttactacggt ctagcaagta attttaacca
3120gctcaattat tttgcttatc ttatggaata cagctgtcta aaaacgatag cctccaaaca
3180taagggaaca ctttcaaaaa ccatttccat gtttaaagat ggaagtggtt cgtgggggat
3240cccgtatgag ataaagcaag gtaagcagcg ccgttatttt gcaaatttta gtgaatgtaa
3300atccccttat caatttacgg atgagataag tcaagctcct gtattgtatg gctatgcccg
3360gaatactctt gaaaacaggt taaaagctaa atgttgtgaa ttatgtggga cgtctgatga
3420aaatacttcc tatgaaattc accatgtcaa taaggtcaaa aatcttaaag gcaaagaaaa
3480atgggaaatg gcaatgatag cgaaacaacg taaaactctt gttgtatgct ttcattgtca
3540tcgtcacgtg attcataaac acaagtgaga tatctcgagc acccgttctc ggagcactgt
3600ccgaccgctt tggccgccgc ccagtcctgc tcgcttcgct acttggagcc actatcgact
3660acgcgatcat ggcgaccaca cccgtcctgt ggatcgccaa gctcgccgat ggtagtgtgg
3720ggtctcccca tgcgagagta gggaactgcc aggcatcaaa taaaacgaaa ggctcagtcg
3780aaagactggg cctttcgttt tatctgttgt ttgtcggtga acgctctcct gagtaggaca
3840aatccgccgg gagcggattt gaacgttgcg aagcaacggc ccggagggtg gcgggcagga
3900cgcccgccat aaactgccag gcatcaaatt aagcagaagg ccatcctgac ggatggcctt
3960tttgcgtttc tacaaactct tcctgtcgtc atatctacaa gccatcccgc ccttcccaac
4020agttgcgcag cctgaatggc gaatggcgcc tgatgcggta ttttctcctt acgcatctgt
4080gcggtatttc acaccgcata tggtgcactc tcagtacaat ctgctctgat gccgcatagt
4140taagccagcc ccgacacccg ccaacacccg ctgacgcgcc ctgacgggct tgtctgctcc
4200cggcatccgc ttacagacaa gctgtgaccg tctccgggag ctgcatgtgt cagaggtttt
4260caccgtcatc accgaaacgc gcgagacgaa agggcctcgt gatacgccta tttttatagg
4320ttaatgtcat gataataatg gtttcttaga cgtcaggtgg cacttttcgg ggaaatgtgc
4380gcggaacccc tatttgttta tttttctaaa tacattcaaa tatgtatccg ctcatgagac
4440aataaccctg ataaatgctt caataatatt gaaaaaggaa gagtatgagt attcaacatt
4500tccgtgtcgc ccttattccc ttttttgcgg cattttgcct tcctgttttt gctcacccag
4560aaacgctggt gaaagtaaaa gatgctgaag atcagttggg tgcacgagtg ggttacatcg
4620aactggatct caacagcggt aagatccttg agagttttcg ccccgaagaa cgttttccaa
4680tgatgagcac ttttaaatta aaaatgaagt tttaaaactt catttttaat ttaaattaaa
4740aatgaagttt tatcaaaaaa atttccaata atcccactct aagccacaaa cacgccctat
4800aaaatcccgc tttaatccca ctttgagaca catgtaatat tactttacgc cctagtatag
4860tgataatttt ttacattcaa tgccacgcaa aaaaataaag gggcactata ataaaagttc
4920cttcggaact aactaaagta aaaaattatc tttacaacct ccccaaaaaa aagaacaggt
4980acaaagtacc ctataataca agcgtaaaaa aatgagggta aaaataaaaa aataaaaaaa
5040taaaaaaata aaaaaataaa aaaaataaaa aaataaaaaa ataaaaaaat aaaaaaataa
5100aaaaataaaa aaataaaaaa ataaaaaaat ataaaaataa aaaaatataa aaataaaaaa
5160atataaaaat aaaaaaatat aaaaataaaa aaataaaaaa atataaaaat aaaaaaataa
5220aaaaatataa aaatattttt tatttaaagt ttgaaaaaaa tttttttata ttatataatc
5280tttgaagaaa agaatataaa aaatgagcct ttataaaagc ccattttttt tcatatacgt
5340aatatgacgt tctaatgttt ttattggtac ttctaacatt agagtaattt ctttattttt
5400aaagcctttt tctttaaggg cttttatttt ttttcttaat acatttaatt cctctttttt
5460tgttgctttt cctttagctt ttaattgctc ttgataattt tttttacctc taatattttc
5520tcttctctta tattcctttt tagaaattat tattgtcata tatttttgtt cttcttctgt
5580aatttctaat aactctataa gagtttcatt cttatactta tattgcttat ttttatctaa
5640ataacatctt tcagcacttc tagttgctct tataacttct ctttcactta aatgttgtct
5700aaacatacta ttaagttcta aaacatcatt taatgccttc tcaatgtctt ctgtaaagct
5760acaaagataa tatctatata aaaataatat aagctctctg tgtcctttta aatcatattc
5820tcttagttca caaagtttta ttatgtcttg tattcttcca taatataaac ttctttctct
5880ataaatataa tttattttgc ttggtctacc ctttttcctt tcatatggtt ttaattcagg
5940taaaaatcca ttttgtattt ctcttaagtc ataaatatat tcgtactcat ctaatatatt
6000gactactgtt tttgatttag agtttatact tcctggaact cttaatattc tggttgcatc
6060taaggcttgt ctatctgctc caaagtattt taattgatta tataaatatt cttgaaccgc
6120tttccataat ggtaatgctt tactaggtac tgcatttatt atccatatta aatacattcc
6180tcttccacta tctattacat agtttggtat aggaatactt tgattaaaat aattcttttc
6240taagtccatt aatacctggt ctttagtttt gccagtttta taataatcca agtctataaa
6300cagtgtattt aactctttta tattttctaa tcgcctacac ggcttataaa aggtatttag
6360agttatatag atattttcat cactcatatc taaatctttt aattcagcgt atttatagtg
6420ccattggcta tatccttttt tatctataac gctcctggtt atccaccctt tacttctact
6480atgaatatta tctatatagt tctttttatt cagctttaat gcgtttctca cttattcacc
6540tccccttctg taaaactaag aaaattatat catattttca ataattatta actattctta
6600aactcttaat aaaaaataga gtaagtcccc aattgaaact taatctattt tttatgtttt
6660aatttattat ttttattaaa atattttaaa ctaaattaaa tgattctttt taatttttta
6720ctatttcatt ccataatata ttactataat tatttacaaa taatatttct tcatttgtaa
6780tatttagatg atttactaat tttagttttt atatattaaa taattaatgt ataatttata
6840taaaaaatca aaggagctta taaattatga ttatttccaa agatactaaa gatttaattt
6900tttcaatttt aacaatactt tttgtaatat tatgtttaaa tttaattgta tttttttcat
6960ataataaagc cgttgaagta aaccaatcca ttttccttat gatgttatta ttaaatttaa
7020gttttataat aatatcttta ttatatttat tgtttttaaa aaaactagtg aaatttccgg
7080ctttattaaa cttattttta ggaattttat tttcattttc atctttacag gatttgatta
7140tatctttaaa tatgttttat caaatattat ctttttctaa atttatatat atttttatta
7200tatttattat tatatatatt ttatttttaa gtttctttct aacagctatt aaaaagaaac
7260ttaaaaataa aaacacgtac tctaaaccaa taaataaaac tatttttatt attgctgcct
7320tgattggaat agtttttagt aaaattaatt tcaatattcc acaatattat attataagct
7380agctttgcat tgtacttttc aatcgcttca cgaatgcggt tatctccgaa agataaagtc
7440ttttcatctt ccttgatgaa gataagattt tctccgtctc cgccggcaga attgaagcgg
7500ggtactacgg tatcgtctgc gtcatcttcc gttgtctgat agatgatagt cataggctca
7560ttttcttccg tttcggtaaa ggggataggt tcgccctttg agagcagggc ggcgatggaa
7620agcattaact tgcttttccc atcgcccgga tctccctgca atagcgtaac tttgccaaac
7680ggaatatacg gataccacag ccactttact tctttcggct cgatttcact tgccttgatg
7740atttcaagag gtacgctgaa attcatttcg ttttcattta gtttcatttt ttcttgttct
7800ccttttctct gaaaatataa aaaccacaga ttgatactaa aaccttggtt gtgttgcttt
7860tcggggctta aatcaaggaa aaatccttgt tttaagcctt tcaaaaagaa acacaaggtc
7920tttgtactaa cctgtggtta tgtataaaat tgtagatttt agggtaacaa aaaacaccgt
7980atttctacga tgtttttgct taaatacttg tttttagtta cagacaaacc tgaagttgaa
8040ttcatattta ttaaattaag cgtatatact attgaaaatg tttttgaaat attataaaat
8100taactttggt ttaggaaaag taaccagttc ttttgtcgat aagcattaat ttgcttgact
8160aattaataaa aaacttagga ggtaacacta atggtattcg agaaaattga caagaacagt
8220tggaacagaa aagaatactt tgatcactat tttgctagtg taccttgcac atacagtatg
8280actgtaaagg ttgatataac acagattaaa gagaagggaa tgaaattgta ccctgcaatg
8340ctttattaca tagcaatgat agtaaacaga catagtgaat ttaggaccgc tatcaatcag
8400gatggtgaac ttggaattta tgatgaaatg attccatcat atactatatt ccataatgac
8460accgagacat tctcaagtct ttggactgaa tgcaagtcag attttaagtc atttcttgca
8520gattatgaat ctgatactca aagatacggt aataatcacc gtatggaagg aaaacctaat
8580gcacctgaga atattttcaa tgtttccatg ataccttggt caacatttga cggatttaat
8640ctgaatctgc aaaaaggcta cgattactta atccctatct ttacaatggg caagtattat
8700aaggaagaca ataaaatcat ccttcccctt gcaatccagg tacatcatgc agtatgtgat
8760ggatttcata tttgtcgttt tgtaaatgaa ctgcaagaat taataaattc ctaactcgag
8820ggcagtagcg cggtggtccc acctgacccc atgccgaact cagaagtgaa acgataaaac
8880gaaaggctca gtcgaaagac tgggcctttc gttttatctg ttgtttgtcg gtgaacgctc
8940tcctgagtag gacaaatccg ccgggagcgg atttgaacgt tgcgaagcaa cggcccggag
9000ggtggcgggc aggacgcccg ccataaactg ccaggcatca aattaagcag aaggccatcc
9060tgacggatgg ccttttttat tgtaaattcc ggtaaccctt gtagcttagt gggaatttgt
9120accccttatc gatacaaatt ccccgtaggc gctagggaca ctttttcact cgttaaaaag
9180ttttgagaat attttatatt tttgttcatg taatcactcc ttcttaatta caaattttta
9240gcatctaatt taacttcaat tcctattata caaaatttta agatactgca ctatcaacac
9300actcttaagt ttgcttctaa gtcttatttc cataacttct tttacgtttc cgggtacaat
9360tcgtaatcat gtcatagctg tttcctgtgt gaaattctta tccgctcaca attccacaca
9420acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg agctaactca
9480cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg tgccagaaaa
9540cttcattttt aatttaaaag gatctaggtg aagatccttt ttgataatct catgaccaaa
9600atcccttaac gtgagttttc gttccactga gcgtcagacc ccgtagaaaa gatcaaagga
9660tcttcttgag atcctttttt tctgcgcgta atctgctgct tgcaaacaaa aaaaccaccg
9720ctaccagcgg tggtttgttt gccggatcaa gagctaccaa ctctttttcc gaaggtaact
9780ggcttcagca gagcgcagat accaaatact gtccttctag tgtagccgta gttaggccac
9840cacttcaaga actctgtagc accgcctaca tacctcgctc tgctaatcct gttaccagtg
9900gctgctgcca gtggcgataa gtcgtgtctt accgggttgg actcaagacg atagttaccg
9960gataaggcgc agcggtcggg ctgaacgggg ggttcgtgca cacagcccag cttggagcga
10020acgacctaca ccgaactgag atacctacag cgtgagctat gagaaagcgc cacgcttccc
10080gaagggagaa aggcggacag gtatccggta agcggcaggg tcggaacagg agagcgcacg
10140agggagcttc cagggggaaa cgcctggtat ctttatagtc ctgtcgggtt tcgccacctc
10200tgacttgagc gtcgattttt gtgatgctcg tcaggggggc ggagcctatg gaaaaacgcc
10260agcaacgcgg cctttttacg gttcctggcc ttttgctggc cttttgctca catgttcttt
10320cctgcgttat cccctgattc tgtggataac cgtattaccg cctttgagtg agctgatacc
10380gctcgccgca gccgaacgac cgagcgcagc gagtcagtga gcgaggaagc ggaagagcgc
10440ccaatacgca aaccgcctct ccccgcgcgt tggccgattc attaatgcag ctggcacgac
10500aggtttcccg actggaaagc gggcagtgag cgcaacgcaa ttaatgtgag ttagctcact
10560cattaggcac cccaggcttt acactttatg cttccggctc gtatgttgtg tggaattgtg
10620agcggataac aatttcacac aggaaacagc tatgaccatg attac
10665
32 1056 DNA Thermoanaerobacterium saccharolyticum
32
gtgtatacaa
tatatttctt cttagtaaga ggaatgtata aaaataaata ttttaaagga 60agggacgatc
ttatgagcat tattcaaaac atcattgaaa aagctaaaag cgataaaaag 120aaaattgttc
tgccagaagg tgcagaaccc aggacattaa aagctgctga aatagtttta 180aaagaaggga
ttgcagattt agtgcttctt ggaaatgaag atgagataag aaatgctgca 240aaagacttgg
acatatccaa agctgaaatc attgaccctg taaagtctga aatgtttgat 300aggtatgcta
atgatttcta tgagttaagg aagaacaaag gaatcacgtt ggaaaaagcc 360agagaaacaa
tcaaggataa tatctatttt ggatgtatga tggttaaaga aggttatgct 420gatggattgg
tatctggcgc tattcatgct actgcagatt tattaagacc tgcatttcag 480ataattaaaa
cggctccagg agcaaagata gtatcaagct tttttataat ggaagtgcct 540aattgtgaat
atggtgaaaa tggtgtattc ttgtttgctg attgtgcggt caacccatcg 600cctaatgcag
aagaacttgc ttctattgcc gtacaatctg ctaatactgc aaagaatttg 660ttgggctttg
aaccaaaagt tgccatgcta tcattttcta caaaaggtag tgcatcacat 720gaattagtag
ataaagtaag aaaagcgaca gagatagcaa aagaattgat gccagatgtt 780gctatcgacg
gtgaattgca attggatgct gctcttgtta aagaagttgc agagctaaaa 840gcgccgggaa
gcaaagttgc gggatgtgca aatgtgctta tattccctga tttacaagct 900ggtaatatag
gatataagct tgtacagagg ttagctaagg caaatgcaat tggacctata 960acacaaggaa
tgggtgcacc ggttaatgat ttatcaagag gatgcagcta tagagatatt 1020gttgacgtaa
tagcaacaac agctgtgcag gctcaa
1056
33 1209 DNA Thermoanaerobacterium saccharolyticum
33
atgaaaatta
tgaaaatact ggttattaat tgcggaagtt cttcgctaaa atatcaactg 60attgaatcaa
ctgatggaaa tgtgttggca aaaggccttg ctgaaagaat cggcataaat 120gattccatgt
tgacacataa tgctaacgga gaaaaaatca agataaaaaa agacatgaaa 180gatcacaaag
acgcaataaa attggtttta gatgctttgg taaacagtga ctacggcgtt 240ataaaagata
tgtctgagat agatgctgta ggacatagag ttgttcacgg aggagaatct 300tttacatcat
cagttctcat aaatgatgaa gtgttaaaag cgataacaga ttgcatagaa 360ttagctccac
tgcacaatcc tgctaatata gaaggaatta aagcttgcca gcaaatcatg 420ccaaacgttc
caatggtggc ggtatttgat acagcctttc atcagacaat gcctgattat 480gcatatcttt
atccaatacc ttatgaatac tacacaaagt acaggattag aagatatgga 540tttcatggca
catcgcataa atatgtttca aatagggctg cagagatttt gaataaacct 600attgaagatt
tgaaaatcat aacttgtcat cttggaaatg gctccagcat tgctgctgtc 660aaatatggta
aatcaattga cacaagcatg ggatttacac cattagaagg tttggctatg 720ggtacacgat
ctggaagcat agacccatcc atcatttcgt atcttatgga aaaagaaaat 780ataagcgctg
aagaagtagt aaatatatta aataaaaaat ctggtgttta cggtatttca 840ggaataagca
gcgattttag agacttagaa gatgccgcct ttaaaaatgg agatgaaaga 900gctcagttgg
ctttaaatgt gtttgcatat cgagtaaaga agacgattgg cgcttatgca 960gcagctatgg
gaggcgtcga tgtcattgta tttacagcag gtgttggtga aaatggtcct 1020gagatacgag
aatttatact tgatggatta gagtttttag ggttcagctt ggataaagaa 1080aaaaataaag
tcagaggaaa agaaactatt atatctacgc cgaattcaaa agttagcgtg 1140atggttgtgc
ctactaatga agaatacatg attgctaaag atactgaaaa gattgtaaag 1200agtataaaa
1209
34 1358 DNA Thermoanaerobacter pseudoethanolicus
34
gctaatgcta tcggaccaat
ttctcaaggt cttgcaaaac ctatcaatga cttgtcaaga 60ggttgtagtg tagaagatat
tgttaatgtt atagcaataa cttgtgtaca agctcaaggg 120gtgcaaaaat aactttgagg
aggcagcgat tatgaaaatt ttagtcatga actgtggaag 180ctcgtcatta aaagtatcaa
ttgttagata tggataatgg gaaagtgcta gcgaaaggat 240tggcggaaag gataggtatc
aatgattctc ttttaactca tcaagtagag ggcaaagata 300aaataaaaat acaaaaagat
atgaaaaatc ataaagaagc tatacaaatt gttttagagg 360ctttagtaga taaagaaatc
ggaatattaa aagatatgaa agaaatagat gcagtaggac 420atagagttgt gcacggggga
gagtttttta ctgattccgt attgattgac gatgaggtaa 480tcaaaaaatt agaagcatgt
attgaccttg cacctttgca caatcctgct aatattgagg 540gaataaaagc ttgtcggcag
ataatgccag gggtgccaat ggtagcagtt tttgatacgg 600ctttccatca aacaatgcca
gattatgcgt atatttatcc cattccttat gaatactacg 660aaaaatatag aataagaaga
tatggattcc atgggacttc tcataaatat gtatctttaa 720gagctgctga aatattaaag
aggcctattg aagagttaaa aattattact tgccatttag 780ggaatgggtc tagtattgct
gcggttaaag gcggtaagtc gatagataca agtatgggat 840ttactccatt agaagggctg
gctatgggta caaggtccgg aaatgttgat ccttcaatta 900taactttctt aatggaaaaa
gaaggattga ctgcagaaca ggttatagat atacttaata 960agaaatcagg tgtatacgga
atttcaggaa taagtaatga ctttagagat atagaaaatg 1020cagcttttaa agaagggcat
aaaagggcta tgttggcatt aaaagttttc gcttataggg 1080tgaaaaagac aataggttct
tatacagctg ctatgggtgg ggttgatgta attgtgttta 1140ctgctggagt tggagaaaat
ggaccagaaa tgagagagtt tattttagag gatctagagt 1200ttttaggctt taaactggac
aaagagaaga ataaggtaag aggaaaagag gaaattatat 1260ctacagaaga ttcaaaagtt
aaagttatgg ttattcctac aaatgaagaa tatatgattg 1320ctaaagatac tgaaaaattg
gtaaaaggtt taaagtag
1358
35 1196 DNA Thermoanaerobacter pseudoethanolicus
35
atggcagtaa tggatagtat
catacaaaag gctaaagcta ataaaaaaag gattgtgctt 60cctgagggaa gtgaagctcg
aactttaaaa gctgctgaaa aggttattaa agaaggtatt 120gctgatgtag ttttattagg
gaaggaagaa gaaataaaag aaaaagcaaa gggattggat 180atctcgaaag cagaaattat
agaccctgaa aagtcgcctc ttttacaaaa atatgctgaa 240gaatattata atttgagaaa
aaccaaagga gttacagaag aacaggcata tcaaattatg 300aaagacccta tgtactatgg
gtgcatgatg gtcaaattag acgatgttga tggtatggta 360tctggggcga ttcacgctac
tgctgatgtt ttcagaccgg cttttcaaat tgtaaaaact 420gctgcaggtg tcaaagtagt
atccagcgcc tttataatgg aagtacctaa ttgtacttat 480ggaagcgatg gagtatttat
ttttgctgat tgtgcaataa atcctaatcc taatgaagag 540gaattagcag caattgccat
tgcttctgcc catactgcaa aagtccttgc tggaattgag 600cctagaattg ctatgctgtc
attttctact aaaggaagtg caaaccatga attagtagat 660aaggtgaaaa atgcgactaa
aatcgcaaaa gaattggcgc ctgatttgct aattgatggt 720gagcttcaat tagatgctgc
gattgtcaaa gaagtaggag agttaaaggc tccaggaagt 780cctgtagcgg ggaatgcaaa
tgtgcttatt ttcccagatt tgcaagcggg aaacattgga 840tataagctag tgcaaagact
tgctaaagct aatgctatcg gaccaatttc tcaaggtctt 900gcaaaaccta tcaatgactt
gtcaagaggt tgtagtgtag aagatattgt taatgttata 960gcaataactt gtgtacaagc
tcaaggggtg caaaaataac tttgaggagg cagcgattat 1020gaaaatttta gtcatgaact
gtggaagctc gtcattaaaa gtatcaattg ttagatatgg 1080ataatgggaa agtgctagcg
aaaggattgg cggaaaggat aggtatcaat gattctcttt 1140taactcatca agtagagggc
aaagataaaa taaaaataca aaaagatatg aaaaat
1196
36 1053 DNA Thermoanaerobacter sp.
36
gtgtatacaa tatatttctt ctttttagta
agaggaatgt ataaaaataa atattttaaa 60ggaagggacg atcttatgag cattattcaa
aacatcattg aaaaagctaa aagtgataaa 120aagaaaattg ttctgccgga aggtgcagaa
cccagaacat taaaagctgc tgaaatagtt 180ttaaaagaag gaattgcaga tttggtgctt
cttggaaatg aagatgagat aagaaatgct 240gcaaaagact tggacatatc taaagctgaa
atcattgatc ctgtaaaatc tgaaatgttt 300gataggtatg ctaatgattt ttatgagtta
aggaagagca aaggaatcac gttggaaaaa 360gccagagaaa caatcaagga taatatctat
tttggatgta tgatggttaa agaaggttat 420gctgatggat tggtatctgg cgctattcat
gctactgcag atttattaag acctgcattt 480cagataatta aaacggctcc aggagcaaag
atagtatcaa gcttttttat aatggaagtg 540cctaattgtg aatatggtga aaatggtgta
ttcttgtttg ctgattgcgc ggtcaaccca 600tcgcctaatg cagaagaact tgcttctatt
gctgtacaat ctgctaatac tgcaaagaat 660ttgttgggct ttgaaccaaa agttgctatg
ctatcatttt ccacaaaagg tagtgcatca 720catgaattag tagataaagt aagaaaagcg
acagaaatag caaaagaatt gatgccagat 780gttgctatcg acggtgaatt gcaattggat
gctgctcttg tcaaagaagt tgcagagcta 840aaagcgccag gaagcaaagt tgcgggatgt
gcaaatgtgc ttatattccc tgatttacaa 900gctggtaata taggatataa gcttgtacag
agattagcta gcaaatgcaa ttggacctat 960aacacaggaa tgggtgcacc ggttaatgat
ttatcaagag gatgcagcta tagagatatt 1020gttgacgtaa tagcacacag ctgtacaggc
tca 1053
37 1068 DNA Thermoanaerobacter sp.
37
atgctaacgg agaaaaatca agataaaaaa agacatgaaa gatcacaaag acgcaataaa
60attgttttag atgctttggt aagcagtgac tacggcgtta taaaggatat gtctgagata
120gatgctgtag gacatagagt tgttcacgga ggagaatctt ttacatcatc agttctcata
180aatgatgatg tgttaaaagc gataacagat tgcatagaat tagctccact gcacaatcct
240gccaatatag aaggaattaa agcttgccag caaatcatgc caaacgttcc aatggtggcg
300gtatttgata cagcctttca tcagacaatg cctgattatg catatcttta tccaatacct
360tatgaatact acacaaagta caggatcaga agatatggat ttcatggcac atcgcataaa
420tatgtttcaa atagggctgc agagatttta aataaaccta ttgaagattt gaaaatcata
480acttgtcatc ttggaaatgg ctccagcatt gctgctgtca aatatggtaa atcaattgac
540acaagcatgg gatttacacc attagaaggt ttggctatgg gtacacgatc tggaagcata
600gacccatcca ttatttcgta tcttatggaa aaagaaaata taagcgctga agaagtagta
660aatatattaa ataaaaaatc tggtgtttac ggtatttcag gaataagcag cgattttaga
720gacttagaag atgccgcctt taaaaatgga gatgaaagag ctcagttggc tttaaatgtg
780tttgcatatc gagtaaagaa gatgattggc gcttatgcag cagctatggg aggcgtcgat
840gccattgtat ttacagcagg tgttggtgaa aatggtcctg agatacgaga atttatactt
900gatggattag agttcttagg gttcagcttg gataaagaaa aaaataaagt cagaggaaaa
960gaaactatta tatctacgcc gaattcaaaa gttagcgtga tggttgtgcc cactaatgaa
1020gaatacatga ttgctaaaga tactgaaaag attgtaaaga gtataaaa
1068
38 1059 DNA Thermoanaerobacterium saccharolyticum
38 gtgtatacaa
tatatttctt ctttttagta agaggaatgt ataaaaataa atattttaaa 60ggaagggatg
atcttatgag cattattcag aacatcattg aaaaagctaa aagcgataaa 120aagaaaattg
ttctgccaga aggtgcagaa cccaggacat taaaagctgc tgaaatagtt 180ttaaaagaag
gaattgcaga tttggtgctt cttggaaatg aagatgagat aagaaatgca 240gcaaaagact
tggacatatc caaagctgaa ataattgacc ctgtaaaatc tgaaatgttt 300gataggtatg
ctaatgattt ttacgaatta agaaagagca agggaatcac attggaaaaa 360gccagagaaa
caatcaagga taatatctat tttggatgta tgatggttaa agaaggttat 420gctgatggat
tagtatctgg cgctattcat gctactgcag atttattaag acctgcattt 480cagataatta
aaacagctcc aggagcaaag atagtatcaa gcttttttat aatggaagtg 540cctaattgtg
aatatggtga aaatggcgta ttcttgtttg ctgattgtgc ggtcaatcca 600tcacctaatg
cagaagaact tgcttctatt gctgtacaat ctgctaatac tgcaaagaat 660ttgttgggtt
ttgaaccaaa agttgccatg ctatcatttt ccacaaaagg tagtgcatca 720catgaattag
tagacaaggt aagaaaagcg acagagatag caaaggattt gatgccagat 780gttgctatcg
atggtgaatt gcaactggat gctgctattg ttaaagaagt tgcagagcta 840aaagcaccgg
gaagcaaagt tgcgggatgt gcaaatgtgc ttatattccc tgacttacaa 900gctggtaata
taggatataa gcttgtacag agattagcta aggcaaatgc aattggaccg 960ataacgcaag
gaatgggtgc accagttaat gatttatcaa gaggatgcag ctataaagat 1020attgttgacg
taatagcgac aacagctgtg caggctcaa
1059
39 1209 DNA Thermoanaerobacterium saccharolyticum
39 atgaaaacta
tgaaaattct ggttattaat tgtggaagtt cttcactaaa atatcaattg 60attgaatcaa
ttgatggaaa tgtgctggca aaaggccttg ctgaaagaat cggcataaat 120gattccctgt
tgacgcataa tgctaacgga gaaaaaatca agataaaaaa agacatgaaa 180gatcacaaag
acgcaataaa attggtttta gatgctttgg taagtagcga ctacggcgtt 240ataaaggata
tgtctgagat agatgctgta ggacatagag ttgttcatgg aggagagtct 300tttacatcat
cagttcttat aaatgatgaa gtgttaaagg caataacaga ttgtatagaa 360ttagctccac
tgcataatcc tgctaatata gaaggaatta aagcttgcca gcaaatcatg 420ccaaacgttc
caatggtggc ggtatttgat acagcctttc atcaaacaat gcctgattat 480gcatatcttt
atccaatacc ttatgagtac tacacaaagt acaggatcag aagatatgga 540tttcatggca
cgtcgcataa atatgtttca agtagggctg cagagatttt gaataaacct 600attgaagatt
tgaaaatcat aacttgtcat cttggaaatg gctccagtat tgctgccgtc 660aaatatggta
aatcaattga cacaagcatg ggatttacac cattagaagg tttggctatg 720ggtacacgat
ctggaagtat agacccatcc atcatttctt atcttatgga aaaagaaaat 780ataagtgctg
aagaggtagt aaatatatta aataaaaaat ctggtgttta cggtatttcg 840ggaataagca
gcgattttag agatttagaa gatgctgcct ttaaaaatgg agatgaaaga 900gctcagttgg
ccttaaatgt gtttgcatat cgagtaaaga agacgattgg agcttatgca 960gcagctatgg
gaggcgttga tgtcattgta tttacggcag gtgttggtga aaatgggcct 1020gagataagag
aatttatact tgatggattg gagttcttag ggttcagctt ggataaagaa 1080aaaaataaag
tcagaggaaa ggaaactatt atatctacgc caaattcaaa aattagcgtg 1140atggttgtgc
cgactaatga agaatatatg attgctaaag atactgaaaa gattgtaaag 1200agtataaaa
1209
40 933 DNA Thermoanaerobacterium saccharolyticum
40 atgagcaagg tagcaataat
aggatctggt tttgtaggtg caacatcggc atttacgctg 60gcattaagtg ggactgtgac
agatatcgtg ctggtggatt taaacaagga caaggctata 120ggcgatgcac tggacataag
ccatggcata ccgctaatac agcctgtaaa tgtgtatgca 180ggtgactaca aagatgtgaa
aggcgcagat gtaatagttg tgacagcagg tgctgctcaa 240aagccgggag agacacggct
tgaccttgta aagaaaaata cagccatatt taagtccatg 300atacctgagc ttttaaagta
caatgacaag gccatatatt tgattgtgac aaatcccgta 360gatatactga cgtacgttac
atacaagatt tctggacttc catggggcag agtttttggt 420tctggcaccg ttcttgacag
ctcaaggttt agataccttt taagcaagca ctgcaatata 480gatccgagaa atgtccacgg
aaggataatc ggcgagcatg gtgacacaga gtttgcagca 540tggagcataa caaacatatc
gggtatatca tttaatgagt actgcagcat atgcggacgc 600gtctgcaaca caaatttcag
aaaggaagta gaagaagaag tcgtaaatgc tgcttacaag 660ataatagaca aaaaaggtgc
tacatactat gctgtggcag ttgcagtaag aaggattgtg 720gagtgcatct taagagatga
aaattccatc ctcacagtat catctccatt aaatggacag 780tacggcgtga aagatgtttc
attaagcttg ccatctatcg taggcaggaa tggcgttgcc 840aggattttgg acttgccttt
atctgacgaa gaagtggaga agtttaggca ttcagcaagt 900gtcatggcag atgtcataaa
acaattagat ata
933
41 933 DNA Thermoanaerobacter sp.
41 atgagtaaag tggccataat aggttcagga
tttgtaggtg ctacatctgc atttacattg 60gctctaagtg ggactgtgac agacattgtt
ttagtagatt taaacaagga caaggcgata 120ggcgatgcac tggatattag ccacggtata
ccgcttatac agcctgtaaa tgtgtatgct 180ggcgactaca aggatatcga gggcgcagat
gtagtagttg taacagcagg tgcggctcaa 240aagccaggag agtctaggct ggaccttgta
aaaaagaata catctatatt caagtccatg 300atacctgaac ttttaaaata caatgataaa
gctatatacc tgattgtaac aaatcctgtt 360gatatattaa cgtatgttac atacaaaata
gcgaaacttc cgtgggggcg tgtattcggt 420tcaggtactg tccttgacag ttcccgattt
aggtatcttt taagtaaaca ttgcaatatt 480gatcctagaa atgtacatgg aaggataatt
ggagaacacg gcgatacaga atttgcggcg 540tggagcataa caaatatttc aggaatatca
tttaatgagt actgcaattt gtgcggacga 600gtttgtaata caaatttcag aaaggaagtg
gaagatgaag ttgtcaatgc ggcttacaaa 660attattgata aaaagggtgc cacgtattac
gctgtggctg tagcagtaag aagaatagtt 720gagtgtatca taagggatga aaattcaatt
cttacagttt catctccatt aaatggtcaa 780tacggtgtaa gagatgtatc tttaagcttg
ccatcaattg tgggcaaaaa tggtgttgca 840agggttctgg atttgccttt ggctgatgac
gaagttgaga agtttaaaca ttcggcaagc 900gttatggctg atgttataaa acagttggac
ata 933
42 936 DNA Thermoanaerobacter pseudoethanolicus
42 atgaacaaaa tatctataat aggttctgga tttgtcggtg
ctactactgc atacacactg 60gctttgagtg ggattgccaa aactattgta ttaatagata
ttaataaaga caaagcagaa 120ggcgatgctc ttgatataag ccacggcgta ccgtttatta
gtccagttga attgtacgcg 180ggagattata gtgatgtttc aggttctgac ataataatca
ttacagcggg agcagcacaa 240aaaccgggag aaaccagact tgacttagtg aagagaaata
cgatgatttt taaagacata 300gtggcaaaac ttattaaagt aaatgacaca gcaatatacc
ttatagttac aaatccagta 360gatattctta catacgttac ctataaaata tctggcttgc
catacggaag agtattgggg 420tctggcacag ttctcgacag tgcgagattc agatatcttt
taagcaaaca ttgtaacata 480gatccgagga atatacacgg atatataatt ggggagcatg
gcgattctga gcttgcagct 540tggagcatta cgaacatagc aggcatacca attgataatt
actgcaattt atgtggaaaa 600gcatgtgaaa aagattttag agaggagatt tttaataatg
ttgtaagagc tgcctatacg 660ataatagaaa aaaagggtgc gacatattat gcggttgctc
tcgcagtaag aagaatcgta 720gaagctattt tcagagatga aaattccatt ttgactgtgt
catctccgct aaccggccaa 780tatggtgtta caaatgtggc tttgagcctt ccctccgttg
ttggacgaaa tggaatcgta 840aatatacttg aattaccact ttcacaggaa gaaattgctg
cttttagaag atcagccgaa 900gttatcaaaa gtgtaataca agagcttgat atataa
936
43 631 DNA Thermoanaerobacterium saccharolyticum
43 aggcgatgca ctggacataa gccatggcat accattaata cagcctgtaa atgtgtatgc
60aggtgactac aaagatgttg aaggcgcgga cgtaatagtt gtgacagcag gggctgctca
120aaagccaggt gagacgaggc ttgaccttgt gaagaaaaat acagctatat ttaagtccat
180gatacctgag cttttaaagt acaatgacaa ggctatatat ttgattgtca caaatcctgt
240agacatactg acgtacgtta catacaagat atctggactt ccatggggca gagttttcgg
300ttctggcact gttcttgaca gttcaaggtt taggtacctt ttaagcaggc actgcaatat
360agattccgag aaatgtccac ggaaggataa tcggcgagca tggtgacaca gagtttgcag
420catggagcat aacaaacata tctggaatat catttaatga gtactgcagc atatgcgggc
480gcatctgcaa cacaaatttc agaaaggaag tagaagaaga agtcgtaaat gctgcttata
540agataataga caaaaaaggt gctacatact atgctgtcgc agttgcagta agaaggattg
600tggagtgcat cttaagagat gaaaattcca t
631
44 2229 DNA Thermoanaerobacterium saccharolyticum
44 atgatcaatg aatggcgcgg
gtttcaggag ggcaaatggc aaaagactat tgacgttcaa 60gattttatcc agaaaaatta
cacattatac gaaggcgatg atagtttttt agaagggcct 120acagaaaaga ctattaagct
ttggaacaaa gttcttgagc taatgaagga agaactgaaa 180aaaggtgtgt tagatattga
tacaaaaact gtatcgtcta taacatccca tgatgcgggg 240tatatagaca aagatcttga
ggaaatagtt ggattgcaga cagacaaacc tcttaaaaga 300gctataatgc cttacggtgg
cataagaatg gtcaaaaaag cttgcgaagc ttatggatat 360aaagtggacc caaaagtaga
agagatattt acgaagtaca gaaagaccca caatgatggt 420gtatttgatg catatactcc
agaaataaga gcagcaagac atgccggcat aataacaggt 480cttccagatg catatggcag
aggaagaatc ataggtgatt acagaagagt tgctctttat 540ggaattgata gactcatcga
agaaaaggaa aaagaaaaac ttgagcttga ttacgatgaa 600tttgatgaag caactattcg
cttgagagaa gaattgacag aacagataaa agcattaaac 660gaaatgaaag agatggcttt
aaagtacggt tatgacatat caaagcctgc aaaaaatgca 720aaagaagctg tgcagtggac
ttactttgcc ttccttgctg ctataaagga acaaaatggt 780gccgctatgt cgctgggcag
agtatctact tttttagata tatacattga aagagatctt 840aaagaaggaa cattgacaga
gaaacaagca caagagttaa tggatcactt tgtcatgaag 900cttagaatgg tgaggttctt
aaggactcct gattacaatg aactatttag tggcgatcct 960gtttgggtga ctgaatcaat
tggcggtgta ggcgtagacg gaagacctct tgtcactaaa 1020aattcattca ggatattaaa
tactttatat aacttaggtc ctgcacctga gccaaacttg 1080acggttttat ggtccaaaaa
ccttcctgaa aactttaaaa gattctgtgc caaggtatca 1140atagatacaa gttctattca
atatgaaaat gacgacttaa tgaggccaat atacaatgac 1200gactatagca tcgcctgctg
tgtgtcagct atgaagacgg gagaacagat gcaatttttt 1260ggagcaaggg caaatctcgc
gaaggcgcta ctgtatgcta taaacggcgg tatcgatgaa 1320aggtataaaa cgcaagtggc
accaaaattt aatcctataa cgtctgagta tttagactac 1380gatgaggtaa tggcagcata
tgacaatatg ttagagtggc ttgcaaaagt gtatgttaaa 1440gctatgaata taatacacta
catgcacgat aaatacgctt atgaaagatc ccttatggct 1500ttgcatgata gagacatcgt
aaggacgatg gcttttggaa tcgcaggtct ttctgttgcg 1560gcagattcgt taagcgccat
aaagtatgct aaagtaaaag ccataagaga tgaaaatggc 1620atagcaatag attatgaagt
ggaaggagat ttccctaagt ttggcaatga tgatgacagg 1680gttgactcaa tagcagttga
cattgtagaa agattcatga ataagcttaa aaagcacaag 1740acttacagaa actctatacc
aacactgtct gttttgacaa taacgtcaaa tgtggtgtac 1800ggcaaaaaga cgggtgctac
acctgacgga agaaaagcgg gagaaccttt tgcgccaggc 1860gcaaatccga tgcacggcag
agatacaaaa ggtgccatag catcaatgaa ttcagtatca 1920aaaatacctt atgacagttc
attggatggt atatcataca catttacgat tgtaccaaat 1980gcgcttggca aggatgacga
agataaaatt aataatcttg taggactatt agatggatat 2040gcatttaatg cggggcacca
cataaacatc aatgttttaa acagagatat gttgcttgat 2100gctatggagc atcctgaaaa
atatccgcag cttactataa gggtttcagg gtatgctgtc 2160aatttcaata aattaacgag
agagcaacag ttggaggtta tatcccgcac ttttcacgaa 2220tctatgtag
2229
45 2229 DNA Clostridium thermocellum
45 atggatgcat ggcgcggatt taataaaggc aactggtgcc aggaaattga
cgttcgtgat 60tttataatta gaaattatac tccttatgaa ggcgatgaaa gctttcttgt
aggacctacg 120gatagaacgc ggaaactttg ggagaaggtt tccgaactgt taaagaaaga
acgggagaac 180ggcggggtat tggatgttga tacccataca atttcaacga ttacgtctca
taaacctgga 240tatatagata aagaacttga agttattgtc gggcttcaga cggatgagcc
tttaaaaaga 300gccataatgc cgtttggcgg tatacgtatg gtgattaagg gagccgaagc
ttatggccac 360agtgtggacc ctcaggttgt tgaaatattc acaaagtaca gaaagactca
taaccaggga 420gtttatgatg tatatactcc cgaaatgaga aaagccaaaa aagccgggat
tattacagga 480cttcccgacg catacggcag aggaagaata attggcgatt acagaagggt
tgcactttat 540ggcgttgaca ggctgattgc tgaaaaagag aaagaaatgg caagtcttga
aagagattac 600attgactatg agactgttcg agacagagaa gaaataagcg agcagattaa
atctttaaaa 660caacttaaag aaatggcttt aagttacggt tttgacatat cttgtcctgc
aaaggatgcc 720agagaagcct ttcaatggtt gtattttgca tatcttgcag cagtcaagga
acagaacggc 780gcggcaatga gtattggaag aatttcgact ttccttgaca tatacattga
aagggatctc 840aaagaaggaa aactcacgga ggagttggct caggaactgg ttgaccagct
ggttataaag 900ctgagaattg tgagattttt gagaactcct gagtatgaaa agctcttcag
cggagacccc 960acttgggtaa ccgaaagtat cggaggtatg gcgctggatg gaagaacgct
ggttacaaaa 1020tcttcgttca ggtttttgca cactcttttc aacctgggac atgcaccgga
gcccaacctt 1080acagtacttt ggtccgtcaa tcttcccgaa ggctttaaaa agtactgtgc
aaaggtatca 1140attcattcaa gctccatcca gtatgaaagc gacgacataa tgaggaaaca
ctggggagac 1200gattatggaa tagcatgctg tgtttctgct atgagaattg gaaaacagat
gcagttcttc 1260ggtgcaagat gcaatcttgc aaaagctctt ctttacgcta ttaacggcgg
aaaggatgaa 1320atgacgggag aacagattgc tccgatgttt gcaccggtgg aaaccgaata
ccttgattac 1380gaggacgtaa tgaagaggtt tgacatggtg cttgactggg tggcaaggct
ttatatgaac 1440accctcaata taattcacta catgcatgac aaatatgcct atgaggcgct
gcagatggca 1500ttgcatgaca aagacgtgtt caggacgatg gcatgcggaa tagccggttt
gtctgtggtg 1560gcagactccc ttagcgcgat aaaatatgca aaggttaaac cgatacgcaa
tgaaaacaac 1620ctcgttgttg actacgaagt tgagggtgat tatcctaaat tcggaaataa
cgacgaacgt 1680gttgatgaaa ttgcagtgca agtagtaaaa atgttcatga acaagcttag
aaagcaaagg 1740gcttacagaa gtgccactcc gaccctttcc atacttacca taacttcaaa
cgtggtatat 1800ggaaagaaaa ccggaaacac tcctgacggc agaaaagctg gagaaccttt
ggcgccggga 1860gcaaatccga tgcatggaag ggatataaac ggagcattgg ctgtactgaa
cagtattgcg 1920aagcttccct atgaatatgc ccaggacggc atttcatata ctttctccat
aattccaaaa 1980gctctgggaa gagacgagga aaccagaata aacaatctta aatcaatgct
tgacggatat 2040ttcaagcagg gcggccacca cataaatgta aatgtgtttg aaaaagagac
actgttagat 2100gccatggaac atccggaaaa atatccacaa cttaccataa gagtgtccgg
gtatgcagtg 2160aactttataa agcttacacg ggagcaacag ctggatgtta ttaacagaac
gattcacgga 2220aagatttaa
2229
46 2061 DNA Clostridium phytofermentans
46 atgatgactt
cagttatgaa acaggaatgg gaaggtttta aacaaggtag atggatcact 60tcagtaaatg
ttcgagactt catacagaac aattacacaa tgtatgatgg tgatgaatcc 120tttttagcag
gtccaaccga agccaccaat aaactatggg cccaggttat ggagctttca 180aagcaggaaa
gtgagaaagg tggagtcctt gatatggaca ccaagatagt atctactatt 240gtttctcacg
gtcctggtta tttagataaa gatattgaaa caattgttgg ttttcagacc 300gataagccat
ttaagagatc actacaggtc tttggtggta ttcgtatggc acagagtgct 360tgccatgaat
atggatatga ggtagacgaa gaggtagcac gtatttttac agactaccgc 420aagacacata
atcaaggtgt atttgatgca tacactgacg aaatgaagct cgctagaaaa 480tcagcaatca
ttactggttt gcctgatgct tatggtagag gtagaattat tggcgattac 540cgtcgagtgg
cactttacgg tactgattta cttattgaag acaagaaaga acaacttaca 600acttccttaa
agagaatgac tagtgataat attcgcttaa gagaagaatt agcagaacaa 660attcgtgcat
taaaagaatt agcgaagctt ggtgaaatct atggttacga tattacgaag 720ccagcaataa
atgcaaagga agcaattcag tggctttact ttggatatct tgcagcggta 780aaagagcaaa
acggtgctgc aatgagctta ggccgtactt ctacattcct tgatatttat 840atccagagag
atttagataa tggtgttatc acagaaaaag aagcacaaga gtatatcgat 900cattttatta
tgaaacttcg tctagtgaag tttgcaagaa ctccagaata caatgcctta 960ttctccggtg
accctacttg ggtaacagaa agtatcgctg ggattggtac agatggacgc 1020catatggtaa
caaagacatc cttccgttac cttcatacgt tagacaacct tggaactgct 1080ccagaaccaa
acatgacagt tctatggtca actagattac caagattatt taaagagtac 1140tgtgctaaga
tgtcaattaa gtcatcctct attcaatacg aaaatgatga tatcatgcgt 1200ccaactcatg
gtgatgatta tgcaattgct tgttgtgtat cctctatgaa aattggtaaa 1260gagatgcagt
tctttggagc acgtgcaaat cttgctaagt gtcttcttta cgcaatcaat 1320ggtggtgtag
atgaagttct taaaattcag gttggtccaa agtaccgtcc agttgagggt 1380gaatacctta
attatgagga cgtaatgtcg aaatacaaag atatgatgga gtggctagca 1440gaactttatg
tgaatacttt aaatgtaatc cactacatgc atgataaata tagctatgaa 1500agaattcaaa
tggcacttca tgatcgtgaa gtaaaacgtt actttgcaac tggtattgcg 1560ggtctttctg
ttgtagcgga ctctttaagt gcaattaagt atgctaaggt aaaagtaatt 1620cgtgatgaga
atggcgttgt aaccgattac gaaattgaag gtgattatcc aaagtacggc 1680aacaatgatg
atcgtgtaga cgatatcgct gtacagttag tgcatgactt tatgaacatg 1740attcgcaagc
atcatactta tcgtgatgga tacccaacga tgtcaatctt aacgataact 1800tctaatgtag
tttatggaaa gaagacaggt aatactccag acggacgtaa gaagggtgaa 1860ccattagcac
caggtgctaa cccaatgcat cgtcgtgata ctcatggtgc agcagcgtcc 1920ctagcatcgg
tagcaaagct tccattccgt gatgcgcagg atggtatttc taatacgttc 1980tctattgtac
caggagcatt aggtaagaat gatgtgttat ttgctggaga cttagattta 2040gacgatatgt
ctgagaacta a
2061
47 5003 DNA Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
47 agagctataa tgccttacgg tggcataaga
atggtcaaaa aagcttgcga agcttatgga 60tataaagtgg acccaaaagt agaagagata
tttacgaagt acagaaagac ccacaatgat 120ggtgtatttg atgcatatac tccagaaata
agagcagcaa gacatgccgg cataataaca 180ggtcttccag atgcatatgg cagaggaaga
atcataggtg attacagaag agttgctctt 240tatggaattg atagactcat cgaagaaaag
gaaaaagaaa aacttgagct tgattacgat 300gaatttgatg aagcaactat tcgcttgaga
gaagaattga cagaacagat aaaagcatta 360aacgaaatga aagagatggc tttaaagtac
ggttatgaca tatcaaagcc tgcaaaaaat 420gcaaaagaag ctgtgcagtg gacttacttt
gccttccttg ctgctataaa ggaacaaaat 480ggtgccgcta tgtcgctggg cagagtatct
acttttttag atatatacat tgaaagagat 540cttaaagaag gaacattgac agagaaacaa
gcacaagagt taatggatca ctttgtcatg 600aagcttagaa tggtgaggtt cttaaggact
cctgattaca atgaactatt tagtggcgat 660cctgtttggg tgactgaatc aattggcggt
gtaggcgtag acggaagacc tcttgtcact 720aaaaattcat tcaggatatt aaatacttta
tataacttag gtcctgcacc tgagccaaac 780ttgacggttt tatggtccaa aaaccttcct
gaaggtcaat ctatgaaatg cgattaagct 840tggctgcagg tcgataaacc cagcgaacca
tttgaggtga taggtaagat tataccgagg 900tatgaaaacg agaattggac ctttacagaa
ttactctatg aagcgccata tttaaaaagc 960taccaagacg aagaggatga agaggatgag
gaggcagatt gccttgaata tattgacaat 1020actgataaga taatatatct tttatataga
agatatcgcc gtatgtaagg atttcagggg 1080gcaaggcata ggcagcgcgc ttatcaatat
atctatagaa tgggcaaagc ataaaaactt 1140gcatggacta atgcttgaaa cccaggacaa
taaccttata gcttgtaaat tctatcataa 1200ttgtggtttc aaaatcggct ccgtcgatac
tatgttatac gccaactttc aaaacaactt 1260tgaaaaagct gttttctggt atttaaggtt
ttagaatgca aggaacagtg aattggagtt 1320cgtcttgtta taattagctt cttggggtat
ctttaaatac tgtagaaaag aggaaggaaa 1380taataaatgg ctaaaatgag aatatcaccg
gaattgaaaa aactgatcga aaaataccgc 1440tgcgtaaaag atacggaagg aatgtctcct
gctaaggtat ataagctggt gggagaaaat 1500gaaaacctat atttaaaaat gacggacagc
cggtataaag ggaccaccta tgatgtggaa 1560cgggaaaagg acatgatgct atggctggaa
ggaaagctgc ctgttccaaa ggtcctgcac 1620tttgaacggc atgatggctg gagcaatctg
ctcatgagtg aggccgatgg cgtcctttgc 1680tcggaagagt atgaagatga acaaagccct
gaaaagatta tcgagctgta tgcggagtgc 1740atcaggctct ttcactccat cgacatatcg
gattgtccct atacgaatag cttagacagc 1800cgcttagccg aattggatta cttactgaat
aacgatctgg ccgatgtgga ttgcgaaaac 1860tgggaagaag acactccatt taaagatccg
cgcgagctgt atgatttttt aaagacggaa 1920aagcccgaag aggaacttgt cttttcccac
ggcgacctgg gagacagcaa catctttgtg 1980aaagatggca aagtaagtgg ctttattgat
cttgggagaa gcggcagggc ggacaagtgg 2040tatgacattg ccttctgcgt ccggtcgatc
agggaggata tcggggaaga acagtatgtc 2100gagctatttt ttgacttact ggggatcaag
cctgattggg agaaaataaa atattatatt 2160ttactggatg aattgtttta gtacctagat
ttagatgtct aaaaagcttt ttagacatct 2220aatcttttct gaagtacatc cgcaactgtc
catactctga tgttttatat cttttctaaa 2280agttcgctag ataggggtcc cgagcgccta
cgaggaattt gtatcgactc tagaggatcc 2340ccgggtaccg agctcgaatt cactggccgc
aagcttggcg taatcatggt catagctgtt 2400tcctgtgtga aattgttatc cgctcacaat
tccacacaac atacgagccg gaagcataaa 2460gtgtaaagcc tggggtgcct aatgagtgag
ctaactcaca ttaattgcgt tgcgctcact 2520gcccgctttc cagtcgggaa acctgtcgtg
ccagctgcat taatgaatcg gccaacgcgc 2580ggggagaggc ggtttgcgta ttgggcgctc
ttccgcttcc tcgctcactg actcgctgcg 2640ctcggtcgtt cggctgcggc gagcggtatc
agctcactca aaggcggtaa tacggttatc 2700cacagaatca ggggataacg caggaaagaa
catgtgagca aaaggccagc aaaaggccag 2760gaaccgtaaa aaggccgcgt tgctggcgtt
tttccatagg ctccgccccc ctgacgagca 2820tcacaaaaat cgacgctcaa gtcagaggtg
gcgaaacccg acaggactat aaagatacca 2880ggcgtttccc cctggaagct ccctcgtgcg
ctctcctgtt ccgaccctgc cgcttaccgg 2940atacctgtcc gcctttctcc cttcgggaag
cgtggcgctt tctcatagct cacgctgtag 3000gtatctcagt tcggtgtagg tcgttcgctc
caagctgggc tgtgtgcacg aaccccccgt 3060tcagcccgac cgctgcgcct tatccggtaa
ctatcgtctt gagtccaacc cggtaagaca 3120cgacttatcg ccactggcag cagccactgg
taacaggatt agcagagcga ggtatgtagg 3180cggtgctaca gagttcttga agtggtggcc
taactacggc tacactagaa ggacagtatt 3240tggtatctgc gctctgctga agccagttac
cttcggaaaa agagttggta gctcttgatc 3300cggcaaacaa accaccgctg gtagcggtgg
tttttttgtt tgcaagcagc agattacgcg 3360cagaaaaaaa ggatctcaag aagatccttt
gatcttttct acggggtctg acgctcagtg 3420gaacgaaaac tcacgttaag ggattttggt
catgagatta tcaaaaagga tcttcaccta 3480gatcctttta aattaaaaat gaagttttaa
atcaatctaa agtatatatg agtaaacttg 3540gtctgacagt taccaatgct taatcagtga
ggcacctatc tcagcgatct gtctatttcg 3600ttcatccata gttgcctgac tccccgtcgt
gtagataact acgatacggg agggcttacc 3660atctggcccc agtgctgcaa tgataccgcg
agacccacgc tcaccggctc cagatttatc 3720agcaataaac cagccagccg gaagggccga
gcgcagaagt ggtcctgcaa ctttatccgc 3780ctccatccag tctattaatt gttgccggga
agctagagta agtagttcgc cagttaatag 3840tttgcgcaac gttgttgcca ttgctacagg
catcgtggtg tcacgctcgt cgtttggtat 3900ggcttcattc agctccggtt cccaacgatc
aaggcgagtt acatgatccc ccatgttgtg 3960caaaaaagcg gttagctcct tcggtcctcc
gatcgttgtc agaagtaagt tggccgcagt 4020gttatcactc atggttatgg cagcactgca
taattctctt actgtcatgc catccgtaag 4080atgcttttct gtgactggtg agtactcaac
caagtcattc tgagaatagt gtatgcggcg 4140accgagttgc tcttgcccgg cgtcaatacg
ggataatacc gcgccacata gcagaacttt 4200aaaagtgctc atcattggaa aacgttcttc
ggggcgaaaa ctctcaagga tcttaccgct 4260gttgagatcc agttcgatgt aacccactcg
tgcacccaac tgatcttcag catcttttac 4320tttcaccagc gtttctgggt gagcaaaaac
aggaaggcaa aatgccgcaa aaaagggaat 4380aagggcgaca cggaaatgtt gaatactcat
actcttcctt tttcaatatt attgaagcat 4440ttatcagggt tattgtctca tgagcggata
catatttgaa tgtatttaga aaaataaaca 4500aataggggtt ccgcgcacat ttccccgaaa
agtgccacct gacgtctaag aaaccattat 4560tatcatgaca ttaacctata aaaataggcg
tatcacgagg ccctttcgtc tcgcgcgttt 4620cggtgatgac ggtgaaaacc tctgacacat
gcagctcccg gagacggtca cagcttgtct 4680gtaagcggat gccgggagca gacaagcccg
tcagggcgcg tcagcgggtg ttggcgggtg 4740tcggggctgg cttaactatg cggcatcaga
gcagattgta ctgagagtgc accatatgcg 4800gtgtgaaata ccgcacagat gcgtaaggag
aaaataccgc atcaggcgcc attcgccatt 4860caggctgcgc aactgttggg aagggcgatc
ggtgcgggcc tcttcgctat tacgccagct 4920ggcgaaaggg ggatgtgctg caaggcgatt
aagttgggta acgccagggt tttcccagtc 4980acgacgttgt aaaacgacgg cca
5003
48 6033 DNA Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
48 cccattgtgc aatcgatcaa ttgaaatagc tattcaattg atttaagcaa ttttatttct
60tcttcattta attctcgcca ttccccttct tttaaatttt catctaattt taattggcct
120attgaaagcc tttttaagta gacgactttt gagcctatcg cttcaaacat tctttttatt
180tgatgatatt taccttctct gattgagaca tatactttcg atgtactgcc tgaagatatt
240atttccagtt ttgccggcat agtcttgtaa ccatcgtcta atagtatgcc atctgaaaac
300aaagatacgt catcttcatc gataaaaccc aaaacttctg catagtattt tttgaaaacg
360tgtttttttg gcgataagag cttatgagat agttcaccgt catttgtaat taaaagtaat
420ccctctgtgt ctttatcaag ccttcctgct gggaaaacct ttcttgcctt tatatggtgt
480ggcaaaagat ctacaacggt tttttctgat ggatcatatg ttgcacagat tacacctttc
540ggtttattca tcattatata tatgtattct ttgtacgata ttttttcact tctaaacgtg
600attatatctt tatcaggttg tactgcaaaa ccggggtcgt caatcgtcac attatttatt
660gcaacaaggc cttcttttat aaaattttta atttcttttc ttgtgccata acccatattt
720gataaaagct tatctattct catttttgac atcttaaatt cctcctaaac aatatgactg
780tgcttcttag taaattatat cccaaaaata taaaatttgt agcaaaaatg tgatatatat
840catatttttt gcgttttcct gatgatacaa ttaagatgat gtttcaagat aataaatttt
900tctgaagtgt atacagtata ttgactacaa agaacaaaat actgcaggtc gataaaccca
960gcgaaccatt tgaggtgata ggtaagatta taccgaggta tgaaaacgag aattggacct
1020ttacagaatt actctatgaa gcgccatatt taaaaagcta ccaagacgaa gaggatgaag
1080aggatgagga ggcagattgc cttgaatata ttgacaatac tgataagata atatatcttt
1140tatatagaag atatcgccgt atgtaaggat ttcagggggc aaggcatagg cagcgcgctt
1200atcaatatat ctatagaatg ggcaaagcat aaaaacttgc atggactaat gcttgaaacc
1260caggacaata accttatagc ttgtaaattc tatcataatt gtggtttcaa aatcggctcc
1320gtcgatacta tgttatacgc caactttcaa aacaactttg aaaaagctgt tttctggtat
1380ttaaggtttt agaatgcaag gaacagtgaa ttggagttcg tcttgttata attagcttct
1440tggggtatct ttaaatactg tagaaaagag gaaggaaata ataaatggct aaaatgagaa
1500tatcaccgga attgaaaaaa ctgatcgaaa aataccgctg cgtaaaagat acggaaggaa
1560tgtctcctgc taaggtatat aagctggtgg gagaaaatga aaacctatat ttaaaaatga
1620cggacagccg gtataaaggg accacctatg atgtggaacg ggaaaaggac atgatgctat
1680ggctggaagg aaagctgcct gttccaaagg tcctgcactt tgaacggcat gatggctgga
1740gcaatctgct catgagtgag gccgatggcg tcctttgctc ggaagagtat gaagatgaac
1800aaagccctga aaagattatc gagctgtatg cggagtgcat caggctcttt cactccatcg
1860acatatcgga ttgtccctat acgaatagct tagacagccg cttagccgaa ttggattact
1920tactgaataa cgatctggcc gatgtggatt gcgaaaactg ggaagaagac actccattta
1980aagatccgcg cgagctgtat gattttttaa agacggaaaa gcccgaagag gaacttgtct
2040tttcccacgg cgacctggga gacagcaaca tctttgtgaa agatggcaaa gtaagtggct
2100ttattgatct tgggagaagc ggcagggcgg acaagtggta tgacattgcc ttctgcgtcc
2160ggtcgatcag ggaggatatc ggggaagaac agtatgtcga gctatttttt gacttactgg
2220ggatcaagcc tgattgggag aaaataaaat attatatttt actggatgaa ttgttttagt
2280acctagattt agatgtctaa aaagcttttt agacatctaa tcttttctga agtacatccg
2340caactgtcca tactctgatg ttttatatct tttctaaaag ttcgctagat aggggtcccg
2400agcgcctacg aggaatttgt atcgactcta gaggatcccc gggtaccgaa aaggtgattg
2460tcatggttat ggggaagata cattcaatag agacatgtgg tactgtagat gggcctggca
2520taaggtacgt agtctttatg caaggttgtc ctttaaggtg cgcttattgc cataaccctg
2580acacatggaa ttataacggt ggtaaagaag tatcaacaga tgagatattt aacgatgcaa
2640aaagatatat accgtacatg aaatcatcag gcggcggcgt gacgctgaca ggtggagagc
2700ctacattaca gcctgaattt tgcgaagatc tatttaaaaa gcttaaagcg tctggcatac
2760acactgcatt agacacatcg ggatatgtga atatagataa agtaaaagaa cttgtaaaac
2820acactgatct ttttttgctt gatataaagc acattgatga tgaaagccat aaaaagctta
2880caggagtgtc gaatagaaag actttggagt ttgcaagata cctttccgat gaaggcaaga
2940aaatgtggat aaggcatgtg atagtacctg gaataacgga tgatatggaa gagataagga
3000aattggctga ttttgtctca tcattgaaaa atgtagatag agttgagata cttccgtatc
3060ataaaatggg tgtgtataaa tatgaggcac ttgggatacc atatagattg aagggaataa
3120atcctcctga cacatcaaaa attaaagaga taaaagaaga gtttaggaaa agagatataa
3180aagtggtcta aaagcctcat gattcgtatc atggggcttt tcctttgaat taatttgata
3240aagggtgtaa aattatcatg tgatgatgtg attttggagg taatcgcatg aatttaaata
3300agataaatag aaacacgtac tacatagata atcctacgaa tattggcgtt tatgcctata
3360aaaataaaaa ttgtctatta gtagatactg gtataaacgc aagcttggcg taatcatggt
3420catagctgtt tcctgtgtga aattgttatc cgctcacaat tccacacaac atacgagccg
3480gaagcataaa gtgtaaagcc tggggtgcct aatgagtgag ctaactcaca ttaattgcgt
3540tgcgctcact gcccgctttc cagtcgggaa acctgtcgtg ccagctgcat taatgaatcg
3600gccaacgcgc ggggagaggc ggtttgcgta ttgggcgctc ttccgcttcc tcgctcactg
3660actcgctgcg ctcggtcgtt cggctgcggc gagcggtatc agctcactca aaggcggtaa
3720tacggttatc cacagaatca ggggataacg caggaaagaa catgtgagca aaaggccagc
3780aaaaggccag gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg ctccgccccc
3840ctgacgagca tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg acaggactat
3900aaagatacca ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc
3960cgcttaccgg atacctgtcc gcctttctcc cttcgggaag cgtggcgctt tctcatagct
4020cacgctgtag gtatctcagt tcggtgtagg tcgttcgctc caagctgggc tgtgtgcacg
4080aaccccccgt tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc
4140cggtaagaca cgacttatcg ccactggcag cagccactgg taacaggatt agcagagcga
4200ggtatgtagg cggtgctaca gagttcttga agtggtggcc taactacggc tacactagaa
4260ggacagtatt tggtatctgc gctctgctga agccagttac cttcggaaaa agagttggta
4320gctcttgatc cggcaaacaa accaccgctg gtagcggtgg tttttttgtt tgcaagcagc
4380agattacgcg cagaaaaaaa ggatctcaag aagatccttt gatcttttct acggggtctg
4440acgctcagtg gaacgaaaac tcacgttaag ggattttggt catgagatta tcaaaaagga
4500tcttcaccta gatcctttta aattaaaaat gaagttttaa atcaatctaa agtatatatg
4560agtaaacttg gtctgacagt taccaatgct taatcagtga ggcacctatc tcagcgatct
4620gtctatttcg ttcatccata gttgcctgac tccccgtcgt gtagataact acgatacggg
4680agggcttacc atctggcccc agtgctgcaa tgataccgcg agacccacgc tcaccggctc
4740cagatttatc agcaataaac cagccagccg gaagggccga gcgcagaagt ggtcctgcaa
4800ctttatccgc ctccatccag tctattaatt gttgccggga agctagagta agtagttcgc
4860cagttaatag tttgcgcaac gttgttgcca ttgctacagg catcgtggtg tcacgctcgt
4920cgtttggtat ggcttcattc agctccggtt cccaacgatc aaggcgagtt acatgatccc
4980ccatgttgtg caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc agaagtaagt
5040tggccgcagt gttatcactc atggttatgg cagcactgca taattctctt actgtcatgc
5100catccgtaag atgcttttct gtgactggtg agtactcaac caagtcattc tgagaatagt
5160gtatgcggcg accgagttgc tcttgcccgg cgtcaatacg ggataatacc gcgccacata
5220gcagaacttt aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa ctctcaagga
5280tcttaccgct gttgagatcc agttcgatgt aacccactcg tgcacccaac tgatcttcag
5340catcttttac tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa
5400aaaagggaat aagggcgaca cggaaatgtt gaatactcat actcttcctt tttcaatatt
5460attgaagcat ttatcagggt tattgtctca tgagcggata catatttgaa tgtatttaga
5520aaaataaaca aataggggtt ccgcgcacat ttccccgaaa agtgccacct gacgtctaag
5580aaaccattat tatcatgaca ttaacctata aaaataggcg tatcacgagg ccctttcgtc
5640tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca
5700cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg
5760ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc
5820accatatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc atcaggcgcc
5880attcgccatt caggctgcgc aactgttggg aagggcgatc ggtgcgggcc tcttcgctat
5940tacgccagct ggcgaaaggg ggatgtgctg caaggcgatt aagttgggta acgccagggt
6000tttcccagtc acgacgttgt aaaacgacgg cca
6033
49 4542 DNA Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
49 gtttatgatg tatatactcc cgaaatgaga
aaagccaaaa aagccgggat tattacagga 60cttcccgacg catacggcag aggaagaata
attggcgatt acagaagggt tgcactttat 120ggcgttgaca ggctgattgc tgaaaaagag
aaagaaatgg caagtcttga aagagattac 180attgactatg agactgttcg agacagagaa
gaaataagcg agcagattaa atctttaaaa 240caacttaaag aaatggcttt aagttacggt
tttgacatat cttgtcctgc aaaggatgcc 300agagaagcct ttcaatggtt gtattttgca
tatcttgcag cagtcaagga acagaacggc 360gcggcaatga gtattggaag aatttcgact
ttccttgaca tatacattga aagggatctc 420aaagaaggaa aactcacgga ggagttggct
caggaactgg ttgaccagct ggttataaag 480ctgagaattg tgagattttt gagaactcct
gagtatgaaa agctcttcag cggagacccc 540acttgggtaa ccgaaagtat cggaggtatg
gcgctggatg gaagaacgct ggttacaaaa 600tcttcgttca ggtttttgca cactcttttc
aacctgggac atgcaccgga gcccaacctt 660acagtacttt ggtccgtcaa tcttcccgaa
ggctttaaaa agtactgtgc aaaggtatca 720attcattcaa gctccatcca gtatgaaagc
gacgacataa tgaggaaaca ctggggagac 780gattatggaa tagcagatgg attttctatt
attgcaatgt ggaattggga acggaaaaat 840tattttatta aagagtagtt caacaaacgg
gattgacttt taaaaaagga ttgattctaa 900tgaagaaagc agacaagtaa gcctcctaaa
ttcactttag ataaaaattt aggaggcata 960tcaaatgaac tttaataaaa ttgatttaga
caattggaag agaaaagaga tatttaatca 1020ttatttgaac caacaaacga cttttagtat
aaccacagaa attgatatta gtgttttata 1080ccgaaacata aaacaagaag gatataaatt
ttaccctgca tttattttct tagtgacaag 1140ggtgataaac tcaaatacag cttttagaac
tggttacaat agcgacggag agttaggtta 1200ttgggataag ttagagccac tttatacaat
ttttgatggt gtatctaaaa cattctctgg 1260tatttggact cctgtaaaga atgacttcaa
agagttttat gatttatacc tttctgatgt 1320agagaaatat aatggttcgg ggaaattgtt
tcccaaaaca cctatacctg aaaatgcttt 1380ttctctttct attattccat ggacttcatt
tactgggttt aacttaaata tcaataataa 1440tagtaattac cttctaccca ttattacagc
aggaaaattc attaataaag gtaattcaat 1500atatttaccg ctatctttac aggtacatca
ttctgtttgt gatggttatc atgcaggatt 1560gtttatgaac tctattcagg aattgtcaga
taggcctaat gactggcttt tataatatga 1620gataatgccg actgtacttt ttacagtcgg
ttttctaatg tcactagggc tcgcctttgg 1680gaagtttgaa gggctggcac gacaggtttc
ccgactggaa agcgggcagt gagcgcaacg 1740caattaatgt gagttagctc actcattagg
caccccaggc tttacacttt atgcttccgg 1800ctcgtatgtt gtgtggaatt gtgagcggat
aacaatttca cacaggaaac agctatgacc 1860atgattacgc caagcttgca tgcctgcagg
tcgactctag aggatccgca agcttggcgt 1920aatcatggtc atagctgttt cctgtgtgaa
attgttatcc gctcacaatt ccacacaaca 1980tacgagccgg aagcataaag tgtaaagcct
ggggtgccta atgagtgagc taactcacat 2040taattgcgtt gcgctcactg cccgctttcc
agtcgggaaa cctgtcgtgc cagctgcatt 2100aatgaatcgg ccaacgcgcg gggagaggcg
gtttgcgtat tgggcgctct tccgcttcct 2160cgctcactga ctcgctgcgc tcggtcgttc
ggctgcggcg agcggtatca gctcactcaa 2220aggcggtaat acggttatcc acagaatcag
gggataacgc aggaaagaac atgtgagcaa 2280aaggccagca aaaggccagg aaccgtaaaa
aggccgcgtt gctggcgttt ttccataggc 2340tccgcccccc tgacgagcat cacaaaaatc
gacgctcaag tcagaggtgg cgaaacccga 2400caggactata aagataccag gcgtttcccc
ctggaagctc cctcgtgcgc tctcctgttc 2460cgaccctgcc gcttaccgga tacctgtccg
cctttctccc ttcgggaagc gtggcgcttt 2520ctcatagctc acgctgtagg tatctcagtt
cggtgtaggt cgttcgctcc aagctgggct 2580gtgtgcacga accccccgtt cagcccgacc
gctgcgcctt atccggtaac tatcgtcttg 2640agtccaaccc ggtaagacac gacttatcgc
cactggcagc agccactggt aacaggatta 2700gcagagcgag gtatgtaggc ggtgctacag
agttcttgaa gtggtggcct aactacggct 2760acactagaag gacagtattt ggtatctgcg
ctctgctgaa gccagttacc ttcggaaaaa 2820gagttggtag ctcttgatcc ggcaaacaaa
ccaccgctgg tagcggtggt ttttttgttt 2880gcaagcagca gattacgcgc agaaaaaaag
gatctcaaga agatcctttg atcttttcta 2940cggggtctga cgctcagtgg aacgaaaact
cacgttaagg gattttggtc atgagattat 3000caaaaaggat cttcacctag atccttttaa
attaaaaatg aagttttaaa tcaatctaaa 3060gtatatatga gtaaacttgg tctgacagtt
accaatgctt aatcagtgag gcacctatct 3120cagcgatctg tctatttcgt tcatccatag
ttgcctgact ccccgtcgtg tagataacta 3180cgatacggga gggcttacca tctggcccca
gtgctgcaat gataccgcga gacccacgct 3240caccggctcc agatttatca gcaataaacc
agccagccgg aagggccgag cgcagaagtg 3300gtcctgcaac tttatccgcc tccatccagt
ctattaattg ttgccgggaa gctagagtaa 3360gtagttcgcc agttaatagt ttgcgcaacg
ttgttgccat tgctacaggc atcgtggtgt 3420cacgctcgtc gtttggtatg gcttcattca
gctccggttc ccaacgatca aggcgagtta 3480catgatcccc catgttgtgc aaaaaagcgg
ttagctcctt cggtcctccg atcgttgtca 3540gaagtaagtt ggccgcagtg ttatcactca
tggttatggc agcactgcat aattctctta 3600ctgtcatgcc atccgtaaga tgcttttctg
tgactggtga gtactcaacc aagtcattct 3660gagaatagtg tatgcggcga ccgagttgct
cttgcccggc gtcaatacgg gataataccg 3720cgccacatag cagaacttta aaagtgctca
tcattggaaa acgttcttcg gggcgaaaac 3780tctcaaggat cttaccgctg ttgagatcca
gttcgatgta acccactcgt gcacccaact 3840gatcttcagc atcttttact ttcaccagcg
tttctgggtg agcaaaaaca ggaaggcaaa 3900atgccgcaaa aaagggaata agggcgacac
ggaaatgttg aatactcata ctcttccttt 3960ttcaatatta ttgaagcatt tatcagggtt
attgtctcat gagcggatac atatttgaat 4020gtatttagaa aaataaacaa ataggggttc
cgcgcacatt tccccgaaaa gtgccacctg 4080acgtctaaga aaccattatt atcatgacat
taacctataa aaataggcgt atcacgaggc 4140cctttcgtct cgcgcgtttc ggtgatgacg
gtgaaaacct ctgacacatg cagctcccgg 4200agacggtcac agcttgtctg taagcggatg
ccgggagcag acaagcccgt cagggcgcgt 4260cagcgggtgt tggcgggtgt cggggctggc
ttaactatgc ggcatcagag cagattgtac 4320tgagagtgca ccatatgcgg tgtgaaatac
cgcacagatg cgtaaggaga aaataccgca 4380tcaggcgcca ttcgccattc aggctgcgca
actgttggga agggcgatcg gtgcgggcct 4440cttcgctatt acgccagctg gcgaaagggg
gatgtgctgc aaggcgatta agttgggtaa 4500cgccagggtt ttcccagtca cgacgttgta
aaacgacggc ca 4542
<210> 50
<211> 5648
<212> DNA
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 50
cccgcaataa tggaagtaaa gagcatgaaa gtgggaatgc tggcttacac cgatatggcg
60gaaattgtgt acaagggcaa tccgaactac aagtttgcgg ccggagagga caagccgggg
120gttgcaccaa gacctttgaa atttgacgat tccataaaaa aagacataga agagttacgg
180agcaaggtgg atattttaat tgtttcactt cactggggag tggaggaaag ctttgaagtt
240ctgcctgaac agagggaatt tgcccacagt cttatagata acggagtgga tgtaatattg
300ggacaccatc cccaccagtt ccaaggtata gaaatctaca agggcaaacc tgttttctac
360agtctgggta attttatttt tgatcagaac gatcccgaaa accaggagtc ctttattgtg
420acacttgatt acaaaggcag cagactgaca ggaatagagg ctgtacccgt gagaacaatc
480ggaaaaatac aggtagttcc tcaaaaagga gatgaagcaa aacctatttt ggaaagagag
540aaaaatttat gtaataggct tgatacaaac tgcattataa aagatgacaa attatatttt
600gaaattggaa aataatgata atataattaa gttggacgta ttttgacaaa ataaaatcat
660aaagtggttg catttgtcga gatttgtgat atcattggat agtaaattat attttaggtt
720aaaaatggaa aaatagtttt ttatttaaac tttattttta aactttattt aaaatatcaa
780aataattgcc tttgtatttt acttattgta caatatattt gtacaatata ttaaggaaaa
840aaatactttt gtagcgactt aaaagtcaat tgaatggacc aataaaggac cttttcaaat
900ttgtcaaggt attttaggac aatttttttt attttggata ttgttcttgt ttattgggta
960aataagatgg attttctatt attgcaatgt ggaattggga acggaaaaat tattttatta
1020aagagtagtt caacaaacgg gattgacttt taaaaaagga ttgattctaa tgaagaaagc
1080agacaagtaa gcctcctaaa ttcactttag ataaaaattt aggaggcata tcaaatgaac
1140tttaataaaa ttgatttaga caattggaag agaaaagaga tatttaatca ttatttgaac
1200caacaaacga cttttagtat aaccacagaa attgatatta gtgttttata ccgaaacata
1260aaacaagaag gatataaatt ttaccctgca tttattttct tagtgacaag ggtgataaac
1320tcaaatacag cttttagaac tggttacaat agcgacggag agttaggtta ttgggataag
1380ttagagccac tttatacaat ttttgatggt gtatctaaaa cattctctgg tatttggact
1440cctgtaaaga atgacttcaa agagttttat gatttatacc tttctgatgt agagaaatat
1500aatggttcgg ggaaattgtt tcccaaaaca cctatacctg aaaatgcttt ttctctttct
1560attattccat ggacttcatt tactgggttt aacttaaata tcaataataa tagtaattac
1620cttctaccca ttattacagc aggaaaattc attaataaag gtaattcaat atatttaccg
1680ctatctttac aggtacatca ttctgtttgt gatggttatc atgcaggatt gtttatgaac
1740tctattcagg aattgtcaga taggcctaat gactggcttt tataatatga gataatgccg
1800actgtacttt ttacagtcgg ttttctaatg tcactagggc tcgcctttgg gaagtttgaa
1860gggctggcac gacaggtttc ccgactggaa agcgggcagt gagcgcaacg caattaatgt
1920gagttagctc actcattagg caccccaggc tttacacttt atgcttccgg ctcgtatgtt
1980gtgtggaatt gtgagcggat aacaatttca cacaggaaac agctatgacc atgattacgc
2040caagcttgca tgcctgcagg tcgactctag aggatcccat taaagggcag gatacactca
2100tttgaatctt ttgggacact ggacggaccg ggtataagat ttgtggtttt catgcagggc
2160tgtcccttgc gttgtatata ttgccacaac agggatacct gggatgttaa tgcggggagt
2220gagtacactc cccggcaagt aattgatgaa atgatgaaat acatagacta tataaaggtc
2280tccggaggcg gaataactgt taccggcggg gagcctgttc tccaggccga ttttgtggcc
2340gaggtgttca gacttgcaaa agagcaggga gtgcatacgg cgctggatac caatggattt
2400gctgacatag agaaggttga aaggcttata aaatacaccg atcttgtatt gctggatata
2460aagcatgccc gggaggataa acataagata attaccggtg tgtccaacga aaaaatcaag
2520cgttttgcgc tgtatctttc ggaccaggga gtgcctatct ggataagata tgtccttgtc
2580cccggatata ccgacgatga agatgacctt aaaatggcgg ctgatttcat aaaaaagctt
2640aaaacggtgg aaaaaatcga agttcttcct tatcacaaca tgggagcata caaatgggaa
2700aaacttggtc agaaatacat gcttgaagga gtaaaggggc cgagtgcgca agaggtggaa
2760aaagcaaaga ggattctgtc aggcaaataa taaaagcttt tttcttttat tatttgcttt
2820tttctattac caatttgctt tgcttaagtt taggtttggt tttgatgagt tttttaatgt
2880ttcttttata tttatctttt atatgaacag tgttgtaaac ttccaaatcc agtttgtcaa
2940atattgattt aaaaatcttt gccgtatact gggcgtcagt taatgcccgg tgaagatttt
3000cgtctatttc aacgcaagct tggcgtaatc atggtcatag ctgtttcctg tgtgaaattg
3060ttatccgctc acaattccac acaacatacg agccggaagc ataaagtgta aagcctgggg
3120tgcctaatga gtgagctaac tcacattaat tgcgttgcgc tcactgcccg ctttccagtc
3180gggaaacctg tcgtgccagc tgcattaatg aatcggccaa cgcgcgggga gaggcggttt
3240gcgtattggg cgctcttccg cttcctcgct cactgactcg ctgcgctcgg tcgttcggct
3300gcggcgagcg gtatcagctc actcaaaggc ggtaatacgg ttatccacag aatcagggga
3360taacgcagga aagaacatgt gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc
3420cgcgttgctg gcgtttttcc ataggctccg cccccctgac gagcatcaca aaaatcgacg
3480ctcaagtcag aggtggcgaa acccgacagg actataaaga taccaggcgt ttccccctgg
3540aagctccctc gtgcgctctc ctgttccgac cctgccgctt accggatacc tgtccgcctt
3600tctcccttcg ggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tcagttcggt
3660gtaggtcgtt cgctccaagc tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg
3720cgccttatcc ggtaactatc gtcttgagtc caacccggta agacacgact tatcgccact
3780ggcagcagcc actggtaaca ggattagcag agcgaggtat gtaggcggtg ctacagagtt
3840cttgaagtgg tggcctaact acggctacac tagaaggaca gtatttggta tctgcgctct
3900gctgaagcca gttaccttcg gaaaaagagt tggtagctct tgatccggca aacaaaccac
3960cgctggtagc ggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc
4020tcaagaagat cctttgatct tttctacggg gtctgacgct cagtggaacg aaaactcacg
4080ttaagggatt ttggtcatga gattatcaaa aaggatcttc acctagatcc ttttaaatta
4140aaaatgaagt tttaaatcaa tctaaagtat atatgagtaa acttggtctg acagttacca
4200atgcttaatc agtgaggcac ctatctcagc gatctgtcta tttcgttcat ccatagttgc
4260ctgactcccc gtcgtgtaga taactacgat acgggagggc ttaccatctg gccccagtgc
4320tgcaatgata ccgcgagacc cacgctcacc ggctccagat ttatcagcaa taaaccagcc
4380agccggaagg gccgagcgca gaagtggtcc tgcaacttta tccgcctcca tccagtctat
4440taattgttgc cgggaagcta gagtaagtag ttcgccagtt aatagtttgc gcaacgttgt
4500tgccattgct acaggcatcg tggtgtcacg ctcgtcgttt ggtatggctt cattcagctc
4560cggttcccaa cgatcaaggc gagttacatg atcccccatg ttgtgcaaaa aagcggttag
4620ctccttcggt cctccgatcg ttgtcagaag taagttggcc gcagtgttat cactcatggt
4680tatggcagca ctgcataatt ctcttactgt catgccatcc gtaagatgct tttctgtgac
4740tggtgagtac tcaaccaagt cattctgaga atagtgtatg cggcgaccga gttgctcttg
4800cccggcgtca atacgggata ataccgcgcc acatagcaga actttaaaag tgctcatcat
4860tggaaaacgt tcttcggggc gaaaactctc aaggatctta ccgctgttga gatccagttc
4920gatgtaaccc actcgtgcac ccaactgatc ttcagcatct tttactttca ccagcgtttc
4980tgggtgagca aaaacaggaa ggcaaaatgc cgcaaaaaag ggaataaggg cgacacggaa
5040atgttgaata ctcatactct tcctttttca atattattga agcatttatc agggttattg
5100tctcatgagc ggatacatat ttgaatgtat ttagaaaaat aaacaaatag gggttccgcg
5160cacatttccc cgaaaagtgc cacctgacgt ctaagaaacc attattatca tgacattaac
5220ctataaaaat aggcgtatca cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga
5280aaacctctga cacatgcagc tcccggagac ggtcacagct tgtctgtaag cggatgccgg
5340gagcagacaa gcccgtcagg gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa
5400ctatgcggca tcagagcaga ttgtactgag agtgcaccat atgcggtgtg aaataccgca
5460cagatgcgta aggagaaaat accgcatcag gcgccattcg ccattcaggc tgcgcaactg
5520ttgggaaggg cgatcggtgc gggcctcttc gctattacgc cagctggcga aagggggatg
5580tgctgcaagg cgattaagtt gggtaacgcc agggttttcc cagtcacgac gttgtaaaac
5640gacggcca
5648
<210> 51
<211> 4648
<212> DNA
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 51
tttggtggta ttcgtatggc acagagtgct
tgccatgaat atggatatga ggtagacgaa 60gaggtagcac gtatttttac agactaccgc
aagacacata atcaaggtgt atttgatgca 120tacactgacg aaatgaagct cgctagaaaa
tcagcaatca ttactggttt gcctgatgct 180tatggtagag gtagaattat tggcgattac
cgtcgagtgg cactttacgg tactgattta 240cttattgaag acaagaaaga acaacttaca
acttccttaa agagaatgac tagtgataat 300attcgcttaa gagaagaatt agcagaacaa
attcgtgcat taaaagaatt agcgaagctt 360ggtgaaatct atggttacga tattacgaag
ccagcaataa atgcaaagga agcaattcag 420tggctttact ttggatatct tgcagcggta
aaagagcaaa acggtgctgc aatgagctta 480ggccgtactt ctacattcct tgatatttat
atccagagag atttagataa tggtgttatc 540acagaaaaag aagcacaaga gtatatcgat
cattttatta tgaaacttcg tctagtgaag 600tttgcaagaa ctccagaata caatgcctta
ttctccggtg accctacttg ggtaacagaa 660agtatcgctg ggattggtac agatggacgc
catatggtaa caaagacatc cttccgttac 720cttcatacgt tagacaacct tggaactgct
ccagaaccaa acatgacagt tctatggtca 780actagattac caagattatt taaagagtac
tgtgctaaga tgtcaattaa gtcatcctct 840attcaatacg aaaatgatga tatcatgcgt
ccaactcatg gtgatgatta tgcaattgct 900agatggattt tctattattg caatgtggaa
ttgggaacgg aaaaattatt ttattaaaga 960gtagttcaac aaacgggatt gacttttaaa
aaaggattga ttctaatgaa gaaagcagac 1020aagtaagcct cctaaattca ctttagataa
aaatttagga ggcatatcaa atgaacttta 1080ataaaattga tttagacaat tggaagagaa
aagagatatt taatcattat ttgaaccaac 1140aaacgacttt tagtataacc acagaaattg
atattagtgt tttataccga aacataaaac 1200aagaaggata taaattttac cctgcattta
ttttcttagt gacaagggtg ataaactcaa 1260atacagcttt tagaactggt tacaatagcg
acggagagtt aggttattgg gataagttag 1320agccacttta tacaattttt gatggtgtat
ctaaaacatt ctctggtatt tggactcctg 1380taaagaatga cttcaaagag ttttatgatt
tatacctttc tgatgtagag aaatataatg 1440gttcggggaa attgtttccc aaaacaccta
tacctgaaaa tgctttttct ctttctatta 1500ttccatggac ttcatttact gggtttaact
taaatatcaa taataatagt aattaccttc 1560tacccattat tacagcagga aaattcatta
ataaaggtaa ttcaatatat ttaccgctat 1620ctttacaggt acatcattct gtttgtgatg
gttatcatgc aggattgttt atgaactcta 1680ttcaggaatt gtcagatagg cctaatgact
ggcttttata atatgagata atgccgactg 1740tactttttac agtcggtttt ctaatgtcac
tagggctcgc ctttgggaag tttgaagggc 1800tggcacgaca ggtttcccga ctggaaagcg
ggcagtgagc gcaacgcaat taatgtgagt 1860tagctcactc attaggcacc ccaggcttta
cactttatgc ttccggctcg tatgttgtgt 1920ggaattgtga gcggataaca atttcacaca
ggaaacagct atgaccatga ttacgccaag 1980cttgcatgcc tgcaggtcga ctctagagga
tccgcaagct tggcgtaatc atggtcatag 2040ctgtttcctg tgtgaaattg ttatccgctc
acaattccac acaacatacg agccggaagc 2100ataaagtgta aagcctgggg tgcctaatga
gtgagctaac tcacattaat tgcgttgcgc 2160tcactgcccg ctttccagtc gggaaacctg
tcgtgccagc tgcattaatg aatcggccaa 2220cgcgcgggga gaggcggttt gcgtattggg
cgctcttccg cttcctcgct cactgactcg 2280ctgcgctcgg tcgttcggct gcggcgagcg
gtatcagctc actcaaaggc ggtaatacgg 2340ttatccacag aatcagggga taacgcagga
aagaacatgt gagcaaaagg ccagcaaaag 2400gccaggaacc gtaaaaaggc cgcgttgctg
gcgtttttcc ataggctccg cccccctgac 2460gagcatcaca aaaatcgacg ctcaagtcag
aggtggcgaa acccgacagg actataaaga 2520taccaggcgt ttccccctgg aagctccctc
gtgcgctctc ctgttccgac cctgccgctt 2580accggatacc tgtccgcctt tctcccttcg
ggaagcgtgg cgctttctca tagctcacgc 2640tgtaggtatc tcagttcggt gtaggtcgtt
cgctccaagc tgggctgtgt gcacgaaccc 2700cccgttcagc ccgaccgctg cgccttatcc
ggtaactatc gtcttgagtc caacccggta 2760agacacgact tatcgccact ggcagcagcc
actggtaaca ggattagcag agcgaggtat 2820gtaggcggtg ctacagagtt cttgaagtgg
tggcctaact acggctacac tagaaggaca 2880gtatttggta tctgcgctct gctgaagcca
gttaccttcg gaaaaagagt tggtagctct 2940tgatccggca aacaaaccac cgctggtagc
ggtggttttt ttgtttgcaa gcagcagatt 3000acgcgcagaa aaaaaggatc tcaagaagat
cctttgatct tttctacggg gtctgacgct 3060cagtggaacg aaaactcacg ttaagggatt
ttggtcatga gattatcaaa aaggatcttc 3120acctagatcc ttttaaatta aaaatgaagt
tttaaatcaa tctaaagtat atatgagtaa 3180acttggtctg acagttacca atgcttaatc
agtgaggcac ctatctcagc gatctgtcta 3240tttcgttcat ccatagttgc ctgactcccc
gtcgtgtaga taactacgat acgggagggc 3300ttaccatctg gccccagtgc tgcaatgata
ccgcgagacc cacgctcacc ggctccagat 3360ttatcagcaa taaaccagcc agccggaagg
gccgagcgca gaagtggtcc tgcaacttta 3420tccgcctcca tccagtctat taattgttgc
cgggaagcta gagtaagtag ttcgccagtt 3480aatagtttgc gcaacgttgt tgccattgct
acaggcatcg tggtgtcacg ctcgtcgttt 3540ggtatggctt cattcagctc cggttcccaa
cgatcaaggc gagttacatg atcccccatg 3600ttgtgcaaaa aagcggttag ctccttcggt
cctccgatcg ttgtcagaag taagttggcc 3660gcagtgttat cactcatggt tatggcagca
ctgcataatt ctcttactgt catgccatcc 3720gtaagatgct tttctgtgac tggtgagtac
tcaaccaagt cattctgaga atagtgtatg 3780cggcgaccga gttgctcttg cccggcgtca
atacgggata ataccgcgcc acatagcaga 3840actttaaaag tgctcatcat tggaaaacgt
tcttcggggc gaaaactctc aaggatctta 3900ccgctgttga gatccagttc gatgtaaccc
actcgtgcac ccaactgatc ttcagcatct 3960tttactttca ccagcgtttc tgggtgagca
aaaacaggaa ggcaaaatgc cgcaaaaaag 4020ggaataaggg cgacacggaa atgttgaata
ctcatactct tcctttttca atattattga 4080agcatttatc agggttattg tctcatgagc
ggatacatat ttgaatgtat ttagaaaaat 4140aaacaaatag gggttccgcg cacatttccc
cgaaaagtgc cacctgacgt ctaagaaacc 4200attattatca tgacattaac ctataaaaat
aggcgtatca cgaggccctt tcgtctcgcg 4260cgtttcggtg atgacggtga aaacctctga
cacatgcagc tcccggagac ggtcacagct 4320tgtctgtaag cggatgccgg gagcagacaa
gcccgtcagg gcgcgtcagc gggtgttggc 4380gggtgtcggg gctggcttaa ctatgcggca
tcagagcaga ttgtactgag agtgcaccat 4440atgcggtgtg aaataccgca cagatgcgta
aggagaaaat accgcatcag gcgccattcg 4500ccattcaggc tgcgcaactg ttgggaaggg
cgatcggtgc gggcctcttc gctattacgc 4560cagctggcga aagggggatg tgctgcaagg
cgattaagtt gggtaacgcc agggttttcc 4620cagtcacgac gttgtaaaac gacggcca
4648
<210> 52
<211> 5706
<212> DNA
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 52
tcacccagca gcagccatga ttataaacac cggagttccg gagcttagta gcgaagcatc
60taaaagcgga aagccttata tttacggcgg aacgggaaat ggaccagtct ttattgaacg
120taccgctgat gtaagaaaag cggtagagga tatcattgca agccgcacct ttgattacgg
180aatcgtgtct gcggcagaac aatatatggt agtagacagt cttattgcag ctgaagtaaa
240agctgagatg ttaagaaacg gtgcctactt catgaacgag gaagaggaga aaaagctaat
300agacctccta aaccttacga gtggaaaggc agatacagaa attatgggaa gaccagccga
360agaacttgcc aaacgagcag gatttatggt acctaatacc acgactgtgc tggtttccga
420acagaaatat atttccgaca ggaacccatt tgcaaaagag cttctttgtc ctgtattggc
480ttactacatc gaaaatgact ggatgcatgc ttgtgagaag tgcatgagtc ttttagtaaa
540cgaaagccat ggacataccc tggtgattca ttccagggat gaagaagtaa taggccagtt
600cgccttaaag aaaccagtag gcagagtact tgtaaatacc cccgctaccc tgggtagtat
660gggtgcaacc acaaacttgt ttccggctat gaccctagga agcattacag caggcgccgg
720aatcacagcg gacaatgttt ctcctatgaa tttcatatac attcgtaaag taggatatgg
780agttcgggga gtacaagaat ttcttggttc ggttgagaaa acctcaagcg gatacgcgaa
840agctcctgaa acaatcagga acaatgccct tgaaacaaac aaggtcaatg cctttgaaac
900aagcaaaggc atggaagatg ctagagatct tttgaaacag attttacaag ccttgtccaa
960agaactagat ggattttcta ttattgcaat gtggaattgg gaacggaaaa attattttat
1020taaagagtag ttcaacaaac gggattgact tttaaaaaag gattgattct aatgaagaaa
1080gcagacaagt aagcctccta aattcacttt agataaaaat ttaggaggca tatcaaatga
1140actttaataa aattgattta gacaattgga agagaaaaga gatatttaat cattatttga
1200accaacaaac gacttttagt ataaccacag aaattgatat tagtgtttta taccgaaaca
1260taaaacaaga aggatataaa ttttaccctg catttatttt cttagtgaca agggtgataa
1320actcaaatac agcttttaga actggttaca atagcgacgg agagttaggt tattgggata
1380agttagagcc actttataca atttttgatg gtgtatctaa aacattctct ggtatttgga
1440ctcctgtaaa gaatgacttc aaagagtttt atgatttata cctttctgat gtagagaaat
1500ataatggttc ggggaaattg tttcccaaaa cacctatacc tgaaaatgct ttttctcttt
1560ctattattcc atggacttca tttactgggt ttaacttaaa tatcaataat aatagtaatt
1620accttctacc cattattaca gcaggaaaat tcattaataa aggtaattca atatatttac
1680cgctatcttt acaggtacat cattctgttt gtgatggtta tcatgcagga ttgtttatga
1740actctattca ggaattgtca gataggccta atgactggct tttataatat gagataatgc
1800cgactgtact ttttacagtc ggttttctaa tgtcactagg gctcgccttt gggaagtttg
1860aagggctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat
1920gtgagttagc tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg
1980ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac
2040gccaagcttg catgcctgca ggtcgactct agaggatcca acggagagta atcaaaatgg
2100accatggaaa atcaggagag attgaacgta aggcgtttat atttaacgtg cagaagtaca
2160acatgtatga cgggccggga atcagaacct tggtattctt taaaggctgt cctcttcggt
2220gtaaatggtg ctccaatccg gaaggtctgg aacgaaaatt tcaggtaatg tataagcaaa
2280gtttttgtac aaactgcggg gcgtgcgctg atgtgtgccc cgtaggaatc cacgtgatgt
2340cgaacggaac acatgaaatt gttcgggaaa aggaatgcat cggctgcatg aagtgtaaaa
2400acatctgccc aaagtcggcg cttaccattg caggagaggt aaagaccatt tcagaactgc
2460ttaagattgt ggaagaggac gctgcttttt atgatatgtc cggaggtggc gtgacccttg
2520ggggtggtga agtaaccgca caaccagaag cggccttaaa tcttttgatg gcttgtaaac
2580aggagggaat caacacagca attgaaactt gcggttattc gaatacagag aacattttaa
2640aaattgcgga atatgtggat cttttcctgt ttgatatcaa acatatggat ccagtacgtc
2700acaacgagtt aacaggtgtg aacaatgaac agattcttac taaccttgag gaactgcttc
2760accgccgcta taacgtaaaa gtccgtatgc caatgttaaa aggaattaat gacagcaggg
2820aagaaattga tgcggttatc aagtttttaa tgccataccg tactgataag aactttaagg
2880gaattgactt acttccatac cataagctcg gagttaataa atacaatcag cttgataagg
2940tatatccgat tgacggcgat cctagcttaa gtgctgagga tttagaccga attgaaggtt
3000ggatgaaaga atacgatttt ccggttaacg tggtaaaaca ctaagaaagg ggaaggacgc
3060catggaagaa ggcaagcttg gcgtaatcat ggtcatagct gtttcctgtg tgaaattgtt
3120atccgctcac aattccacac aacatacgag ccggaagcat aaagtgtaaa gcctggggtg
3180cctaatgagt gagctaactc acattaattg cgttgcgctc actgcccgct ttccagtcgg
3240gaaacctgtc gtgccagctg cattaatgaa tcggccaacg cgcggggaga ggcggtttgc
3300gtattgggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgc
3360ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggata
3420acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg
3480cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct
3540caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa
3600gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc
3660tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt
3720aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg
3780ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg
3840cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct
3900tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgc
3960tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg
4020ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc
4080aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt
4140aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa
4200aatgaagttt taaatcaatc taaagtatat atgagtaaac ttggtctgac agttaccaat
4260gcttaatcag tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct
4320gactccccgt cgtgtagata actacgatac gggagggctt accatctggc cccagtgctg
4380caatgatacc gcgagaccca cgctcaccgg ctccagattt atcagcaata aaccagccag
4440ccggaagggc cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta
4500attgttgccg ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg
4560ccattgctac aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg
4620gttcccaacg atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct
4680ccttcggtcc tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta
4740tggcagcact gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg
4800gtgagtactc aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc
4860cggcgtcaat acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg
4920gaaaacgttc ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga
4980tgtaacccac tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg
5040ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat
5100gttgaatact catactcttc ctttttcaat attattgaag catttatcag ggttattgtc
5160tcatgagcgg atacatattt gaatgtattt agaaaaataa acaaataggg gttccgcgca
5220catttccccg aaaagtgcca cctgacgtct aagaaaccat tattatcatg acattaacct
5280ataaaaatag gcgtatcacg aggccctttc gtctcgcgcg tttcggtgat gacggtgaaa
5340acctctgaca catgcagctc ccggagacgg tcacagcttg tctgtaagcg gatgccggga
5400gcagacaagc ccgtcagggc gcgtcagcgg gtgttggcgg gtgtcggggc tggcttaact
5460atgcggcatc agagcagatt gtactgagag tgcaccatat gcggtgtgaa ataccgcaca
5520gatgcgtaag gagaaaatac cgcatcaggc gccattcgcc attcaggctg cgcaactgtt
5580gggaagggcg atcggtgcgg gcctcttcgc tattacgcca gctggcgaaa gggggatgtg
5640ctgcaaggcg attaagttgg gtaacgccag ggttttccca gtcacgacgt tgtaaaacga
5700cggcca
5706
<210> 53
<211> 5780
<212> DNA
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 53
tctatcagct gtccctcctg ttcagctact
gacggggtgg tgcgtaacgg caaaagcacc 60gccggacatc agcgctagcg gagtgtatac
tggcttacta tgttggcact gatgagggtg 120tcagtgaagt gcttcatgtg gcaggagaaa
aaaggctgca ccggtgcgtc agcagaatat 180gtgatacagg atatattccg cttcctcgct
cactgactcg ctacgctcgg tcgttcgact 240gcggcgagcg gaaatggctt acgaacgggg
cggagatttc ctggaagatg ccaggaagat 300acttaacagg gaagtgagag ggccgcggca
aagccgtttt tccataggct ccgcccccct 360gacaagcatc acgaaatctg acgctcaaat
cagtggtggc gaaacccgac aggactataa 420agataccagg cgtttccccc tggcggctcc
ctcgtgcgct ctcctgttcc tgcctttcgg 480tttaccggtg tcattccgct gttatggccg
cgtttgtctc attccacgcc tgacactcag 540ttccgggtag gcagttcgct ccaagctgga
ctgtatgcac gaaccccccg ttcagtccga 600ccgctgcgcc ttatccggta actatcgtct
tgagtccaac ccggaaagac atgcaaaagc 660accactggca gcagccactg gtaattgatt
tagaggagtt agtcttgaag tcatgcgccg 720gttaaggcta aactgaaagg acaagttttg
gtgactgcgc tcctccaagc cagttacctc 780ggttcaaaga gttggtagct cagagaacct
tcgaaaaacc gccctgcaag gcggtttttt 840cgttttcaga gcaagagatt acgcgcagac
caaaacgatc tcaagaagat catcttatta 900atcagataaa atatttctag atttcagtgc
aatttatctc ttcaaatgta gcacctgaag 960tcagccccat acgatataag ttgtaattct
catgtttgac agcttatcat cgataagctt 1020taatgcggta gtttatcaca gttaaattgc
taacgcagtc aggcacctat acatgcattt 1080acttataata cagtttttta gttttgctgg
ccgcatcttc tcaaatatgc ttcccagcct 1140gcttttctgt aacgttcacc ctctacctta
gcatcccttc cctttgcaaa tagtcctctt 1200ccaacaataa taatgtcaga tcctgtagag
accacatcat ccacggttct atactgttga 1260cccaatgcgt ctcccttgtc atctaaaccc
acaccgggtg tcataatcaa ccaatcgtaa 1320ccttcatctc ttccacccat gtctctttga
gcaataaagc cgataacaaa atctttgtcg 1380ctcttcgcaa tgtcaacagt acccttagta
tattctccag tagataggga gcccttgcat 1440gacaattctg ctaacatcaa aaggcctcta
ggttcctttg ttacttcttc tgccgcctgc 1500ttcaaaccgc taacaatacc tgggcccacc
acaccgtgtg cattcgtaat gtctgcccat 1560tctgctattc tgtatacacc cgcagagtac
tgcaatttga ctgtattacc aatgtcagca 1620aattttctgt cttcgaagag taaaaaattg
tacttggcgg ataatgcctt tagcggctta 1680actgtgccct ccatggaaaa atcagtcaag
atatccacat gtgtttttag taaacaaatt 1740ttgggaccta atgcttcaac taactccagt
aattccttgg tggtacgaac atccaatgaa 1800gcacacaagt ttgtttgctt ttcgtgcatg
atattaaata gcttggcagc aacaggacta 1860ggatgagtag cagcacgttc cttatatgta
gctttcgaca tgatttatct tcgtttcctg 1920caggtttttg ttctgtgcag ttgggttaag
aatactgggc aatttcatgt ttcttcaaca 1980ctacatatgc gtatatatac caatctaagt
ctgtgctcct tccttcgttc ttccttctgt 2040tcggagatta ccgaatcaaa aaaatttcaa
agaaaccgaa atcaaaaaaa agaataaaaa 2100aaaaatgatg aattgaattg aaaagctagc
ttatcgatgg gtccttttca tcacgtgcta 2160taaaaataat tataatttaa attttttaat
ataaatatat aaattaaaaa tagaaagtaa 2220aaaaagaaat taaagaaaaa atagtttttg
ttttccgaag atgtaaaaga ctctaggggg 2280atcgccaaca aatactacct tttatcttgc
tcttcctgct ctcaggtatt aatgccgaat 2340tgtttcatct tgtctgtgta gaagaccaca
cacgaaaatc ctgtgatttt acattttact 2400tatcgttaat cgaatgtata tctatttaat
ctgcttttct tgtctaataa atatatatgt 2460aaagtacgct ttttgttgaa attttttaaa
cctttgttta tttttttttc ttcattccgt 2520aactcttcta ccttctttat ttactttcta
aaatccaaat acaaaacata aaaataaata 2580aacacagagt aaattcccaa attattccat
cattaaaaga tacgaggcgc gtgtaagtta 2640caggcaagcg atctctaaga aaccattatt
atcatgacat taacctataa aaaaggcctc 2700tcgagctaga gtcgatcttc gccagcaggg
cgaggatcgt ggcatcaccg aaccgcgccg 2760tgcgcgggtc gtcggtgagc cagagtttca
gcaggccgcc caggcggccc aggtcgccat 2820tgatgcgggc cagctcgcgg acgtgctcat
agtccacgac gcccgtgatt ttgtagccct 2880ggccgacggc cagcaggtag gccgacaggc
tcatgccggc cgccgccgcc ttttcctcaa 2940tcgctcttcg ttcgtctgga aggcagtaca
ccttgatagg tgggctgccc ttcctggttg 3000gcttggtttc atcagccatc cgcttgccct
catctgttac gccggcggta gccggccagc 3060ctcgcagagc aggattcccg ttgagcaccg
ccaggtgcga ataagggaca gtgaagaagg 3120aacacccgct cgcgggtggg cctacttcac
ctatcctgcc cggctgacgc cgttggatac 3180accaaggaaa gtctacacga accctttggc
aaaatcctgt atatcgtgcg aaaaaggatg 3240gatataccga aaaaatcgct ataatgaccc
cgaagcaggg ttatgcagcg gaaaagcgct 3300gcttccctgc tgttttgtgg aatatctacc
gactggaaac aggcaaatgc aggaaattac 3360tgaactgagg ggacaggcga gagacgatgc
caaagagcta caccgacgag ctggccgagt 3420gggttgaatc ccgcgcggcc aagaagcgcc
ggcgtgatga ggctgcggtt gcgttcctgg 3480cggtgagggc ggatgtcgat atgcgtaagg
agaaaatacc gcatcaggcg catatttgaa 3540tgtatttaga aaaataaaca aaaagagttt
gtagaaacgc aaaaaggcca tccgtcagga 3600tggccttctg cttaatttga tgcctggcag
tttatggcgg gcgtcctgcc cgccaccctc 3660cgggccgttg cttcgcaacg ttcaaatccg
ctcccggcgg atttgtccta ctcaggagag 3720cgttcaccga caaacaacag ataaaacgaa
aggcccagtc tttcgactga gcctttcgtt 3780ttatttgatg cctggaaacc cagcgaacca
tttgaggtga taggtaagat tataccgagg 3840tatgaaaacg agaattggac ctttacagaa
ttactctatg aagcgccata tttaaaaagc 3900taccaagacg aagaggatga agaggatgag
gaggcagatt gccttgaata tattgacaat 3960actgataaga taatatatct tttatataga
agatatcgcc gtatgtaagg atttcagggg 4020gcaaggcata ggcagcgcgc ttatcaatat
atctatagaa tgggcaaagc ataaaaactt 4080gcatggacta atgcttgaaa cccaggacaa
taaccttata gcttgtaaat tctatcataa 4140ttgtggtttc aaaatcggct ccgtcgatac
tatgttatac gccaactttc aaaacaactt 4200tgaaaaagct gttttctggt atttaaggtt
ttagaatgca aggaacagtg aattggagtt 4260cgtcttgtta taattagctt cttggggtat
ctttaaatac tgtagaaaag aggaaggaaa 4320taataaatgg ctaaaatgag aatatcaccg
gaattgaaaa aactgatcga aaaataccgc 4380tgcgtaaaag atacggaagg aatgtctcct
gctaaggtat ataagctggt gggagaaaat 4440gaaaacctat atttaaaaat gacggacagc
cggtataaag ggaccaccta tgatgtggaa 4500cgggaaaagg acatgatgct atggctggaa
ggaaagctgc ctgttccaaa ggtcctgcac 4560tttgaacggc atgatggctg gagcaatctg
ctcatgagtg aggccgatgg cgtcctttgc 4620tcggaagagt atgaagatga acaaagccct
gaaaagatta tcgagctgta tgcggagtgc 4680atcaggctct ttcactccat cgacatatcg
gattgtccct atacgaatag cttagacagc 4740cgcttagccg aattggatta cttactgaat
aacgatctgg ccgatgtgga ttgcgaaaac 4800tgggaagaag acactccatt taaagatccg
cgcgagctgt atgatttttt aaagacggaa 4860aagcccgaag aggaacttgt cttttcccac
ggcgacctgg gagacagcaa catctttgtg 4920aaagatggca aagtaagtgg ctttattgat
cttgggagaa gcggcagggc ggacaagtgg 4980tatgacattg ccttctgcgt ccggtcgatc
agggaggata tcggggaaga acagtatgtc 5040gagctatttt ttgacttact ggggatcaag
cctgattggg agaaaataaa atattatatt 5100ttactggatg aattgtttta gtacctagat
ttagatgtct aaaaagcttt ttagacatct 5160aatcttttct gaagtacatc cgcaactgtc
catactctga tgttttatat cttttctaaa 5220agttcgctag ataggggtcc cgagcgccta
cgaggaattt gtatcgtaca aggatatcga 5280gggcgcagat gtagtagttg taacagcagg
tgcggctcaa aagccaggag agtctaggct 5340ggaccttgta aaaaagaata catctatatt
caagtccatg atacctgaac ttttaaaata 5400caatgataaa gctatatacc tgattgtaac
aaatcctgtt gatatattaa cgtatgttac 5460atacaaaata gcgaaacttc cgtgggggcg
tgtattcggt tcaggtactg tccttgacag 5520ttcccgattt aggtatcttt taagtaaaca
ttgcaatatt gatcctagaa atgtacatgg 5580aaggataatt ggagaacacg gcgatacaga
atttgcggcg tggagcataa caaatatttc 5640aggaatatca tttaatgagt actgcaattt
gtgcggacga gtttgtaata caaatttcag 5700aaaggaagtg gaagatgaag ttgtcaatgc
ggcttacaaa attattgata aaaagggtgc 5760cacgtattac gctgtggctg
5780
<210> 54
<211> 6539
<212> DNA
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 54
gcacatattg atagagaaca tggctggata attactataa ctccaagaaa acgaatagta
60aaagaatgga ggcgaattaa tgagtaatgt cncaatacaa ttaatagaaa tttgtcggca
120atatgtaaat aataacttaa acataaatga ntntatcgaa gatttccaag tgctttatga
180acaaaagcaa gatttattaa cagatgaaga aatgagtttg tttgatgata tttatatggc
240ttgtgaatat tatgaacagg atgaaaatat aagaaatgaa tatcacttgt atattggaga
300aaatgaatta agacaaaaag tgcaaaaact tgtaaaaaag ttagcagcat aataaaccgc
360taaggcatga tagctaaagc ggtattttta tgcaattaaa aggaaaaatg atatctgata
420aaccgcggaa aagtatttta gaaaacaact ataaagataa tatttcaaag caagaaggat
480aaaataagat taaactatta gacactttta ttagaaaatg ttataatatt attaagagaa
540aatttatatt atttaggagg taattttatg agtaaagtgg ccataatagg ttcaggattt
600gtaggtgcta catctgcatt tacattggct ctaagtggga ctgtgacaga cattgtttta
660gtagatttaa acaaggacaa ggcgataggc gatgcactgg atattagcca aaacccagcg
720aaccatttga ggtgataggt aagattatac cgaggtatga aaacgagaat tggaccttta
780cagaattact ctatgaagcg ccatatttaa aaagctacca agacgaagag gatgaagagg
840atgaggaggc agattgcctt gaatatattg acaatactga taagataata tatcttttat
900atagaagata tcgccgtatg taaggatttc agggggcaag gcataggcag cgcgcttatc
960aatatatcta tagaatgggc aaagcataaa aacttgcatg gactaatgct tgaaacccag
1020gacaataacc ttatagcttg taaattctat cataattgtg gtttcaaaat cggctccgtc
1080gatactatgt tatacgccaa ctttcaaaac aactttgaaa aagctgtttt ctggtattta
1140aggttttaga atgcaaggaa cagtgaattg gagttcgtct tgttataatt agcttcttgg
1200ggtatcttta aatactgtag aaaagaggaa ggaaataata aatggctaaa atgagaatat
1260caccggaatt gaaaaaactg atcgaaaaat accgctgcgt aaaagatacg gaaggaatgt
1320ctcctgctaa ggtatataag ctggtgggag aaaatgaaaa cctatattta aaaatgacgg
1380acagccggta taaagggacc acctatgatg tggaacggga aaaggacatg atgctatggc
1440tggaaggaaa gctgcctgtt ccaaaggtcc tgcactttga acggcatgat ggctggagca
1500atctgctcat gagtgaggcc gatggcgtcc tttgctcgga agagtatgaa gatgaacaaa
1560gccctgaaaa gattatcgag ctgtatgcgg agtgcatcag gctctttcac tccatcgaca
1620tatcggattg tccctatacg aatagcttag acagccgctt agccgaattg gattacttac
1680tgaataacga tctggccgat gtggattgcg aaaactggga agaagacact ccatttaaag
1740atccgcgcga gctgtatgat tttttaaaga cggaaaagcc cgaagaggaa cttgtctttt
1800cccacggcga cctgggagac agcaacatct ttgtgaaaga tggcaaagta agtggcttta
1860ttgatcttgg gagaagcggc agggcggaca agtggtatga cattgccttc tgcgtccggt
1920cgatcaggga ggatatcggg gaagaacagt atgtcgagct attttttgac ttactgggga
1980tcaagcctga ttgggagaaa ataaaatatt atattttact ggatgaattg ttttagtacc
2040tagatttaga tgtctaaaaa gctttttaga catctaatct tttctgaagt acatccgcaa
2100ctgtccatac tctgatgttt tatatctttt ctaaaagttc gctagatagg ggtcccgagc
2160gcctacgagg aatttgtatc gacgtattac gctgtggctg tagcagtaag aagaatagtt
2220gagtgtatca taagggatga aaattcaatt cttacagttt catctccatt aaatggtcaa
2280tacggtgtaa gagatgtatc tttaagcttg ccatcaattg tgggcaaaaa tggtgttgca
2340agggttctgg atttgccttt ggctgatgac gaagttgaga agtttaaaca ttcggcaagc
2400gttatggctg atgttataaa acagttggac atataaaata aatcattgta taaggtttat
2460aagacggctt ttatcatgta tggtaaaggc cgctttttta tgaatataaa aatacaaagt
2520ggaaaatcta aataaaggtg atgcaatatg cagaatatga gtcctcaaga aattatatcg
2580agtgccttta tgaaggcaaa aaaatctgag aatattatac atgctaaggc tatagattat
2640gggaaaaata tatcagataa ccagatgcaa gcgatattga agcaaataga gataacggct
2700ttaaaccatg tggacaaaat agtgacagct gagaagacga tgcatctatc agctgtccct
2760cctgttcagc tactgacggg gtggtgcgta acggcaaaag caccgccgga catcagcgct
2820agcggagtgt atactggctt actatgttgg cactgatgag ggtgtcagtg aagtgcttca
2880tgtggcagga gaaaaaaggc tgcaccggtg cgtcagcaga atatgtgata caggatatat
2940tccgcttcct cgctcactga ctcgctacgc tcggtcgttc gactgcggcg agcggaaatg
3000gcttacgaac ggggcggaga tttcctggaa gatgccagga agatacttaa cagggaagtg
3060agagggccgc ggcaaagccg tttttccata ggctccgccc ccctgacaag catcacgaaa
3120tctgacgctc aaatcagtgg tggcgaaacc cgacaggact ataaagatac caggcgtttc
3180cccctggcgg ctccctcgtg cgctctcctg ttcctgcctt tcggtttacc ggtgtcattc
3240cgctgttatg gccgcgtttg tctcattcca cgcctgacac tcagttccgg gtaggcagtt
3300cgctccaagc tggactgtat gcacgaaccc cccgttcagt ccgaccgctg cgccttatcc
3360ggtaactatc gtcttgagtc caacccggaa agacatgcaa aagcaccact ggcagcagcc
3420actggtaatt gatttagagg agttagtctt gaagtcatgc gccggttaag gctaaactga
3480aaggacaagt tttggtgact gcgctcctcc aagccagtta cctcggttca aagagttggt
3540agctcagaga accttcgaaa aaccgccctg caaggcggtt ttttcgtttt cagagcaaga
3600gattacgcgc agaccaaaac gatctcaaga agatcatctt attaatcaga taaaatattt
3660ctagatttca gtgcaattta tctcttcaaa tgtagcacct gaagtcagcc ccatacgata
3720taagttgtaa ttctcatgtt tgacagctta tcatcgataa gctttaatgc ggtagtttat
3780cacagttaaa ttgctaacgc agtcaggcac ctatacatgc atttacttat aatacagttt
3840tttagttttg ctggccgcat cttctcaaat atgcttccca gcctgctttt ctgtaacgtt
3900caccctctac cttagcatcc cttccctttg caaatagtcc tcttccaaca ataataatgt
3960cagatcctgt agagaccaca tcatccacgg ttctatactg ttgacccaat gcgtctccct
4020tgtcatctaa acccacaccg ggtgtcataa tcaaccaatc gtaaccttca tctcttccac
4080ccatgtctct ttgagcaata aagccgataa caaaatcttt gtcgctcttc gcaatgtcaa
4140cagtaccctt agtatattct ccagtagata gggagccctt gcatgacaat tctgctaaca
4200tcaaaaggcc tctaggttcc tttgttactt cttctgccgc ctgcttcaaa ccgctaacaa
4260tacctgggcc caccacaccg tgtgcattcg taatgtctgc ccattctgct attctgtata
4320cacccgcaga gtactgcaat ttgactgtat taccaatgtc agcaaatttt ctgtcttcga
4380agagtaaaaa attgtacttg gcggataatg cctttagcgg cttaactgtg ccctccatgg
4440aaaaatcagt caagatatcc acatgtgttt ttagtaaaca aattttggga cctaatgctt
4500caactaactc cagtaattcc ttggtggtac gaacatccaa tgaagcacac aagtttgttt
4560gcttttcgtg catgatatta aatagcttgg cagcaacagg actaggatga gtagcagcac
4620gttccttata tgtagctttc gacatgattt atcttcgttt cctgcaggtt tttgttctgt
4680gcagttgggt taagaatact gggcaatttc atgtttcttc aacactacat atgcgtatat
4740ataccaatct aagtctgtgc tccttccttc gttcttcctt ctgttcggag attaccgaat
4800caaaaaaatt tcaaagaaac cgaaatcaaa aaaaagaata aaaaaaaaat gatgaattga
4860attgaaaagc tagcttatcg atgggtcctt ttcatcacgt gctataaaaa taattataat
4920ttaaattttt taatataaat atataaatta aaaatagaaa gtaaaaaaag aaattaaaga
4980aaaaatagtt tttgttttcc gaagatgtaa aagactctag ggggatcgcc aacaaatact
5040accttttatc ttgctcttcc tgctctcagg tattaatgcc gaattgtttc atcttgtctg
5100tgtagaagac cacacacgaa aatcctgtga ttttacattt tacttatcgt taatcgaatg
5160tatatctatt taatctgctt ttcttgtcta ataaatatat atgtaaagta cgctttttgt
5220tgaaattttt taaacctttg tttatttttt tttcttcatt ccgtaactct tctaccttct
5280ttatttactt tctaaaatcc aaatacaaaa cataaaaata aataaacaca gagtaaattc
5340ccaaattatt ccatcattaa aagatacgag gcgcgtgtaa gttacaggca agcgatctct
5400aagaaaccat tattatcatg acattaacct ataaaaaagg cctctcgagc tagagtcgat
5460cttcgccagc agggcgagga tcgtggcatc accgaaccgc gccgtgcgcg ggtcgtcggt
5520gagccagagt ttcagcaggc cgcccaggcg gcccaggtcg ccattgatgc gggccagctc
5580gcggacgtgc tcatagtcca cgacgcccgt gattttgtag ccctggccga cggccagcag
5640gtaggccgac aggctcatgc cggccgccgc cgccttttcc tcaatcgctc ttcgttcgtc
5700tggaaggcag tacaccttga taggtgggct gcccttcctg gttggcttgg tttcatcagc
5760catccgcttg ccctcatctg ttacgccggc ggtagccggc cagcctcgca gagcaggatt
5820cccgttgagc accgccaggt gcgaataagg gacagtgaag aaggaacacc cgctcgcggg
5880tgggcctact tcacctatcc tgcccggctg acgccgttgg atacaccaag gaaagtctac
5940acgaaccctt tggcaaaatc ctgtatatcg tgcgaaaaag gatggatata ccgaaaaaat
6000cgctataatg accccgaagc agggttatgc agcggaaaag cgctgcttcc ctgctgtttt
6060gtggaatatc taccgactgg aaacaggcaa atgcaggaaa ttactgaact gaggggacag
6120gcgagagacg atgccaaaga gctacaccga cgagctggcc gagtgggttg aatcccgcgc
6180ggccaagaag cgccggcgtg atgaggctgc ggttgcgttc ctggcggtga gggcggatgt
6240cgatatgcgt aaggagaaaa taccgcatca ggcgcatatt tgaatgtatt tagaaaaata
6300aacaaaaaga gtttgtagaa acgcaaaaag gccatccgtc aggatggcct tctgcttaat
6360ttgatgcctg gcagtttatg gcgggcgtcc tgcccgccac cctccgggcc gttgcttcgc
6420aacgttcaaa tccgctcccg gcggatttgt cctactcagg agagcgttca ccgacaaaca
6480acagataaaa cgaaaggccc agtctttcga ctgagccttt cgttttattt gatgcctgg
6539
SEQ ID NO: 55; Length: 6086; Type: DNA; Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 55:
tctatcagct gtccctcctg ttcagctact
gacggggtgg tgcgtaacgg caaaagcacc 60gccggacatc agcgctagcg gagtgtatac
tggcttacta tgttggcact gatgagggtg 120tcagtgaagt gcttcatgtg gcaggagaaa
aaaggctgca ccggtgcgtc agcagaatat 180gtgatacagg atatattccg cttcctcgct
cactgactcg ctacgctcgg tcgttcgact 240gcggcgagcg gaaatggctt acgaacgggg
cggagatttc ctggaagatg ccaggaagat 300acttaacagg gaagtgagag ggccgcggca
aagccgtttt tccataggct ccgcccccct 360gacaagcatc acgaaatctg acgctcaaat
cagtggtggc gaaacccgac aggactataa 420agataccagg cgtttccccc tggcggctcc
ctcgtgcgct ctcctgttcc tgcctttcgg 480tttaccggtg tcattccgct gttatggccg
cgtttgtctc attccacgcc tgacactcag 540ttccgggtag gcagttcgct ccaagctgga
ctgtatgcac gaaccccccg ttcagtccga 600ccgctgcgcc ttatccggta actatcgtct
tgagtccaac ccggaaagac atgcaaaagc 660accactggca gcagccactg gtaattgatt
tagaggagtt agtcttgaag tcatgcgccg 720gttaaggcta aactgaaagg acaagttttg
gtgactgcgc tcctccaagc cagttacctc 780ggttcaaaga gttggtagct cagagaacct
tcgaaaaacc gccctgcaag gcggtttttt 840cgttttcaga gcaagagatt acgcgcagac
caaaacgatc tcaagaagat catcttatta 900atcagataaa atatttctag atttcagtgc
aatttatctc ttcaaatgta gcacctgaag 960tcagccccat acgatataag ttgtaattct
catgtttgac agcttatcat cgataagctt 1020taatgcggta gtttatcaca gttaaattgc
taacgcagtc aggcacctat acatgcattt 1080acttataata cagtttttta gttttgctgg
ccgcatcttc tcaaatatgc ttcccagcct 1140gcttttctgt aacgttcacc ctctacctta
gcatcccttc cctttgcaaa tagtcctctt 1200ccaacaataa taatgtcaga tcctgtagag
accacatcat ccacggttct atactgttga 1260cccaatgcgt ctcccttgtc atctaaaccc
acaccgggtg tcataatcaa ccaatcgtaa 1320ccttcatctc ttccacccat gtctctttga
gcaataaagc cgataacaaa atctttgtcg 1380ctcttcgcaa tgtcaacagt acccttagta
tattctccag tagataggga gcccttgcat 1440gacaattctg ctaacatcaa aaggcctcta
ggttcctttg ttacttcttc tgccgcctgc 1500ttcaaaccgc taacaatacc tgggcccacc
acaccgtgtg cattcgtaat gtctgcccat 1560tctgctattc tgtatacacc cgcagagtac
tgcaatttga ctgtattacc aatgtcagca 1620aattttctgt cttcgaagag taaaaaattg
tacttggcgg ataatgcctt tagcggctta 1680actgtgccct ccatggaaaa atcagtcaag
atatccacat gtgtttttag taaacaaatt 1740ttgggaccta atgcttcaac taactccagt
aattccttgg tggtacgaac atccaatgaa 1800gcacacaagt ttgtttgctt ttcgtgcatg
atattaaata gcttggcagc aacaggacta 1860ggatgagtag cagcacgttc cttatatgta
gctttcgaca tgatttatct tcgtttcctg 1920caggtttttg ttctgtgcag ttgggttaag
aatactgggc aatttcatgt ttcttcaaca 1980ctacatatgc gtatatatac caatctaagt
ctgtgctcct tccttcgttc ttccttctgt 2040tcggagatta ccgaatcaaa aaaatttcaa
agaaaccgaa atcaaaaaaa agaataaaaa 2100aaaaatgatg aattgaattg aaaagctagc
ttatcgatgg gtccttttca tcacgtgcta 2160taaaaataat tataatttaa attttttaat
ataaatatat aaattaaaaa tagaaagtaa 2220aaaaagaaat taaagaaaaa atagtttttg
ttttccgaag atgtaaaaga ctctaggggg 2280atcgccaaca aatactacct tttatcttgc
tcttcctgct ctcaggtatt aatgccgaat 2340tgtttcatct tgtctgtgta gaagaccaca
cacgaaaatc ctgtgatttt acattttact 2400tatcgttaat cgaatgtata tctatttaat
ctgcttttct tgtctaataa atatatatgt 2460aaagtacgct ttttgttgaa attttttaaa
cctttgttta tttttttttc ttcattccgt 2520aactcttcta ccttctttat ttactttcta
aaatccaaat acaaaacata aaaataaata 2580aacacagagt aaattcccaa attattccat
cattaaaaga tacgaggcgc gtgtaagtta 2640caggcaagcg atctctaaga aaccattatt
atcatgacat taacctataa aaaaggcctc 2700tcgagctaga gtcgatcttc gccagcaggg
cgaggatcgt ggcatcaccg aaccgcgccg 2760tgcgcgggtc gtcggtgagc cagagtttca
gcaggccgcc caggcggccc aggtcgccat 2820tgatgcgggc cagctcgcgg acgtgctcat
agtccacgac gcccgtgatt ttgtagccct 2880ggccgacggc cagcaggtag gccgacaggc
tcatgccggc cgccgccgcc ttttcctcaa 2940tcgctcttcg ttcgtctgga aggcagtaca
ccttgatagg tgggctgccc ttcctggttg 3000gcttggtttc atcagccatc cgcttgccct
catctgttac gccggcggta gccggccagc 3060ctcgcagagc aggattcccg ttgagcaccg
ccaggtgcga ataagggaca gtgaagaagg 3120aacacccgct cgcgggtggg cctacttcac
ctatcctgcc cggctgacgc cgttggatac 3180accaaggaaa gtctacacga accctttggc
aaaatcctgt atatcgtgcg aaaaaggatg 3240gatataccga aaaaatcgct ataatgaccc
cgaagcaggg ttatgcagcg gaaaagcgct 3300gcttccctgc tgttttgtgg aatatctacc
gactggaaac aggcaaatgc aggaaattac 3360tgaactgagg ggacaggcga gagacgatgc
caaagagcta caccgacgag ctggccgagt 3420gggttgaatc ccgcgcggcc aagaagcgcc
ggcgtgatga ggctgcggtt gcgttcctgg 3480cggtgagggc ggatgtcgat atgcgtaagg
agaaaatacc gcatcaggcg catatttgaa 3540tgtatttaga aaaataaaca aaaagagttt
gtagaaacgc aaaaaggcca tccgtcagga 3600tggccttctg cttaatttga tgcctggcag
tttatggcgg gcgtcctgcc cgccaccctc 3660cgggccgttg cttcgcaacg ttcaaatccg
ctcccggcgg atttgtccta ctcaggagag 3720cgttcaccga caaacaacag ataaaacgaa
aggcccagtc tttcgactga gcctttcgtt 3780ttatttgatg cctggaaacc cagcgaacca
tttgaggtga taggtaagat tataccgagg 3840tatgaaaacg agaattggac ctttacagaa
ttactctatg aagcgccata tttaaaaagc 3900taccaagacg aagaggatga agaggatgag
gaggcagatt gccttgaata tattgacaat 3960actgataaga taatatatct tttatataga
agatatcgcc gtatgtaagg atttcagggg 4020gcaaggcata ggcagcgcgc ttatcaatat
atctatagaa tgggcaaagc ataaaaactt 4080gcatggacta atgcttgaaa cccaggacaa
taaccttata gcttgtaaat tctatcataa 4140ttgtggtttc aaaatcggct ccgtcgatac
tatgttatac gccaactttc aaaacaactt 4200tgaaaaagct gttttctggt atttaaggtt
ttagaatgca aggaacagtg aattggagtt 4260cgtcttgtta taattagctt cttggggtat
ctttaaatac tgtagaaaag aggaaggaaa 4320taataaatgg ctaaaatgag aatatcaccg
gaattgaaaa aactgatcga aaaataccgc 4380tgcgtaaaag atacggaagg aatgtctcct
gctaaggtat ataagctggt gggagaaaat 4440gaaaacctat atttaaaaat gacggacagc
cggtataaag ggaccaccta tgatgtggaa 4500cgggaaaagg acatgatgct atggctggaa
ggaaagctgc ctgttccaaa ggtcctgcac 4560tttgaacggc atgatggctg gagcaatctg
ctcatgagtg aggccgatgg cgtcctttgc 4620tcggaagagt atgaagatga acaaagccct
gaaaagatta tcgagctgta tgcggagtgc 4680atcaggctct ttcactccat cgacatatcg
gattgtccct atacgaatag cttagacagc 4740cgcttagccg aattggatta cttactgaat
aacgatctgg ccgatgtgga ttgcgaaaac 4800tgggaagaag acactccatt taaagatccg
cgcgagctgt atgatttttt aaagacggaa 4860aagcccgaag aggaacttgt cttttcccac
ggcgacctgg gagacagcaa catctttgtg 4920aaagatggca aagtaagtgg ctttattgat
cttgggagaa gcggcagggc ggacaagtgg 4980tatgacattg ccttctgcgt ccggtcgatc
agggaggata tcggggaaga acagtatgtc 5040gagctatttt ttgacttact ggggatcaag
cctgattggg agaaaataaa atattatatt 5100ttactggatg aattgtttta gtacctagat
ttagatgtct aaaaagcttt ttagacatct 5160aatcttttct gaagtacatc cgcaactgtc
catactctga tgttttatat cttttctaaa 5220agttcgctag ataggggtcc cgagcgccta
cgaggaattt gtatcggctg tacaatctgc 5280taatactgca aagaatttgt tgggctttga
accaaaagtt gctatgctat cattttccac 5340aaaaggtagt gcatcacatg aattagtaga
taaagtaaga aaagcgacag aaatagcaaa 5400agaattgatg ccagatgttg ctatcgacgg
tgaattgcaa ttggatgctg ctcttgtcaa 5460agaagttgca gagctaaaag cgccaggaag
caaagttgcg ggatgtgcaa atgtgcttat 5520attccctgat ttacaagctg gtaatatagg
atataagctt gtacagagat tagctagcaa 5580atgcaattgg acctataaca caggaatggg
tgcaccggtt aatgatttat caagaggatg 5640cagctataga gatattgttg acgtaatagc
acacagctgt acaggctcat aaatgtaaag 5700tatggaggat gaaattatga aaatactggt
atatgcgaag tctcactaaa tatcactgat 5760gatcatgatg aaatgtgctg gcaaaggcct
tgctgagaga atcggcataa atgatccctg 5820ttgacacata tgctaacgga gaaaaatcaa
gataaaaaaa gacatgaaag atcacaaaga 5880cgcaataaaa ttgttttaga tgctttggta
agcagtgact acggcgttat aaaggatatg 5940tctgagatag atgctgtagg acatagagtt
gttcacggag gagaatcttt tacatcatca 6000gttctcataa atgatgatgt gttaaaagcg
ataacagatt gcatagaatt agctccactg 6060cacaatcctg ccaatataga aggaat
6086
SEQ ID NO: 56; Length: 7284; Type: DNA; Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 56:
taacagggcc atgaccttct gcaaagcact tttttaaaat ttcaagagaa tattgtggca
60tatatttttc taatgcagaa tagtttttat catttataac ctttctaaga tatgtagcac
120ttgcaaaaga ttctactgtt tccagagaat tatatcctgg accaaatctt tttaatgtca
180aaggcttaat attgctgttt aacttgatta agctttttat gtattcaatt ccaagtatgt
240tatttggatt tccgataata ttgttaatgg tccttccgaa atattttgat aatgctgctt
300cacgtgcctt tgcatatgtg atgccgcttt ttaaataggt tttcaaagag ctttataatc
360ttctggttcg tctaaaagaa attttgatat ttggcataat tcgtcaatag agccatgttc
420gcttccaaac gatatgtaat caatgacatt taatgaatcc agcagcttta ctgctccata
480tgcgaaattt tctgcggtag aaactgcata tatagtgggc aattctatta ctaagtcaat
540acctgatagc aatgcagctt ctgtccttga ccacttgtca ataatagacg gtattcctcg
600ctgaacaaag ttaccactca taattgcaat gacaaaatct gcacctgttg tttcaattga
660tttttttata tggtatatgt gtccgttatg gagagggtta tattctacaa ttactcccaa
720tatactcatt attaaaaacc tttctaaaaa attattaatt gtacttatta ttttataaaa
780aatatgttaa aatgtaaaat gtgtatacaa tatatttctt ctttttagta agaggaatgt
840ataaaaataa atattttaaa ggaagggacg atcttatgag cattattcaa aacatcattg
900aaaaagctaa aagtgataaa aagaaaattg ttctgccgga aggtgcagaa acccagcgaa
960ccatttgagg tgataggtaa gattataccg aggtatgaaa acgagaattg gacctttaca
1020gaattactct atgaagcgcc atatttaaaa agctaccaag acgaagagga tgaagaggat
1080gaggaggcag attgccttga atatattgac aatactgata agataatata tcttttatat
1140agaagatatc gccgtatgta aggatttcag ggggcaaggc ataggcagcg cgcttatcaa
1200tatatctata gaatgggcaa agcataaaaa cttgcatgga ctaatgcttg aaacccagga
1260caataacctt atagcttgta aattctatca taattgtggt ttcaaaatcg gctccgtcga
1320tactatgtta tacgccaact ttcaaaacaa ctttgaaaaa gctgttttct ggtatttaag
1380gttttagaat gcaaggaaca gtgaattgga gttcgtcttg ttataattag cttcttgggg
1440tatctttaaa tactgtagaa aagaggaagg aaataataaa tggctaaaat gagaatatca
1500ccggaattga aaaaactgat cgaaaaatac cgctgcgtaa aagatacgga aggaatgtct
1560cctgctaagg tatataagct ggtgggagaa aatgaaaacc tatatttaaa aatgacggac
1620agccggtata aagggaccac ctatgatgtg gaacgggaaa aggacatgat gctatggctg
1680gaaggaaagc tgcctgttcc aaaggtcctg cactttgaac ggcatgatgg ctggagcaat
1740ctgctcatga gtgaggccga tggcgtcctt tgctcggaag agtatgaaga tgaacaaagc
1800cctgaaaaga ttatcgagct gtatgcggag tgcatcaggc tctttcactc catcgacata
1860tcggattgtc cctatacgaa tagcttagac agccgcttag ccgaattgga ttacttactg
1920aataacgatc tggccgatgt ggattgcgaa aactgggaag aagacactcc atttaaagat
1980ccgcgcgagc tgtatgattt tttaaagacg gaaaagcccg aagaggaact tgtcttttcc
2040cacggcgacc tgggagacag caacatcttt gtgaaagatg gcaaagtaag tggctttatt
2100gatcttggga gaagcggcag ggcggacaag tggtatgaca ttgccttctg cgtccggtcg
2160atcagggagg atatcgggga agaacagtat gtcgagctat tttttgactt actggggatc
2220aagcctgatt gggagaaaat aaaatattat attttactgg atgaattgtt ttagtaccta
2280gatttagatg tctaaaaagc tttttagaca tctaatcttt tctgaagtac atccgcaact
2340gtccatactc tgatgtttta tatcttttct aaaagttcgc tagatagggg tcccgagcgc
2400ctacgaggaa tttgtatcga aagttagcgt gatggttgtg cccactaatg aagaatacat
2460gattgctaaa gatactgaaa agattgtaaa gagtataaaa tagcattctt gacaaatgtt
2520taccccatta gtataattaa ttttggcaat tatattgggg tgagaaaatg aaaattgatt
2580tatcaaaatt aagggacata ggggccgcag catcgaagtc aactacgtag aaaatctgag
2640tgttcttgag gcaaatagca atagatacgt agttataaag cctattagcg taactggaag
2700cataacatac gatagtgaag gaatagtttt aaaacttttg gcacgcgggg ctattaaagt
2760aacatgcgat aggtgccttg acgaatttga gtatgagttc gtaataccta ttgacgaaat
2820agtaaacgag tctgatgatg aattttcagg tgaagtggaa gatgaaaagc ttgatttgac
2880gaaaattgtg attgaaaatg tggaactttc tcttccgatg aagttcattt gctcgaatga
2940ttgcaagggt ctatgttcta cttgcggtaa aaatcttaat catgaaaaat gcgattgcca
3000aataaaagaa attgatccac gcctttcagt tttgaataaa ttactgcaga agatgtagga
3060ggtgtataat atgccagttc caaagcgtag aacatctaag gcaagaagag ataaaagaag
3120gcatagccat agtttagctg tacctgctta tgttttgtgc ccacaatgtc atgaaccaaa
3180attgccccac agagtttgtt taagctgtgg ttattacgac ggtaaagagg tattgaaagt
3240ggaagaaaag taatggagtt ttctctatta cttttctttt ttatttcttg acttttatgt
3300atggcgtaat ttataattat gagtaagtca taaaaacaac ctatatttgg agctgataat
3360gtggccacga agcttagtaa aagagataga ttaaaaaagt taaaaattga aatcgaaaaa
3420tatccatttt acactgatga tgagttagct gatttgtttt cggttagcgt tcagacgata
3480aggctggatt ctatcagctg tccctcctgt tcagctactg acggggtggt gcgtaacggc
3540aaaagcaccg ccggacatca gcgctagcgg agtgtatact ggcttactat gttggcactg
3600atgagggtgt cagtgaagtg cttcatgtgg caggagaaaa aaggctgcac cggtgcgtca
3660gcagaatatg tgatacagga tatattccgc ttcctcgctc actgactcgc tacgctcggt
3720cgttcgactg cggcgagcgg aaatggctta cgaacggggc ggagatttcc tggaagatgc
3780caggaagata cttaacaggg aagtgagagg gccgcggcaa agccgttttt ccataggctc
3840cgcccccctg acaagcatca cgaaatctga cgctcaaatc agtggtggcg aaacccgaca
3900ggactataaa gataccaggc gtttccccct ggcggctccc tcgtgcgctc tcctgttcct
3960gcctttcggt ttaccggtgt cattccgctg ttatggccgc gtttgtctca ttccacgcct
4020gacactcagt tccgggtagg cagttcgctc caagctggac tgtatgcacg aaccccccgt
4080tcagtccgac cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc cggaaagaca
4140tgcaaaagca ccactggcag cagccactgg taattgattt agaggagtta gtcttgaagt
4200catgcgccgg ttaaggctaa actgaaagga caagttttgg tgactgcgct cctccaagcc
4260agttacctcg gttcaaagag ttggtagctc agagaacctt cgaaaaaccg ccctgcaagg
4320cggttttttc gttttcagag caagagatta cgcgcagacc aaaacgatct caagaagatc
4380atcttattaa tcagataaaa tatttctaga tttcagtgca atttatctct tcaaatgtag
4440cacctgaagt cagccccata cgatataagt tgtaattctc atgtttgaca gcttatcatc
4500gataagcttt aatgcggtag tttatcacag ttaaattgct aacgcagtca ggcacctata
4560catgcattta cttataatac agttttttag ttttgctggc cgcatcttct caaatatgct
4620tcccagcctg cttttctgta acgttcaccc tctaccttag catcccttcc ctttgcaaat
4680agtcctcttc caacaataat aatgtcagat cctgtagaga ccacatcatc cacggttcta
4740tactgttgac ccaatgcgtc tcccttgtca tctaaaccca caccgggtgt cataatcaac
4800caatcgtaac cttcatctct tccacccatg tctctttgag caataaagcc gataacaaaa
4860tctttgtcgc tcttcgcaat gtcaacagta cccttagtat attctccagt agatagggag
4920cccttgcatg acaattctgc taacatcaaa aggcctctag gttcctttgt tacttcttct
4980gccgcctgct tcaaaccgct aacaatacct gggcccacca caccgtgtgc attcgtaatg
5040tctgcccatt ctgctattct gtatacaccc gcagagtact gcaatttgac tgtattacca
5100atgtcagcaa attttctgtc ttcgaagagt aaaaaattgt acttggcgga taatgccttt
5160agcggcttaa ctgtgccctc catggaaaaa tcagtcaaga tatccacatg tgtttttagt
5220aaacaaattt tgggacctaa tgcttcaact aactccagta attccttggt ggtacgaaca
5280tccaatgaag cacacaagtt tgtttgcttt tcgtgcatga tattaaatag cttggcagca
5340acaggactag gatgagtagc agcacgttcc ttatatgtag ctttcgacat gatttatctt
5400cgtttcctgc aggtttttgt tctgtgcagt tgggttaaga atactgggca atttcatgtt
5460tcttcaacac tacatatgcg tatatatacc aatctaagtc tgtgctcctt ccttcgttct
5520tccttctgtt cggagattac cgaatcaaaa aaatttcaaa gaaaccgaaa tcaaaaaaaa
5580gaataaaaaa aaaatgatga attgaattga aaagctagct tatcgatggg tccttttcat
5640cacgtgctat aaaaataatt ataatttaaa ttttttaata taaatatata aattaaaaat
5700agaaagtaaa aaaagaaatt aaagaaaaaa tagtttttgt tttccgaaga tgtaaaagac
5760tctaggggga tcgccaacaa atactacctt ttatcttgct cttcctgctc tcaggtatta
5820atgccgaatt gtttcatctt gtctgtgtag aagaccacac acgaaaatcc tgtgatttta
5880cattttactt atcgttaatc gaatgtatat ctatttaatc tgcttttctt gtctaataaa
5940tatatatgta aagtacgctt tttgttgaaa ttttttaaac ctttgtttat ttttttttct
6000tcattccgta actcttctac cttctttatt tactttctaa aatccaaata caaaacataa
6060aaataaataa acacagagta aattcccaaa ttattccatc attaaaagat acgaggcgcg
6120tgtaagttac aggcaagcga tctctaagaa accattatta tcatgacatt aacctataaa
6180aaaggcctct cgagctagag tcgatcttcg ccagcagggc gaggatcgtg gcatcaccga
6240accgcgccgt gcgcgggtcg tcggtgagcc agagtttcag caggccgccc aggcggccca
6300ggtcgccatt gatgcgggcc agctcgcgga cgtgctcata gtccacgacg cccgtgattt
6360tgtagccctg gccgacggcc agcaggtagg ccgacaggct catgccggcc gccgccgcct
6420tttcctcaat cgctcttcgt tcgtctggaa ggcagtacac cttgataggt gggctgccct
6480tcctggttgg cttggtttca tcagccatcc gcttgccctc atctgttacg ccggcggtag
6540ccggccagcc tcgcagagca ggattcccgt tgagcaccgc caggtgcgaa taagggacag
6600tgaagaagga acacccgctc gcgggtgggc ctacttcacc tatcctgccc ggctgacgcc
6660gttggataca ccaaggaaag tctacacgaa ccctttggca aaatcctgta tatcgtgcga
6720aaaaggatgg atataccgaa aaaatcgcta taatgacccc gaagcagggt tatgcagcgg
6780aaaagcgctg cttccctgct gttttgtgga atatctaccg actggaaaca ggcaaatgca
6840ggaaattact gaactgaggg gacaggcgag agacgatgcc aaagagctac accgacgagc
6900tggccgagtg ggttgaatcc cgcgcggcca agaagcgccg gcgtgatgag gctgcggttg
6960cgttcctggc ggtgagggcg gatgtcgata tgcgtaagga gaaaataccg catcaggcgc
7020atatttgaat gtatttagaa aaataaacaa aaagagtttg tagaaacgca aaaaggccat
7080ccgtcaggat ggccttctgc ttaatttgat gcctggcagt ttatggcggg cgtcctgccc
7140gccaccctcc gggccgttgc ttcgcaacgt tcaaatccgc tcccggcgga tttgtcctac
7200tcaggagagc gttcaccgac aaacaacaga taaaacgaaa ggcccagtct ttcgactgag
7260cctttcgttt tatttgatgc ctgg
7284
SEQ ID NO: 57; Length: 4621; Type: DNA; Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 57:
cagcacaaaa accgggagaa accagacttg
acttagtgaa gagaaatacg atgattttta 60aagacatagt ggcaaaactt attaaagtaa
atgacacagc aatatacctt atagttacaa 120atccagtaga tattcttaca tacgttacct
ataaaatatc tggcttgcca tacggaagag 180tattggggtc tggcacagtt ctcgacagtg
cgagattcag atatctttta agcaaacatt 240gtaacataga tccgaggaat atacacggat
atataattgg ggagcatggc gattctgagc 300ttgcagcttg gagcattacg aacatagcag
gcataccaat tgataattac tgcaatttat 360gtggaaaagc atgtgaaaaa gattttagag
aggagatttt taataatgtt gtaagagctg 420cctatacgat aatagaaaaa aagggtgcga
catattatgc ggttgctctc gcagtaagaa 480gaatcgtaga agctattaaa cccagcgaac
catttgaggt gataggtaag attataccga 540ggtatgaaaa cgagaattgg acctttacag
aattactcta tgaagcgcca tatttaaaaa 600gctaccaaga cgaagaggat gaagaggatg
aggaggcaga ttgccttgaa tatattgaca 660atactgataa gataatatat cttttatata
gaagatatcg ccgtatgtaa ggatttcagg 720gggcaaggca taggcagcgc gcttatcaat
atatctatag aatgggcaaa gcataaaaac 780ttgcatggac taatgcttga aacccaggac
aataacctta tagcttgtaa attctatcat 840aattgtggtt tcaaaatcgg ctccgtcgat
actatgttat acgccaactt tcaaaacaac 900tttgaaaaag ctgttttctg gtatttaagg
ttttagaatg caaggaacag tgaattggag 960ttcgtcttgt tataattagc ttcttggggt
atctttaaat actgtagaaa agaggaagga 1020aataataaat ggctaaaatg agaatatcac
cggaattgaa aaaactgatc gaaaaatacc 1080gctgcgtaaa agatacggaa ggaatgtctc
ctgctaaggt atataagctg gtgggagaaa 1140atgaaaacct atatttaaaa atgacggaca
gccggtataa agggaccacc tatgatgtgg 1200aacgggaaaa ggacatgatg ctatggctgg
aaggaaagct gcctgttcca aaggtcctgc 1260actttgaacg gcatgatggc tggagcaatc
tgctcatgag tgaggccgat ggcgtccttt 1320gctcggaaga gtatgaagat gaacaaagcc
ctgaaaagat tatcgagctg tatgcggagt 1380gcatcaggct ctttcactcc atcgacatat
cggattgtcc ctatacgaat agcttagaca 1440gccgcttagc cgaattggat tacttactga
ataacgatct ggccgatgtg gattgcgaaa 1500actgggaaga agacactcca tttaaagatc
cgcgcgagct gtatgatttt ttaaagacgg 1560aaaagcccga agaggaactt gtcttttccc
acggcgacct gggagacagc aacatctttg 1620tgaaagatgg caaagtaagt ggctttattg
atcttgggag aagcggcagg gcggacaagt 1680ggtatgacat tgccttctgc gtccggtcga
tcagggagga tatcggggaa gaacagtatg 1740tcgagctatt ttttgactta ctggggatca
agcctgattg ggagaaaata aaatattata 1800ttttactgga tgaattgttt tagtacctag
atttagatgt ctaaaaagct ttttagacat 1860ctaatctttt ctgaagtaca tccgcaactg
tccatactct gatgttttat atcttttcta 1920aaagttcgct agataggggt cccgagcgcc
tacgaggaat ttgtatcgct gcaggcatgc 1980aagcttggcg taatcatggt catagctgtt
tcctgtgtga aattgttatc cgctcacaat 2040tccacacaac atacgagccg gaagcataaa
gtgtaaagcc tggggtgcct aatgagtgag 2100ctaactcaca ttaattgcgt tgcgctcact
gcccgctttc cagtcgggaa acctgtcgtg 2160ccagctgcat taatgaatcg gccaacgcgc
ggggagaggc ggtttgcgta ttgggcgctc 2220ttccgcttcc tcgctcactg actcgctgcg
ctcggtcgtt cggctgcggc gagcggtatc 2280agctcactca aaggcggtaa tacggttatc
cacagaatca ggggataacg caggaaagaa 2340catgtgagca aaaggccagc aaaaggccag
gaaccgtaaa aaggccgcgt tgctggcgtt 2400tttccatagg ctccgccccc ctgacgagca
tcacaaaaat cgacgctcaa gtcagaggtg 2460gcgaaacccg acaggactat aaagatacca
ggcgtttccc cctggaagct ccctcgtgcg 2520ctctcctgtt ccgaccctgc cgcttaccgg
atacctgtcc gcctttctcc cttcgggaag 2580cgtggcgctt tctcatagct cacgctgtag
gtatctcagt tcggtgtagg tcgttcgctc 2640caagctgggc tgtgtgcacg aaccccccgt
tcagcccgac cgctgcgcct tatccggtaa 2700ctatcgtctt gagtccaacc cggtaagaca
cgacttatcg ccactggcag cagccactgg 2760taacaggatt agcagagcga ggtatgtagg
cggtgctaca gagttcttga agtggtggcc 2820taactacggc tacactagaa ggacagtatt
tggtatctgc gctctgctga agccagttac 2880cttcggaaaa agagttggta gctcttgatc
cggcaaacaa accaccgctg gtagcggtgg 2940tttttttgtt tgcaagcagc agattacgcg
cagaaaaaaa ggatctcaag aagatccttt 3000gatcttttct acggggtctg acgctcagtg
gaacgaaaac tcacgttaag ggattttggt 3060catgagatta tcaaaaagga tcttcaccta
gatcctttta aattaaaaat gaagttttaa 3120atcaatctaa agtatatatg agtaaacttg
gtctgacagt taccaatgct taatcagtga 3180ggcacctatc tcagcgatct gtctatttcg
ttcatccata gttgcctgac tccccgtcgt 3240gtagataact acgatacggg agggcttacc
atctggcccc agtgctgcaa tgataccgcg 3300agacccacgc tcaccggctc cagatttatc
agcaataaac cagccagccg gaagggccga 3360gcgcagaagt ggtcctgcaa ctttatccgc
ctccatccag tctattaatt gttgccggga 3420agctagagta agtagttcgc cagttaatag
tttgcgcaac gttgttgcca ttgctacagg 3480catcgtggtg tcacgctcgt cgtttggtat
ggcttcattc agctccggtt cccaacgatc 3540aaggcgagtt acatgatccc ccatgttgtg
caaaaaagcg gttagctcct tcggtcctcc 3600gatcgttgtc agaagtaagt tggccgcagt
gttatcactc atggttatgg cagcactgca 3660taattctctt actgtcatgc catccgtaag
atgcttttct gtgactggtg agtactcaac 3720caagtcattc tgagaatagt gtatgcggcg
accgagttgc tcttgcccgg cgtcaatacg 3780ggataatacc gcgccacata gcagaacttt
aaaagtgctc atcattggaa aacgttcttc 3840ggggcgaaaa ctctcaagga tcttaccgct
gttgagatcc agttcgatgt aacccactcg 3900tgcacccaac tgatcttcag catcttttac
tttcaccagc gtttctgggt gagcaaaaac 3960aggaaggcaa aatgccgcaa aaaagggaat
aagggcgaca cggaaatgtt gaatactcat 4020actcttcctt tttcaatatt attgaagcat
ttatcagggt tattgtctca tgagcggata 4080catatttgaa tgtatttaga aaaataaaca
aataggggtt ccgcgcacat ttccccgaaa 4140agtgccacct gacgtctaag aaaccattat
tatcatgaca ttaacctata aaaataggcg 4200tatcacgagg ccctttcgtc tcgcgcgttt
cggtgatgac ggtgaaaacc tctgacacat 4260gcagctcccg gagacggtca cagcttgtct
gtaagcggat gccgggagca gacaagcccg 4320tcagggcgcg tcagcgggtg ttggcgggtg
tcggggctgg cttaactatg cggcatcaga 4380gcagattgta ctgagagtgc accatatgcg
gtgtgaaata ccgcacagat gcgtaaggag 4440aaaataccgc atcaggcgcc attcgccatt
caggctgcgc aactgttggg aagggcgatc 4500ggtgcgggcc tcttcgctat tacgccagct
ggcgaaaggg ggatgtgctg caaggcgatt 4560aagttgggta acgccagggt tttcccagtc
acgacgttgt aaaacgacgg ccagtgaatt 4620c
4621
SEQ ID NO: 58; Length: 6130; Type: DNA; Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 58:
aggtttatcg ccgccttgta cagttttaaa ttgcatagaa atcatgagac ccattgcaat
60aaatacaagt atcaaagata aagtttgcaa tgcctttcct ttcaatttct ccacatcctt
120tctctatata aaaagacatc ttcgtcttgc ttttggtttc agcttatatg cacttttata
180aataactatg atactctata aatactataa catggaaaat gttaaaattt attaagaaat
240tattaagttt ttattacaaa aaagttacaa aacctctgac atttttcata tcagaggttg
300tcatttttta ttttattttc tatagaattt tttagtgaca atatttcttc taattcttta
360ttgtatttat ctattttcaa catggtactt ctatataggc gtatatcttc ttcgtttttt
420tgtatacatt ttttaaggga gttttttaca gtttcaaaaa gcgtatcata agtaatgtaa
480ttatgcattt caaggtcgga gattggaact gcgattaatt cctccccttc tattttatag
540tgataaaaaa tgttgtcggg ttttaaaact attttttcat ctatgctttt atcgtaaatt
600attatacctt ctcctatttc aaaagttttt ccgctataac ttctaagttt tatattttcg
660actcctatgt atttttttat tgccaaaagt atgtttttta tttctgaaat ggattttaca
720agcacttctt gcattttttt atttgccatc tctttatctt tttcactttt tatgagttct
780tccatcaaag actttatttc atgtatatct cccataaaat atcacctctt tcttaatatt
840ccacagagga atcattttaa acgttgaata ttttaaatta ttagagaaaa aatagacttg
900actatttttt gaaatttgat agactattat taatagaaaa ttaatattga aaaggagaag
960atattatgaa caaaatatct ataataggtt ctggatttgt cggaaaccca gcgaaccatt
1020tgaggtgata ggtaagatta taccgaggta tgaaaacgag aattggacct ttacagaatt
1080actctatgaa gcgccatatt taaaaagcta ccaagacgaa gaggatgaag aggatgagga
1140ggcagattgc cttgaatata ttgacaatac tgataagata atatatcttt tatatagaag
1200atatcgccgt atgtaaggat ttcagggggc aaggcatagg cagcgcgctt atcaatatat
1260ctatagaatg ggcaaagcat aaaaacttgc atggactaat gcttgaaacc caggacaata
1320accttatagc ttgtaaattc tatcataatt gtggtttcaa aatcggctcc gtcgatacta
1380tgttatacgc caactttcaa aacaactttg aaaaagctgt tttctggtat ttaaggtttt
1440agaatgcaag gaacagtgaa ttggagttcg tcttgttata attagcttct tggggtatct
1500ttaaatactg tagaaaagag gaaggaaata ataaatggct aaaatgagaa tatcaccgga
1560attgaaaaaa ctgatcgaaa aataccgctg cgtaaaagat acggaaggaa tgtctcctgc
1620taaggtatat aagctggtgg gagaaaatga aaacctatat ttaaaaatga cggacagccg
1680gtataaaggg accacctatg atgtggaacg ggaaaaggac atgatgctat ggctggaagg
1740aaagctgcct gttccaaagg tcctgcactt tgaacggcat gatggctgga gcaatctgct
1800catgagtgag gccgatggcg tcctttgctc ggaagagtat gaagatgaac aaagccctga
1860aaagattatc gagctgtatg cggagtgcat caggctcttt cactccatcg acatatcgga
1920ttgtccctat acgaatagct tagacagccg cttagccgaa ttggattact tactgaataa
1980cgatctggcc gatgtggatt gcgaaaactg ggaagaagac actccattta aagatccgcg
2040cgagctgtat gattttttaa agacggaaaa gcccgaagag gaacttgtct tttcccacgg
2100cgacctggga gacagcaaca tctttgtgaa agatggcaaa gtaagtggct ttattgatct
2160tgggagaagc ggcagggcgg acaagtggta tgacattgcc ttctgcgtcc ggtcgatcag
2220ggaggatatc ggggaagaac agtatgtcga gctatttttt gacttactgg ggatcaagcc
2280tgattgggag aaaataaaat attatatttt actggatgaa ttgttttagt acctagattt
2340agatgtctaa aaagcttttt agacatctaa tcttttctga agtacatccg caactgtcca
2400tactctgatg ttttatatct tttctaaaag ttcgctagat aggggtcccg agcgcctacg
2460aggaatttgt atcgaagatc agccgaagtt atcaaaagtg taatacaaga gcttgatata
2520taagagggga aaccctcttt ttttgtatat aaaaagtcac agcgtgaaaa tataataatt
2580aaaataatga ttttttaggg tgtgatagtc gtgcagaaaa taactcagca ggagattatt
2640ttaagtgcct ttgttgaagc acaaaattta gaaaagatac tgttggataa agtaagagaa
2700tatgggaaag aatcagtaga taatcaaata aaagcattgt taaagcaaat tgaaataatg
2760ataaaaaatc ataaagaaga cataaaaaag gcacaaaaga ctatgcatat taattccctt
2820gtcaaaaaaa atatgtctca agagccttta gacatgcttc aagatttatt aaaaaattta
2880gttaatattc aagcctttta taatgaaact gttgtgaata ttactaatcc ttacgttaga
2940cagttgttta ctcaaatgag ggatgatgtt atgagattta tttctattct tcaaatggag
3000attgaaagtc tggaatcgaa accttctatt ccaaataaca cagttttaaa tacaccggag
3060atgagttaat atgaaagtgg ctattattgg tgctggtgtt tcagggctgg ctgcggcaat
3120tacttttcaa aggtatggca ttacaccaga tatttttgaa aaaaagtgca aaataggtga
3180attatttaac catgttgcgg ggttattaaa agtgataaat aggcctataa aggatccgct
3240tcatcatctt aaaaatgttt atggaataga agttaaacca attaacacca ttgacaaaat
3300agtaatgaag gggccaactg taacagcttc tgttactggg agtaatcttg ggtatatgat
3360tttaagagga caggacgcaa actctcttga aaatcaattg tataataagt tagaaatacc
3420agttaatttc aatatagaag ctgattataa gaagttaaaa aataattacg attatgtctg
3480caggcatgca agcttggcgt aatcatggtc atagctgttt cctgtgtgaa attgttatcc
3540gctcacaatt ccacacaaca tacgagccgg aagcataaag tgtaaagcct ggggtgccta
3600atgagtgagc taactcacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa
3660cctgtcgtgc cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat
3720tgggcgctct tccgcttcct cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg
3780agcggtatca gctcactcaa aggcggtaat acggttatcc acagaatcag gggataacgc
3840aggaaagaac atgtgagcaa aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt
3900gctggcgttt ttccataggc tccgcccccc tgacgagcat cacaaaaatc gacgctcaag
3960tcagaggtgg cgaaacccga caggactata aagataccag gcgtttcccc ctggaagctc
4020cctcgtgcgc tctcctgttc cgaccctgcc gcttaccgga tacctgtccg cctttctccc
4080ttcgggaagc gtggcgcttt ctcatagctc acgctgtagg tatctcagtt cggtgtaggt
4140cgttcgctcc aagctgggct gtgtgcacga accccccgtt cagcccgacc gctgcgcctt
4200atccggtaac tatcgtcttg agtccaaccc ggtaagacac gacttatcgc cactggcagc
4260agccactggt aacaggatta gcagagcgag gtatgtaggc ggtgctacag agttcttgaa
4320gtggtggcct aactacggct acactagaag gacagtattt ggtatctgcg ctctgctgaa
4380gccagttacc ttcggaaaaa gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg
4440tagcggtggt ttttttgttt gcaagcagca gattacgcgc agaaaaaaag gatctcaaga
4500agatcctttg atcttttcta cggggtctga cgctcagtgg aacgaaaact cacgttaagg
4560gattttggtc atgagattat caaaaaggat cttcacctag atccttttaa attaaaaatg
4620aagttttaaa tcaatctaaa gtatatatga gtaaacttgg tctgacagtt accaatgctt
4680aatcagtgag gcacctatct cagcgatctg tctatttcgt tcatccatag ttgcctgact
4740ccccgtcgtg tagataacta cgatacggga gggcttacca tctggcccca gtgctgcaat
4800gataccgcga gacccacgct caccggctcc agatttatca gcaataaacc agccagccgg
4860aagggccgag cgcagaagtg gtcctgcaac tttatccgcc tccatccagt ctattaattg
4920ttgccgggaa gctagagtaa gtagttcgcc agttaatagt ttgcgcaacg ttgttgccat
4980tgctacaggc atcgtggtgt cacgctcgtc gtttggtatg gcttcattca gctccggttc
5040ccaacgatca aggcgagtta catgatcccc catgttgtgc aaaaaagcgg ttagctcctt
5100cggtcctccg atcgttgtca gaagtaagtt ggccgcagtg ttatcactca tggttatggc
5160agcactgcat aattctctta ctgtcatgcc atccgtaaga tgcttttctg tgactggtga
5220gtactcaacc aagtcattct gagaatagtg tatgcggcga ccgagttgct cttgcccggc
5280gtcaatacgg gataataccg cgccacatag cagaacttta aaagtgctca tcattggaaa
5340acgttcttcg gggcgaaaac tctcaaggat cttaccgctg ttgagatcca gttcgatgta
5400acccactcgt gcacccaact gatcttcagc atcttttact ttcaccagcg tttctgggtg
5460agcaaaaaca ggaaggcaaa atgccgcaaa aaagggaata agggcgacac ggaaatgttg
5520aatactcata ctcttccttt ttcaatatta ttgaagcatt tatcagggtt attgtctcat
5580gagcggatac atatttgaat gtatttagaa aaataaacaa ataggggttc cgcgcacatt
5640tccccgaaaa gtgccacctg acgtctaaga aaccattatt atcatgacat taacctataa
5700aaataggcgt atcacgaggc cctttcgtct cgcgcgtttc ggtgatgacg gtgaaaacct
5760ctgacacatg cagctcccgg agacggtcac agcttgtctg taagcggatg ccgggagcag
5820acaagcccgt cagggcgcgt cagcgggtgt tggcgggtgt cggggctggc ttaactatgc
5880ggcatcagag cagattgtac tgagagtgca ccatatgcgg tgtgaaatac cgcacagatg
5940cgtaaggaga aaataccgca tcaggcgcca ttcgccattc aggctgcgca actgttggga
6000agggcgatcg gtgcgggcct cttcgctatt acgccagctg gcgaaagggg gatgtgctgc
6060aaggcgatta agttgggtaa cgccagggtt ttcccagtca cgacgttgta aaacgacggc
6120cagtgaattc
6130
SEQ ID NO: 59; Length: 4680; Type: DNA; Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 59:
tattgattga cgatgaggta atcaaaaaat
tagaagcatg tattgacctt gcacctttgc 60acaatcctgc taatattgag ggaataaaag
cttgtcggca gataatgcca ggggtgccaa 120tggtagcagt ttttgatacg gctttccatc
aaacaatgcc agattatgcg tatatttatc 180ccattcctta tgaatactac gaaaaatata
gaataagaag atatggattc catgggactt 240ctcataaata tgtatcttta agagctgctg
aaatattaaa gaggcctatt gaagagttaa 300aaattattac ttgccattta gggaatgggt
ctagtattgc tgcggttaaa ggcggtaagt 360cgatagatac aagtatggga tttactccat
tagaagggct ggctatgggt acaaggtccg 420gaaatgttga tccttcaatt ataactttct
taatggaaaa agaaggattg actgcagaac 480aggttataga tatacttaat aagaaatcag
gtgtatacgg aatttcagga ataagtaatg 540actttagaga tatagaaaac ccagcgaacc
atttgaggtg ataggtaaga ttataccgag 600gtatgaaaac gagaattgga cctttacaga
attactctat gaagcgccat atttaaaaag 660ctaccaagac gaagaggatg aagaggatga
ggaggcagat tgccttgaat atattgacaa 720tactgataag ataatatatc ttttatatag
aagatatcgc cgtatgtaag gatttcaggg 780ggcaaggcat aggcagcgcg cttatcaata
tatctataga atgggcaaag cataaaaact 840tgcatggact aatgcttgaa acccaggaca
ataaccttat agcttgtaaa ttctatcata 900attgtggttt caaaatcggc tccgtcgata
ctatgttata cgccaacttt caaaacaact 960ttgaaaaagc tgttttctgg tatttaaggt
tttagaatgc aaggaacagt gaattggagt 1020tcgtcttgtt ataattagct tcttggggta
tctttaaata ctgtagaaaa gaggaaggaa 1080ataataaatg gctaaaatga gaatatcacc
ggaattgaaa aaactgatcg aaaaataccg 1140ctgcgtaaaa gatacggaag gaatgtctcc
tgctaaggta tataagctgg tgggagaaaa 1200tgaaaaccta tatttaaaaa tgacggacag
ccggtataaa gggaccacct atgatgtgga 1260acgggaaaag gacatgatgc tatggctgga
aggaaagctg cctgttccaa aggtcctgca 1320ctttgaacgg catgatggct ggagcaatct
gctcatgagt gaggccgatg gcgtcctttg 1380ctcggaagag tatgaagatg aacaaagccc
tgaaaagatt atcgagctgt atgcggagtg 1440catcaggctc tttcactcca tcgacatatc
ggattgtccc tatacgaata gcttagacag 1500ccgcttagcc gaattggatt acttactgaa
taacgatctg gccgatgtgg attgcgaaaa 1560ctgggaagaa gacactccat ttaaagatcc
gcgcgagctg tatgattttt taaagacgga 1620aaagcccgaa gaggaacttg tcttttccca
cggcgacctg ggagacagca acatctttgt 1680gaaagatggc aaagtaagtg gctttattga
tcttgggaga agcggcaggg cggacaagtg 1740gtatgacatt gccttctgcg tccggtcgat
cagggaggat atcggggaag aacagtatgt 1800cgagctattt tttgacttac tggggatcaa
gcctgattgg gagaaaataa aatattatat 1860tttactggat gaattgtttt agtacctaga
tttagatgtc taaaaagctt tttagacatc 1920taatcttttc tgaagtacat ccgcaactgt
ccatactctg atgttttata tcttttctaa 1980aagttcgcta gataggggtc ccgagcgcct
acgaggaatt tgtatcgctg caggcatgca 2040agcttggcgt aatcatggtc atagctgttt
cctgtgtgaa attgttatcc gctcacaatt 2100ccacacaaca tacgagccgg aagcataaag
tgtaaagcct ggggtgccta atgagtgagc 2160taactcacat taattgcgtt gcgctcactg
cccgctttcc agtcgggaaa cctgtcgtgc 2220cagctgcatt aatgaatcgg ccaacgcgcg
gggagaggcg gtttgcgtat tgggcgctct 2280tccgcttcct cgctcactga ctcgctgcgc
tcggtcgttc ggctgcggcg agcggtatca 2340gctcactcaa aggcggtaat acggttatcc
acagaatcag gggataacgc aggaaagaac 2400atgtgagcaa aaggccagca aaaggccagg
aaccgtaaaa aggccgcgtt gctggcgttt 2460ttccataggc tccgcccccc tgacgagcat
cacaaaaatc gacgctcaag tcagaggtgg 2520cgaaacccga caggactata aagataccag
gcgtttcccc ctggaagctc cctcgtgcgc 2580tctcctgttc cgaccctgcc gcttaccgga
tacctgtccg cctttctccc ttcgggaagc 2640gtggcgcttt ctcatagctc acgctgtagg
tatctcagtt cggtgtaggt cgttcgctcc 2700aagctgggct gtgtgcacga accccccgtt
cagcccgacc gctgcgcctt atccggtaac 2760tatcgtcttg agtccaaccc ggtaagacac
gacttatcgc cactggcagc agccactggt 2820aacaggatta gcagagcgag gtatgtaggc
ggtgctacag agttcttgaa gtggtggcct 2880aactacggct acactagaag gacagtattt
ggtatctgcg ctctgctgaa gccagttacc 2940ttcggaaaaa gagttggtag ctcttgatcc
ggcaaacaaa ccaccgctgg tagcggtggt 3000ttttttgttt gcaagcagca gattacgcgc
agaaaaaaag gatctcaaga agatcctttg 3060atcttttcta cggggtctga cgctcagtgg
aacgaaaact cacgttaagg gattttggtc 3120atgagattat caaaaaggat cttcacctag
atccttttaa attaaaaatg aagttttaaa 3180tcaatctaaa gtatatatga gtaaacttgg
tctgacagtt accaatgctt aatcagtgag 3240gcacctatct cagcgatctg tctatttcgt
tcatccatag ttgcctgact ccccgtcgtg 3300tagataacta cgatacggga gggcttacca
tctggcccca gtgctgcaat gataccgcga 3360gacccacgct caccggctcc agatttatca
gcaataaacc agccagccgg aagggccgag 3420cgcagaagtg gtcctgcaac tttatccgcc
tccatccagt ctattaattg ttgccgggaa 3480gctagagtaa gtagttcgcc agttaatagt
ttgcgcaacg ttgttgccat tgctacaggc 3540atcgtggtgt cacgctcgtc gtttggtatg
gcttcattca gctccggttc ccaacgatca 3600aggcgagtta catgatcccc catgttgtgc
aaaaaagcgg ttagctcctt cggtcctccg 3660atcgttgtca gaagtaagtt ggccgcagtg
ttatcactca tggttatggc agcactgcat 3720aattctctta ctgtcatgcc atccgtaaga
tgcttttctg tgactggtga gtactcaacc 3780aagtcattct gagaatagtg tatgcggcga
ccgagttgct cttgcccggc gtcaatacgg 3840gataataccg cgccacatag cagaacttta
aaagtgctca tcattggaaa acgttcttcg 3900gggcgaaaac tctcaaggat cttaccgctg
ttgagatcca gttcgatgta acccactcgt 3960gcacccaact gatcttcagc atcttttact
ttcaccagcg tttctgggtg agcaaaaaca 4020ggaaggcaaa atgccgcaaa aaagggaata
agggcgacac ggaaatgttg aatactcata 4080ctcttccttt ttcaatatta ttgaagcatt
tatcagggtt attgtctcat gagcggatac 4140atatttgaat gtatttagaa aaataaacaa
ataggggttc cgcgcacatt tccccgaaaa 4200gtgccacctg acgtctaaga aaccattatt
atcatgacat taacctataa aaataggcgt 4260atcacgaggc cctttcgtct cgcgcgtttc
ggtgatgacg gtgaaaacct ctgacacatg 4320cagctcccgg agacggtcac agcttgtctg
taagcggatg ccgggagcag acaagcccgt 4380cagggcgcgt cagcgggtgt tggcgggtgt
cggggctggc ttaactatgc ggcatcagag 4440cagattgtac tgagagtgca ccatatgcgg
tgtgaaatac cgcacagatg cgtaaggaga 4500aaataccgca tcaggcgcca ttcgccattc
aggctgcgca actgttggga agggcgatcg 4560gtgcgggcct cttcgctatt acgccagctg
gcgaaagggg gatgtgctgc aaggcgatta 4620agttgggtaa cgccagggtt ttcccagtca
cgacgttgta aaacgacggc cagtgaattc 4680
SEQ ID NO: 60; length 5663; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
gttttataag aagctttgta aattttattt tctaagcctt cagagacatc aaatatgtga
60ttgagaggaa tgtgatttct caacagataa attaaaatat tagaaaattc ctctaaagat
120acgggacctt gttttttttc aaaagaactt tgcaaaattt cttttgtaaa gtccgggata
180tatttttcta agccttctag tcccttttca aaaatatgct ttcttataga agaggcactg
240gcaaattctc cttttagctc taaagaggtg tacaaagagc cttttctctt tatagtaaaa
300ggtgtaatag aactacctat tttctttaaa gatttaaggt attctattgc caatatgtta
360ttggatgttt gtaaaatctt ttctatttca ttattattta taactttttg taatgctaat
420tcccgtgctt ttgcaaaggt tatgccgctt tttaaatatt cttttaatgc ttttctataa
480taaattggct cttctaaaag tatttcagca atttttgtga gttcgtttaa atcgcctttt
540tcactcccaa aggaaaaaca atctactatt tttaaagagt ctaatagttt caccgctcca
600taagcgaaat tttcagctgt agaggtagca taaactactg gtaactcgat taccaaatct
660ataccggctt ttaatgccat ttgagttcgt ttccatttgt ctacaattgc tggttctcct
720ctttgcacga agtttccact cattactgct atagtataat cgcatttggt taattctttt
780gaagtttgca gatggtaaag gtggccattg tgaaaaggat tatattcgac aataattcct
840aaaattccca tacaacttct taccctttca aaaaattttt taagatatac ttattatttt
900acataaaata tgataaaatg taaaagggac atcgtgtata caatattata gtgataaaat
960taaaaaagga agggagattt taaatggcag taatggatag taaaacccag cgaaccattt
1020gaggtgatag gtaagattat accgaggtat gaaaacgaga attggacctt tacagaatta
1080ctctatgaag cgccatattt aaaaagctac caagacgaag aggatgaaga ggatgaggag
1140gcagattgcc ttgaatatat tgacaatact gataagataa tatatctttt atatagaaga
1200tatcgccgta tgtaaggatt tcagggggca aggcataggc agcgcgctta tcaatatatc
1260tatagaatgg gcaaagcata aaaacttgca tggactaatg cttgaaaccc aggacaataa
1320ccttatagct tgtaaattct atcataattg tggtttcaaa atcggctccg tcgatactat
1380gttatacgcc aactttcaaa acaactttga aaaagctgtt ttctggtatt taaggtttta
1440gaatgcaagg aacagtgaat tggagttcgt cttgttataa ttagcttctt ggggtatctt
1500taaatactgt agaaaagagg aaggaaataa taaatggcta aaatgagaat atcaccggaa
1560ttgaaaaaac tgatcgaaaa ataccgctgc gtaaaagata cggaaggaat gtctcctgct
1620aaggtatata agctggtggg agaaaatgaa aacctatatt taaaaatgac ggacagccgg
1680tataaaggga ccacctatga tgtggaacgg gaaaaggaca tgatgctatg gctggaagga
1740aagctgcctg ttccaaaggt cctgcacttt gaacggcatg atggctggag caatctgctc
1800atgagtgagg ccgatggcgt cctttgctcg gaagagtatg aagatgaaca aagccctgaa
1860aagattatcg agctgtatgc ggagtgcatc aggctctttc actccatcga catatcggat
1920tgtccctata cgaatagctt agacagccgc ttagccgaat tggattactt actgaataac
1980gatctggccg atgtggattg cgaaaactgg gaagaagaca ctccatttaa agatccgcgc
2040gagctgtatg attttttaaa gacggaaaag cccgaagagg aacttgtctt ttcccacggc
2100gacctgggag acagcaacat ctttgtgaaa gatggcaaag taagtggctt tattgatctt
2160gggagaagcg gcagggcgga caagtggtat gacattgcct tctgcgtccg gtcgatcagg
2220gaggatatcg gggaagaaca gtatgtcgag ctattttttg acttactggg gatcaagcct
2280gattgggaga aaataaaata ttatatttta ctggatgaat tgttttagta cctagattta
2340gatgtctaaa aagcttttta gacatctaat cttttctgaa gtacatccgc aactgtccat
2400actctgatgt tttatatctt ttctaaaagt tcgctagata ggggtcccga gcgcctacga
2460ggaatttgta tcgtgacttt agagatatag aaaatgcagc ttttaaagaa gggcataaaa
2520gggctatgtt ggcattaaaa gttttcgctt atagggtgaa aaagacaata ggttcttata
2580cagctgctat gggtggggtt gatgtaattg tgtttactgc tggagttgga gaaaatggac
2640cagaaatgag agagtttatt ttagaggatc tagagttttt aggctttaaa ctggacaaag
2700agaagaataa ggtaagagga aaagaggaaa ttatatctac agaagattca aaagttaaag
2760ttatggttat tcctacaaat gaagaatata tgattgctaa agatactgaa aaattggtaa
2820aaggtttaaa gtagataatc ttgacaacgg gttgtggggt tagtataata ggtgatgtca
2880attattttaa ggtgtgagaa gaaaaatgaa aatcgatcta ttaaaaatca aaggacagct
2940tggccgcagc ataaatatag actatgtaga ggacatagag aacattgaat ttaaagggga
3000agaatacaaa ctgcaggcat gcaagcttgg cgtaatcatg gtcatagctg tttcctgtgt
3060gaaattgtta tccgctcaca attccacaca acatacgagc cggaagcata aagtgtaaag
3120cctggggtgc ctaatgagtg agctaactca cattaattgc gttgcgctca ctgcccgctt
3180tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag
3240gcggtttgcg tattgggcgc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg
3300ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat
3360caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta
3420aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa
3480atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc
3540cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt
3600ccgcctttct cccttcggga agcgtggcgc tttctcatag ctcacgctgt aggtatctca
3660gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg
3720accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat
3780cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta
3840cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct
3900gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac
3960aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa
4020aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa
4080actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt
4140taaattaaaa atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca
4200gttaccaatg cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca
4260tagttgcctg actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc
4320ccagtgctgc aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa
4380accagccagc cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc
4440agtctattaa ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca
4500acgttgttgc cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat
4560tcagctccgg ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag
4620cggttagctc cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac
4680tcatggttat ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt
4740ctgtgactgg tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt
4800gctcttgccc ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc
4860tcatcattgg aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat
4920ccagttcgat gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca
4980gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga
5040cacggaaatg ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg
5100gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg
5160ttccgcgcac atttccccga aaagtgccac ctgacgtcta agaaaccatt attatcatga
5220cattaaccta taaaaatagg cgtatcacga ggccctttcg tctcgcgcgt ttcggtgatg
5280acggtgaaaa cctctgacac atgcagctcc cggagacggt cacagcttgt ctgtaagcgg
5340atgccgggag cagacaagcc cgtcagggcg cgtcagcggg tgttggcggg tgtcggggct
5400ggcttaacta tgcggcatca gagcagattg tactgagagt gcaccatatg cggtgtgaaa
5460taccgcacag atgcgtaagg agaaaatacc gcatcaggcg ccattcgcca ttcaggctgc
5520gcaactgttg ggaagggcga tcggtgcggg cctcttcgct attacgccag ctggcgaaag
5580ggggatgtgc tgcaaggcga ttaagttggg taacgccagg gttttcccag tcacgacgtt
5640gtaaaacgac ggccagtgaa ttc
5663
SEQ ID NO: 61; length 5395; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
aattcgagct cggtacccgg ggatcctcta
gagtcgacct gcaggcatgc aagggaagat 60atgcctgctt gacattattg tccgtcattt
ttcggtttat cctggtaaaa aaagttttaa 120tcctctcaag gctttcttcg tgtaaaacaa
gctttttccc caaaacttcc gcaacagtct 180cctttgtaag gtcatcctgc gtggggccga
gtcctccggt cataataaca aggtcgcacc 240tttccaaagc tgcaagaaga cattttttca
gccgaacgga attgtccccc accacactgt 300gataatacac attcacacca atgtcattga
gccttttgga tatatactgg gcattggtat 360ttgctatctg ccccattaaa agctcggttc
caaccgctaa tatctccgca ttcatattga 420aagacccctt aaatttaaac tttttgtaac
ttattatatc aattagtgtt ataaaataaa 480agggaaaaag aattaaaatc aaaggtttca
agagcagccg tatcacccgt aaaagtttca 540gccgattcaa cctttttaca cataaaactt
tcaaaaattg atgacttaca attatcaagt 600aggatataat attactaatg ctaaacagtt
attgataaag gaggaaggaa tatgaacaat 660aacaaagtaa ttaaaaaagt aacccggact
ttctgagaag ctgaatttct tcatcgttga 720aaggcacgtt caatatttcc tcaataccgt
ttacacccac gattgtcgga acacttaagc 780atacatcgct aagtccgtac tgtccttcca
aaaggcttga aacggtaagg atggagtttt 840catttcttac aatggcttca acgattcttc
ttacggcaag ggctacggca taataggttg 900cacctttgtt cctgatgatt tcataagctg
catttttaac actttcatat attttattcc 960gggaaatctg ctcctcgcac tgatggcatt
cgtcacagta gcgatccatg ggaattcccg 1020caatatttgc aagactccag gccgcaactt
cggtgtcacc gtgttcgcca ataatataag 1080catgtacatt tcgtgcatcc acttttacat
gttcgcttaa aagataacgg aacctggctg 1140tgtccaaaac cgttccggaa cctattactt
tgtttttcgg gaatccggat agtttgtaag 1200ttacataggt taaaatatcc accggatttg
tgactaccag aagaatacaa tcgttgttgt 1260actttacaat ttcatttatg atatttttga
atacttccgt gtttctttta acaagatcta 1320ttctcgtttc gccttctttt tggttggcac
cggcggtaat gattactatg tcggatccgg 1380cacagtcttt gtagtcacca cgataaattt
caacgggcct tacaaaaggc atgccgtgat 1440ttaagtccat gacttctccg tcggcttttt
ttgcatttat gtctatcagt acaatttcag 1500atataagtcc gctgagcatc aatgtataag
ctgtggtgga acctacaaag cctgcaccaa 1560ctacgaaaca ctctaaaaga aataataaaa
acactagata tatgaaagtt ctccttttct 1620tttatgaaaa ggagaacttt cattattgat
aaatatataa actagtatat aattttaata 1680taaaacctat tttacataat ggaaattatc
tatcggggga ggaaatatga acaattcagt 1740ggaaatttta aataaaatcg tgtcaaatat
tgaaaaagtc attgttggaa aaaagaaagc 1800tatcgagttg atattaatat cacttatttg
cgatggacat gttttgattg aagatgtccc 1860cggtgtcgga aaaaccagta cttggcgtaa
tcatggtcat agctgtttcc tgtgtgaaat 1920tgttatccgc tcacaattcc acacaacata
cgagccggaa gcataaagtg taaagcctgg 1980ggtgcctaat gagtgagcta actcacatta
attgcgttgc gctcactgcc cgctttccag 2040tcgggaaacc tgtcgtgcca gcccttcaaa
cttcccaaag gcgagcccta gtgacattag 2100aaaaccgact gtaaaaagta cagtcggcat
tatctcatat tataaaagcc agtcattagg 2160cctatctgac aattcctgaa tagagttcat
aaacaatcct gcatgataac catcacaaac 2220agaatgatgt acctgtaaag atagcggtaa
atatattgaa ttacctttat taatgaattt 2280tcctgctgta ataatgggta gaaggtaatt
actattatta ttgatattta agttaaaccc 2340agtaaatgaa gtccatggaa taatagaaag
agaaaaagca ttttcaggta taggtgtttt 2400gggaaacaat ttccccgaac cattatattt
ctctacatca gaaaggtata aatcataaaa 2460ctctttgaag tcattcttta caggagtcca
aataccagag aatgttttag atacaccatc 2520aaaaattgta taaagtggct ctaacttatc
ccaataacct aactctccgt cgctattgta 2580accagttcta aaagctgtat ttgagtttat
cacccttgtc actaagaaaa taaatgcagg 2640gtaaaattta tatccttctt gttttatgtt
tcggtataaa acactaatat caatttctgt 2700ggttatacta aaagtcgttt gttggttcaa
ataatgatta aatatctctt ttctcttcca 2760attgtctaaa tcaattttat taaagttcat
ttgatatgcc tcctaaattt ttatctaaag 2820tgaatttagg aggcttactt gtctgctttc
ttcattagaa tcaatccttt tttaaaagtc 2880aatcccgttt gttgaactac tctttaataa
aataattttt ccgttcccaa ttccacattg 2940caataataga aaatccatct tcatcggctt
tttcgtcatc atctgtatga atcaaatcgc 3000cttcttctgt gtcatcaagg tttaattttt
tatgtatttc ttttaacaaa ccaccatagg 3060agattaacct tttacggtgt aaaccttcct
ccaaatcaga caaacgtttc aaattctttt 3120cttcatcatc ggtcataaaa tccgtatcct
ttacaggata ttttgcagtt tcgtcaattg 3180ccgattgtat atccgattta tatttatttt
tcggtcgaat catttgaact tttacatttg 3240gatcatagtc taatttcatt gcctttttcc
aaaattgaat ccattgtttt tgattcacgt 3300agttttctgt attcttaaaa taagttggtt
ccacacatac caatacatgc atgtgctgat 3360tataagaatt atctttatta tttattgtca
cttccgttgc acgcataaaa ccaacaagat 3420ttttattaat ttttttatat tgcatcattc
ggcgaaatcc ttgagccata tctgacaaac 3480tcttatttaa ttcttcgcca tcataaacat
ttttaactgt taatgtgaga aacaaccaac 3540gaactgttgg cttttgttta ataacttcag
caacaacctt ttgtgactga atgccatgtt 3600tcattgctct cctccagttg cacattggac
aaagcctgga tttacaaaac cacactcgat 3660acaactttct ttcgcctgtt tcacgatttt
gtttatactc taatatttca gcacaatctt 3720ttactctttc agccttttta aattcaagaa
tatgcagaag ttcaaagtaa tcaacattag 3780cgattttctt ttctctccat ggtctcactt
ttccactttt tgtcttgtcc actaaaaccc 3840ttgatttttc atctgaataa atgctactat
taggacacat aatattaaaa gaaaccccca 3900tctatttagt tatttgtttg gtcacttata
actttaacag atggggtttt tctgtgcaac 3960caattttaag ggttttcaat actttaaaac
acatacatac caacacttca acgcaccttt 4020cagcaactaa aataaaaatg acgttatttc
tatatgtatc aagaatagaa agaactcgtt 4080tttcgctacg ctcaaaacgc aaaaaaagca
ctcattcgag tgctttttct tatcgctcca 4140aatcatgcga ttttttcctc tttgcttttc
tttgctcacg aagttctcga tcacgctgca 4200aaacatcttg aagcgaaaaa gtattcttct
tttcttccga tcgctcatgc tgacgcacga 4260aaagccctct aggcgcatag gaacaactcc
taaatgcatg tgaggggttt tctcgtccat 4320gtgaacagtc gcatacgcaa tattttgttt
cccatactgc attaatgaat cggccaacgc 4380gcggggagag gcggtttgcg tattgggcgc
tcttccgctt cctcgctcac tgactcgctg 4440cgctcggtcg ttcggctgcg gcgagcggta
tcagctcact caaaggcggt aatacggtta 4500tccacagaat caggggataa cgcaggaaag
aacatgtgag caaaaggcca gcaaaaggcc 4560aggaaccgta aaaaggccgc gttgctggcg
tttttccata ggctccgccc ccctgacgag 4620catcacaaaa atcgacgctc aagtcagagg
tggcgaaacc cgacaggact ataaagatac 4680caggcgtttc cccctggaag ctccctcgtg
cgctctcctg ttccgaccct gccgcttacc 4740ggatacctgt ccgcctttct cccttcggga
agcgtggcgc tttctcatag ctcacgctgt 4800aggtatctca gttcggtgta ggtcgttcgc
tccaagctgg gctgtgtgca cgaacccccc 4860gttcagcccg accgctgcgc cttatccggt
aactatcgtc ttgagtccaa cccggtaaga 4920cacgacttat cgccactggc agcagccact
ggtaacagga ttagcagagc gaggtatgta 4980ggcggtgcta cagagttctt gaagtggtgg
cctaactacg gctacactag aagaacagta 5040tttggtatct gcgctctgct gaagccagtt
accttcggaa aaagagttgg tagctcttga 5100tccggcaaac aaaccaccgc tggtagcggt
ggtttttttg tttgcaagca gcagattacg 5160cgcagaaaaa aaggatctca agaagatcct
ttgatctttt ctacggggtc tgacgctcag 5220tggaacgaaa actcacgtta agggattttg
gtcatgagat tatcaaaaag gatcttcacc 5280tagatccttt taaattaaaa atgaagtttt
aaatcaatct aaagtatata tgagtaaact 5340tggtctgaca gttaccaatg cttaatcagt
gaggcaccta tctcagcgat ctgtc 5395
SEQ ID NO: 62; length 45; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
atttacctgg ctgggaatac tgagacatat gtcattgagg ccgta 45
SEQ ID NO: 63; length 53; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
aaaaaagctt ataattatcc ttaatttcct actacgtgcg cccagatagg gtg 53
SEQ ID NO: 64; length 60; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
cagattgtac aaatgtggtg ataacagata agtctactac tgtaacttac ctttctttgt 60
SEQ ID NO: 65; length 49; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
tgaacgcaag tttctaattt cggttgaaat ccgatagagg aaagtgtct 49
SEQ ID NO: 66; length 45; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
ttaaatgttg ataaggaagc tcttttcaat gaagttaagg tagca 45
SEQ ID NO: 67; length 53; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
aaaaaagctt ataattatcc ttagctctct tcaatgtgcg cccagatagg gtg 53
SEQ ID NO: 68; length 60; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
cagattgtac aaatgtggtg ataacagata agtcttcaat gataacttac ctttctttgt 60
SEQ ID NO: 69; length 49; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic primer
tgaacgcaag tttctaattt cgattagagc tcgatagagg aaagtgtct
49
SEQ ID NO: 70; length 1509; DNA; Caldicellulosiruptor kristjanssonii
atgtatttta ttggaattga
cgttggaaca tctggaacaa agacaatcct gattgactca 60aaaggtaaga ttctggcttc
tgcaaccttt gaatatcctc tttatcagcc tcagattggc 120tgggctgagc aaaatcccga
agactggtgg gatgcaagcg taaaaggaat taaagctgtg 180cttgaaaagt caaaagtaga
ccccaaggaa gttaaggctg tgggacttac cgggcagatg 240cacgggcttg tgctgcttga
caaaaactac aacgttataa gaccatcaat catctggtgt 300gaccagagaa cggcaaaaga
atgtgatgaa ataacagaaa aggttggcaa ggaaaagctt 360gtggagatta cagcaaaccc
tgcactgaca ggttttacag cgtccaagat tctgtgggtg 420agaaacaacg agccccaaaa
ctatgagaag gtctacaaaa ttttgcttcc caaagactat 480ataaggttta aacttacagg
cgagtttgca acagatgtgt cggacgcctc gggtatgcag 540cttttggaca ttaaaaacag
gtgctggtct gatgaggtac ttgaaaagct tgagatagac 600aaagggcttc ttggaaaagt
ctatgagtcg ccagaggtaa cgggaaaagt tagcgggcaa 660gcaagcgaac ttacaggtct
ttgtgaaggt acgcttgttg ttgcaggtgg aggagaccag 720gcagcaggtg cagttggaaa
tggcatagta aagacgggtg tgatttcatc tacaattggt 780tcgtctggcg ttgtttttgc
ccatcttgac gagtttaaga ttgacccaca gggaagggtt 840cacacatttt gtcatgcagt
gccgggaaaa tggcatgtga tgggtgtaac acaaggtgcc 900ggactttctc tcaagtggtt
tagagacaac tttgcacaca tcgaaaaggc tgcgtttgag 960tttattgaca aagacccata
cattttgatg gaccaggagg cagaacttgc aaacccaggc 1020gcagacggac ttgttttcct
gccatatttg atgggggaaa gaacgcccat tttggaccca 1080tacgccaaag gaatattctt
tggaataaca gcaaagcata cacgaagaga gttcattaga 1140gctgtcatgg aaggtgttgt
attttcactt aaaaactgtc ttgatatttt gtatgagatg 1200ggcatcgagg tgaaggaggt
aagagtttca ggcggtggtg caaagagcaa gctctggaga 1260cagatgcagg cagacatatt
tgagatggat gtatggacac tgaattccaa agaaggacct 1320gcgtttggtg cagctatcct
ggcagcagtt ggtgcaggag aatatcagaa ggttgaagaa 1380gcctgtgata ctatgattca
aaaggtagat aactgcagcc caaatgaaaa actatttgaa 1440atatatagaa aaacttataa
actttacaac agtatatatc caagagttaa ggacttattc 1500aacatgtaa
1509
SEQ ID NO: 71; length 1317; DNA; Caldicellulosiruptor kristjanssonii
atgaaatact tcaaagacat
tccagaagta aaatatgaag gaccacagtc agacaatcca 60tttgctttca agtattacaa
tcctgacgag gttattgatg gcaagccttt aaaagaccac 120cttcgttttg caattgctta
ctggcacacg ttctgtgcaa ccggtagcga cccttttgga 180caacctacaa ttaatcgtcc
atgggacagg ttctcaaacc caatggacaa tgcaaaagca 240agagttgaag ctgcatttga
attttttgaa aagctaaatg ttccattttt ctgcttccac 300gacagagaca tcgcacctga
aggagaaaat ttaagagaaa caaacaagaa tttggatgag 360atagtctcta tgataaaaga
atatttaaag acaagcaaaa caagagtttt gtggggaaca 420gcaaacctat tttcacatcc
gcgatatgtt catggtgctg caacatcctg caatgccgat 480gtttttgcgt atgcagcagc
gcaggtgaaa aaggcgttag aggttacaaa agagcttggc 540ggcgaaaact atgtgttctg
gggcggaaga gaaggttatg agacactttt gaacaccgac 600atggagcttg agcttgacaa
cttggcaaga tttttgcaca tggcagttga ctatgcaaaa 660gagatagggt ttgacggtca
gtttttgatt gaaccaaagc caaaagaacc aactaagcat 720cagtacgatt ttgatgccgc
tcatgtttat ggatttttga aaaaatatga ccttgacaag 780tacttcaagc tcaacataga
ggtaaaccat gcgactttgg caggacatga tttccaccat 840gagttgagat ttgcacgaat
aaacaacatg cttggctcaa ttgatgctaa catgggcgac 900ttgcttttgg gctgggatac
agaccagttc ccaacagatg taagacttac cacgcttgct 960atgtatgagg ttattaaagc
tggtggcttt gacaaaggcg ggctcaactt tgacgcaaag 1020gtaagaagag gttcttttga
gcttgaagac ttggtcattg gtcacattgc agggatggac 1080gcttttgcaa aaggatttaa
gatagcatat aagcttgtca aagacggtgt atttgataag 1140tttattgaag aaaggtatag
aagctacaaa gaaggaattg gagctaagat tgtaagtggt 1200caggcagatt ttaagacgtt
agaagaatat gctttgaatc tttcaaagat agaaaacaaa 1260tctggcaagc aagagcttct
tgagatgatt ttgaacaaat atatgttcag tgaataa 1317
SEQ ID NO: 72; length 1536; DNA; Clostridium straminisolvens
ttgtcatatt tactgggagt agatataggt acatcaggca cgaaaactgt
tttatatgat 60gaactgggaa ataccgtagc aagcagcctt gaggaatatc cattgtacca
gccccatatt 120gggtgggcag agcaggaacc ggaagactgg tggagggcaa catgcctatc
tatcaaacat 180gttatttcca aaagaggaat tgatgcttcc tctattaagg gaatcggact
ttcaggacag 240atgcacgggg ctgttcttct ggacaaagac ggcaaagtgc taagaaaagc
aattatatgg 300tgtgaccaga gaagttttgc cgagtgcgag cagattactt caattatagg
gaaggaaagg 360ctcgttgaga taactgccaa ccctgcactg acgggattta cagcatcaaa
ggttatgtgg 420gttaaaaata atgaacctga aatttttgag aagatttata agatacttct
ccctaaagac 480tatataagat ataaattaac gggagaattt gctacagagg tatctgatgc
cagtggaatg 540cagtttatgg atataccggg gagaaaatgg agcgacgaag tcataagtaa
actcggactt 600gataaaagca tgctgggaga actctatgag tctcaggaag ttagcgggaa
agtgaataag 660tatgctgctt cattaaccgg acttaaggaa ggaactcctg tcgtgggtgg
agcaggagac 720caggcagcag gagctgtcgg taatggaatt gtgagacccg gggtggtttc
atccactata 780ggaacttcag gagtagtatt tgcattctct gaaaaggtta ctattgatcc
aaagggtaga 840gttcatactt tttgtcatgc ggtaccaaat acctggcaca ttatgggggt
tacacaaggg 900gccgggctgt ctcttaagtg gttccgtgac aatttctgta tagaagaaaa
gagaactgca 960gagctaatga aaatagaccc gtacataatt atggataaag aagctgaaaa
agtggctccg 1020ggctgtaacg gtttaatcta tttaccttat ctgatgggag aaagaacgcc
acatcttgac 1080cctaatgcca agggtgtctt tttcggatta acagcaaagc atgaaaaaca
ggatatctta 1140aggtcgatta tggaaggtgt tgtatatagc cttagagatt gccttgaaat
tattgaggaa 1200atgggtgtta acgtttctga agtaagagct tccggtggag gcggtaaaag
tgaattgtgg 1260agaaaaatgc aggcggatat attcggcact gatattacaa ccgtaaagtc
aagtgaggga 1320ccggcacttg gggtagcact tcttgccgga gtaggaacgg gtgtgtacaa
caacattaat 1380gaagcatgtg aagcagtaat aaaagaaaat acccggcagg cttcggaccc
ggagctatat 1440gtaaaataca cgaagtttta tgatatttat aaacgtctgt ataactcttt
gaaaaaggaa 1500tttgcagacc tttcggctat gctgcaaagt ttatag
1536
SEQ ID NO: 73; length 1320; DNA; Clostridium straminisolvens
atggcagagt
attttaaaaa tgtaccgaaa atcaaatatg aaggaaagga ttcggacaat 60cctttagcgt
ttaagtacta taatcccgat gaggtcattg gcggtaaaac aatgaaagag 120catctaaggt
ttgctgttgc atattggcat acatatcagg gaacgggtgc agacccattt 180gggccgggta
ctgctgtaag accgtgggat gacatatcgg acccaatgga tcttgcaaag 240gccaaagtgg
ccgcaaattt cgagctgtgt gaaaaattgg gagtaccatt tttctgcttc 300catgacagag
atattgcgcc tgaagcttca actttaagag agaccaataa aagacttgat 360gagattgttg
cactgataaa ggactatatg aaaacaagta gtgtaaaact actctggggt 420acaacaaatg
cttttagcca cccaaggttt gtccatggtg catctacttc tccgaatgca 480gatgtatttg
catatgcagc agctcaggtt aaaaaggcta tggaaattac cctggaactt 540ggcggtcaga
actatgtgtt ctggggtgga agagaaggct atgaaacctt acttaatact 600gatatgaaat
tggagcttga caatatggca aggttccttg gaatggcagt tgactatgca 660aaagagattg
gttttaaagg gcagctcttg attgaaccta agccaaaaga accgacaaag 720caccagtatg
actttgatac agctacagtt atcggtttct taaggactta tggtcttgag 780aattacttca
aaatgaatat tgaagcaaat cacgctacac ttgcagctca tactttccag 840catgaactta
gggtttcaag aattaacggt gtgctaggaa gtatcgatgc aaaccagggt 900gatcttcttt
taggatggga cactgaccaa ttcccgacaa atatctacga tactaccctt 960gctatgtatg
aagtaattaa ggcaggcgga tttacaacgg gaggtctgaa ttttgattct 1020aaagtcagaa
gaggatcatt tgagcctgtg gacctgttct atgcacatat tgcaggtatg 1080gacgcttttg
caaaaggatt taaaatagca tataaaatgg tttccgacgg taagtttgac 1140aaatttattg
atgaaagata tgaaagctat aagagcggta ttggaaaaga tattgtagat 1200ggaaaagtag
ggtttaaaga gcttgaaaaa tatgctttag agcttgatgg tatcaagaat 1260gtgtcgggaa
gacaggaagt tctcgaagct atgttaaaca aatatattct tgaggactag
1320
SEQ ID NO: 74; length 8909; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
catgagacaa taaccctgat aaatgcttca
ataatattga aaaaggaaga gtatccagta 60ttctgacatg ggtgtatcaa taacccatgc
gtttccgtat tgtatcggaa tggtttcgga 120cagggcggtg ggaatagaca tggaaaagat
ttttttgccc gaggatgcat tgataaagta 180tttcttttcc gaaagagagg aaaagattct
aaagagtttt ggaaatactg atgaatattg 240tgtgcagagt acaattctat ggacaagaaa
agaggctttg tcaaaacttt ttcgtctggg 300aatgaggatg gattttaaaa agctggatac
tttggaggac gaggtggttt ttcaggaaac 360aaacagggcg cgtctgtttt cttttatatg
caataattac tgtatctctc tggcattgcc 420aggttttaat aaagattaaa attattgact
agaaataaaa aaattgtcca taatattaat 480ggacaaaaaa acaaagaatt acatcaaagg
aagataaaaa tactttgtta aaaaattaat 540tattttttat ctaaactatt gaaaatgaaa
ataaaataat ataaaatgaa tcatagtgca 600agagatactt gccagaggat gaatatttta
ctgcattcat gctttatggc agctaataga 660ggcattaaat taaattttaa tttacaatag
gaggcgatat taatgaataa atattttgag 720aacgtatcta aaataaaata tgaaggacca
aaatcaaaca atccttattc ttttaaattt 780tacaatccag aagaagtaat cgatggcaag
acgatggagg agcatctacg cttttctata 840gcttactggc acacttttac tgctgatgga
acagatcaat ttggcaaagc taccatgcaa 900agaccatgga accactacac agatcctatg
gacatagcaa aggcaagggt agaagcagca 960tttgagtttt ttgataagat aaatgcacct
ttcttctgct tccatgacag ggatattgca 1020cctgaaggag acactcttag agagacaaac
aaaaacttag atacaatagt tgccatgata 1080aaggattact tgaagaccag caagacgaaa
gttttgtggg gcaccgcaaa tcttttctcc 1140aatccgagat ttgtacatgg tgcatcaaca
tcctgcaatg ctgatgtttt cgcatattct 1200gcagctcaag ttaaaaaagc tcttgagatt
actaaggagc ttggcggcga aaactacgta 1260ttctggggtg gcagagaagg atatgaaaca
cttctcaata cagacatgga gtttgagctt 1320gacaactttg caagattttt gcacatggct
gttgactacg cgaaggaaat cggctttgaa 1380ggccagttct tgattgagcc gaagccaaag
gagcctacga aacaccaata cgactttgac 1440gtggcaaatg tattggcatt cttgagaaaa
tacggccttg acaaatattt caaagtgaat 1500atcgaggcaa accatgcgac attggcattc
cacgacttcc aacatgagct aagatacgcc 1560agaataaacg gtgtattagg atcaattgac
gcaaatacag gcgatatgct tttaggatgg 1620gatacagacc agttccctac agatatacgc
atgacaacgc ttgctatgta tgaagtcata 1680aagatgggtg gatttgacaa aggcggcctt
aacttcgatg caaaagtaag acgtgcttca 1740tttgaaccag aagatctttt cttaggtcat
atagccggaa tggatgcctt tgcaaaaggc 1800ttcaaagttg cttacaaact tgtgaaagat
ggcgtatttg acaagttcat cgaggaaaga 1860tacgcaagct acaaagacgg cattggcgct
gacattgtaa gcgggaaagc tgacttcaag 1920agccttgaaa agtacgcatt agagcacagc
cagattgtca acaaatcagg caggcaagag 1980ctgttagaat caatcctaaa tcagtatttg
tttgcagaat aatgaaacat gagggcggct 2040tcatgcttca ttaaagctgc cctcaacaaa
aatcatggag gtaaatgtat gtatttttta 2100gggatagatt tagggacatc atcagttaag
ataatactga tgaatgaaag cggcaatgtg 2160gtatcaagcg tttcaaaaga atatcctgtg
tactatccag agccaggctg ggctgagcaa 2220aatccagaag attggtggaa tggcacaagg
gatggaataa gagagattat tgcgaaaagc 2280ggcgtaaatg gcgatgaaat aaagggtgtt
ggcttaagcg ggcagatgca tggactggtg 2340cttttagaca aagacaataa cgttttaacg
ccagccatac tttggtgtga ccagaggaca 2400caggaagaat gcgactacat cacagagaaa
ataggaaaag aaggcctttt gaagtacaca 2460gggaataaag cattgacagg ttttactgca
ccaaagatat tatgggtaaa gaagcacctt 2520aaagacgtat atgaaagaat cgctcatatc
cttttgccaa aagattatat aaggtttaaa 2580ttgacaggtg agtacgctac agaagtttca
gatgcatcag gtacacttct tttcgatgtg 2640gaaaatagaa gatggtcaaa ggaaatgata
gacatatttg aaataccgga aaaagccctt 2700cctaagtgct acgaatcaac agatgtcaca
gggtatgtca ccaaagaggc agcagatttg 2760acagggcttc atgaagggac tattgtcgta
ggcggtggtg gtgaccaagc cagcggcgct 2820gtaggcactg gcacggtgaa aagcggcata
gtgtccatcg cattaggaac ttcaggcgtc 2880gtatttgcat cacaggacaa gtacgcagca
gatgatgagc ttaggcttca ctcattctgc 2940catgcaaacg gcaaatggca tgtgatgggt
gtcatgcttt cggctgcatc atgtcttaaa 3000tggtgggtag atgatgtaaa taattacaag
accgatgtta tgacatttga tggactctta 3060gaagaagcag agaaggtgaa gccaggcagt
gatggattga tattcttgcc atacctgatg 3120ggtgaaagga ccccttacag cgatccttat
gcgagaggca gctttgtagg tttaacaatt 3180acacacaata gaagccacat gacaagatct
atattagaag gcgtcgcatt tggacttagg 3240gattcgctgg agcttataaa ggctttaaat
atacctgtaa atgaagccag ggtaagtggt 3300ggtggtgcta aaagcaggct ttggaggcaa
atacttgccg atgtattcaa tgtaaggata 3360gacatgataa atgctacaga aggaccttca
tttggtgcag caataatggc gtctgtggga 3420tatggccttt acaaaaatgt agatgatgca
tgcaatagtt taataaaagt tacagacagc 3480gtatatccaa tcaaagaaaa cgtcgaaaag
tacaacaaac tgtatccaat ctacgtgagc 3540ttgtattcaa ggcttaaagg cgcctttgaa
gaaattggga agttggattt gtaaaataaa 3600ttcatttgga aataaattta tgacagtaca
agggacattg attaacaaag cttcaggtta 3660ataatagtaa agttaatatt tgctatgaaa
tgaaagcata ataatctgtt ccttgtactt 3720tgctttatca tgtttattta agatactaat
taataaaagt caatttagcc aataataaaa 3780tcctatatat agtaaatatt tacaataaaa
tcactacaaa ataaaaaact ttatttaatc 3840tcttaaaaat atctacataa gggggtgtta
gatgaaaaag gccgtaatca tggtcatagc 3900tgtttcctgt gtgaaattgt tatccgctca
caattccaca caacatacga gccggaagca 3960taaagtgtaa agcctggggt gcctaatgag
tgagctaact cacattaatt gcgttgcgct 4020cactgcccgc tttccagtcg ggaaacctgt
cgtgccagcc cttcaaactt cccaaaggcg 4080agccctagtg acattagaaa accgactgta
aaaagtacag tcggcattat ctcatattat 4140aaaagccagt cattaggcct atctgacaat
tcctgaatag agttcataaa caatcctgca 4200tgataaccat cacaaacaga atgatgtacc
tgtaaagata gcggtaaata tattgaatta 4260cctttattaa tgaattttcc tgctgtaata
atgggtagaa ggtaattact attattattg 4320atatttaagt taaacccagt aaatgaagtc
catggaataa tagaaagaga aaaagcattt 4380tcaggtatag gtgttttggg aaacaatttc
cccgaaccat tatatttctc tacatcagaa 4440aggtataaat cataaaactc tttgaagtca
ttctttacag gagtccaaat accagagaat 4500gttttagata caccatcaaa aattgtataa
agtggctcta acttatccca ataacctaac 4560tctccgtcgc tattgtaacc agttctaaaa
gctgtatttg agtttatcac ccttgtcact 4620aagaaaataa atgcagggta aaatttatat
ccttcttgtt ttatgtttcg gtataaaaca 4680ctaatatcaa tttctgtggt tatactaaaa
gtcgtttgtt ggttcaaata atgattaaat 4740atctcttttc tcttccaatt gtctaaatca
attttattaa agttcatttg atatgcctcc 4800taaattttta tctaaagtga atttaggagg
cttacttgtc tgctttcttc attagaatca 4860atcctttttt aaaagtcaat cccgtttgtt
gaactactct ttaataaaat aatttttccg 4920ttcccaattc cacattgcaa taatagaaaa
tccatcttca tcggcttttt cgtcatcatc 4980tgtatgaatc aaatcgcctt cttctgtgtc
atcaaggttt aattttttat gtatttcttt 5040taacaaacca ccataggaga ttaacctttt
acggtgtaaa ccttcctcca aatcagacaa 5100acgtttcaaa ttcttttctt catcatcggt
cataaaatcc gtatccttta caggatattt 5160tgcagtttcg tcaattgccg attgtatatc
cgatttatat ttatttttcg gtcgaatcat 5220ttgaactttt acatttggat catagtctaa
tttcattgcc tttttccaaa attgaatcca 5280ttgtttttga ttcacgtagt tttctgtatt
cttaaaataa gttggttcca cacataccaa 5340tacatgcatg tgctgattat aagaattatc
tttattattt attgtcactt ccgttgcacg 5400cataaaacca acaagatttt tattaatttt
tttatattgc atcattcggc gaaatccttg 5460agccatatct gacaaactct tatttaattc
ttcgccatca taaacatttt taactgttaa 5520tgtgagaaac aaccaacgaa ctgttggctt
ttgtttaata acttcagcaa caaccttttg 5580tgactgaatg ccatgtttca ttgctctcct
ccagttgcac attggacaaa gcctggattt 5640acaaaaccac actcgataca actttctttc
gcctgtttca cgattttgtt tatactctaa 5700tatttcagca caatctttta ctctttcagc
ctttttaaat tcaagaatat gcagaagttc 5760aaagtaatca acattagcga ttttcttttc
tctccatggt ctcacttttc cactttttgt 5820cttgtccact aaaacccttg atttttcatc
tgaataaatg ctactattag gacacataat 5880attaaaagaa acccccatct atttagttat
ttgtttggtc acttataact ttaacagatg 5940gggtttttct gtgcaaccaa ttttaagggt
tttcaatact ttaaaacaca tacataccaa 6000cacttcaacg cacctttcag caactaaaat
aaaaatgacg ttatttctat atgtatcaag 6060aatagaaaga actcgttttt cgctacgctc
aaaacgcaaa aaaagcactc attcgagtgc 6120tttttcttat cgctccaaat catgcgattt
tttcctcttt gcttttcttt gctcacgaag 6180ttctcgatca cgctgcaaaa catcttgaag
cgaaaaagta ttcttctttt cttccgatcg 6240ctcatgctga cgcacgaaaa gccctctagg
cgcataggaa caactcctaa atgcatgtga 6300ggggttttct cgtccatgtg aacagtcgca
tacgcaatat tttgtttccc atactgcatt 6360aatgaatcgg ccaacgcgcg gggagaggcg
gtttgcgtat tgggcgctct tccgcttcct 6420cgctcactga ctcgctgcgc tcggtcgttc
ggctgcggcg agcggtatca gctcactcaa 6480aggcggtaat acggttatcc acagaatcag
gggataacgc aggaaagaac atgtgagcaa 6540aaggccagca aaaggccagg aaccgtaaaa
aggccgcgtt gctggcgttt ttccataggc 6600tccgcccccc tgacgagcat cacaaaaatc
gacgctcaag tcagaggtgg cgaaacccga 6660caggactata aagataccag gcgtttcccc
ctggaagctc cctcgtgcgc tctcctgttc 6720cgaccctgcc gcttaccgga tacctgtccg
cctttctccc ttcgggaagc gtggcgcttt 6780ctcatagctc acgctgtagg tatctcagtt
cggtgtaggt cgttcgctcc aagctgggct 6840gtgtgcacga accccccgtt cagcccgacc
gctgcgcctt atccggtaac tatcgtcttg 6900agtccaaccc ggtaagacac gacttatcgc
cactggcagc agccactggt aacaggatta 6960gcagagcgag gtatgtaggc ggtgctacag
agttcttgaa gtggtggcct aactacggct 7020acactagaag aacagtattt ggtatctgcg
ctctgctgaa gccagttacc ttcggaaaaa 7080gagttggtag ctcttgatcc ggcaaacaaa
ccaccgctgg tagcggtggt ttttttgttt 7140gcaagcagca gattacgcgc agaaaaaaag
gatctcaaga agatcctttg atcttttcta 7200cggggatcgc ttgcctgtaa cttacacgcg
cctcgtatct tttaatgatg gaataatttg 7260ggaatttact ctgtgtttat ttatttttat
gttttgtatt tggattttag aaagtaaata 7320aagaaggtag aagagttacg gaatgaagaa
aaaaaaataa acaaaggttt aaaaaatttc 7380aacaaaaagc gtactttaca tatatattta
ttagacaaga aaagcagatt aaatagatat 7440acattcgatt aacgataagt aaaatgtaaa
atcacaggat tttcgtgtgt ggtcttctac 7500acagacaaga tgaaacaatt cggcattaat
acctgagagc aggaagagca agataaaagg 7560tagtatttgt tggcgatccc cctagagtct
tttacatctt cggaaaacaa aaactatttt 7620ttctttaatt tcttttttta ctttctattt
ttaatttata tatttatatt aaaaaattta 7680aattataatt atttttatag cacgtgatga
aaaggaccca tcgataagct agcttttcaa 7740ttcaattcat catttttttt ttattctttt
ttttgatttc ggtttctttg aaattttttt 7800gattcggtaa tctccgaaca gaaggaagaa
cgaaggaagg agcacagact tagattggta 7860tatatacgca tatgtagtgt tgaagaaaca
tgaaattgcc cagtattctt aacccaactg 7920cacagaacaa aaacctgcag gaaacgaaga
taaatcatgt cgaaagctac atataaggaa 7980cgtgctgcta ctcatcctag tcctgttgct
gccaagctat ttaatatcat gcacgaaaag 8040caaacaaact tgtgtgcttc attggatgtt
cgtaccacca aggaattact ggagttagtt 8100gaagcattag gtcccaaaat ttgtttacta
aaaacacatg tggatatctt gactgatttt 8160tccatggagg gcacagttaa gccgctaaag
gcattatccg ccaagtacaa ttttttactc 8220ttcgaagaca gaaaatttgc tgacattggt
aatacagtca aattgcagta ctctgcgggt 8280gtatacagaa tagcagaatg ggcagacatt
acgaatgcac acggtgtggt gggcccaggt 8340attgttagcg gtttgaagca ggcggcagaa
gaagtaacaa aggaacctag aggccttttg 8400atgttagcag aattgtcatg caagggctcc
ctatctactg gagaatatac taagggtact 8460gttgacattg cgaagagcga caaagatttt
gttatcggct ttattgctca aagagacatg 8520ggtggaagag atgaaggtta cgattggttg
attatgacac ccggtgtggg tttagatgac 8580aagggagacg cattgggtca acagtataga
accgtggatg atgtggtctc tacaggatct 8640gacattatta ttgttggaag aggactattt
gcaaagggaa gggatgctaa ggtagagggt 8700gaacgttaca gaaaagcagg ctgggaagca
tatttgagaa gatgcggcca gcaaaactaa 8760aaaactgtat tataagtaaa tgcatgtata
ctaaactcac aaattagagc ttcaatttaa 8820ttatatcagt tattacccac ttttcgggga
aatgtgcgcg gaacccctat ttgtttattt 8880ttctaaatac attcaaatat gtatccgct
8909
SEQ ID NO: 75; length: 6972; type: DNA; organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
75gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg
60atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat
120gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc
180tgtcaattcg agctcggtac ccggggatcc ttaagaagac taataaaaag tttctaaaag
240catgaaatat cctgtatttt ggagttactt gccgttattt atggctaaag ccgaaaaaaa
300gataaggagg gtgttgtata caaaaaacat tttgtatata attagagttg cttggaaccc
360agtaaaatag ctgtttatag tgagtaggta ttattacctg atgagttata ctgtcggtac
420ctatttagcg gagcggcttg tccagattgg tctcaagcat cacttcgcag tcgcgggcga
480ctacaacctc gtccttcttg acaacctgct tttgaacaaa aacatggagc aggtttattg
540ctgtaacgaa ctgaactgcg gtttcagtgc agaaggttat gctcgtgcca aaggcgcagc
600agcagccgtc gttacctaca gcgtcggtgc gctttccgca tttgatgcta tcggtggcgc
660ctatgcagaa aaccttccgg ttatcctgat ctccggtgct ccgaacaaca atgatcacgc
720tgctggtcac gtgttgcatc acgctcttgg caaaaccgac tatcactatc agttggaaat
780ggccaagaac atcacggccg ccgctgaagc gatttacacc ccggaagaag ctccggctaa
840aatcgatcac gtgattaaaa ctgctcttcg tgagaagaag ccggtttatc tcgaaatcgc
900ttgcaacatt gcttccatgc cctgcgccgc tcctggaccg gcaagcgcat tgttcaatga
960cgaagccagc gacgaagctt ctttgaatgc agcggttgaa gaaaccctga aattcatcgc
1020caaccgcgac aaagttgccg tcctcgtcgg cagcaagctg cgcgcagctg gtgctgaaga
1080agctgctgtc aaatttgctg atgctctcgg tggcgcagtt gctaccatgg ctgctgcaaa
1140aagcttcttc ccagaagaaa acccgcatta catcggcacc tcatggggtg aagtcagcta
1200tccgggcgtt gaaaagacga tgaaagaagc cgatgcggtt atcgctctgg ctcctgtctt
1260caacgactac tccaccactg gttggacgga tattcctgat cctaagaaac tggttctcgc
1320tgaaccgcgt tctgtcgtcg ttaacggcat tcgcttcccc agcgtccatc tgaaagacta
1380tctgacccgt ttggctcaga aagtttccaa gaaaaccggt gcattggact tcttcaaatc
1440cctcaatgca ggtgaactga agaaagccgc tccggctgat ccgagtgctc cgttggtcaa
1500cgcagaaatc gcccgtcagg tcgaagctct tctgaccccg aacacgacgg ttattgctga
1560aaccggtgac tcttggttca atgctcagcg catgaagctc ccgaacggtg ctcgcgttga
1620atatgaaatg cagtggggtc acattggttg gtccgttcct gccgccttcg gttatgccgt
1680cggtgctccg gaacgtcgca acatcctcat ggttggtgat ggttccttcc agctgacggc
1740tcaggaagtc gctcagatgg ttcgcctgaa actgccggtt atcatcttct tgatcaataa
1800ctatggttac accatcgaag ttatgatcca tgatggtccg tacaacaaca tcaagaactg
1860ggattatgcc ggtctgatgg aagtgttcaa cggtaacggt ggttatgaca gcggtgctgg
1920taaaggcctg aaggctaaaa ccggtggcga actggcagaa gctatcaagg ttgctctggc
1980aaacaccgac ggcccaaccc tgatcgaatg cttcatcggt cgtgaagact gcactgaaga
2040attggtcaaa tggggtaagc gcgttgctgc cgccaacagc cgtaagcctg ttaacaagct
2100cctctagatt ctgttaaaac cggacattga agaaggtgtt gcgcagcgtt taataaaaac
2160atctgtttat cgaagtttag gaataggaaa attaaaaaaa acaagacggg agtgagtttt
2220tgaaatggct tcttcaactt tttatattcc tttcgtcaac gaaatgggcg aaggttcgct
2280tgaaaaagca atcaaggatc ttaacggcag cggctttaaa aatgcgctga tcgtttctga
2340tgctttcatg aacaaatccg gtgttgtgaa gcaggttgct gacctgttga aagcacaggg
2400tattaattct gctgtttatg atggcgttat gccgaacccg actgttaccg cagttctgga
2460aggccttaag atcctgaagg ataacaattc agacttcgtc atctccctcg gtggtggttc
2520tccccatgac tgcgccaaag ccatcgctct ggtcgcaacc aatggtggtg aagtcaaaga
2580ctacgaaggt atcgacaaat ctaagaaacc tgccctgcct ttgatgtcaa tcaacacgac
2640ggctggtacg gcttctgaaa tgacgcgttt ctgcatcatc actgatgaag tccgtcacgt
2700taagatggcc attgttgacc gtcacgttac cccgatggtt tccgtcaacg atcctctgtt
2760gatggttggt atgccaaaag gcctgaccgc cgccaccggt atggatgctc tgacccacgc
2820atttgaagct tattcttcaa cggcagctac tccgatcacc gatgcttgcg ctttgaaagc
2880agcttccatg atcgctaaga atctgaagac cgcttgcgac aacggtaagg atatgccggc
2940tcgtgaagct atggcttatg cccaattcct cgctggtatg gccttcaaca acgcttcgct
3000tggttatgtc catgctatgg ctcaccagtt gggcggttac tacaacctgc cgcatggtgt
3060ctgcaacgct gttctgcttc cgcatgttct ggcttataac gcctctgtcg ttgctggtcg
3120tctgaaagac gttggtgttg ctatgggtct cgatatcgcc aatctcggtg ataaagaagg
3180cgcagaagcc accattcagg ctgttcgcga tctggctgct tccattggta ttccagcaaa
3240cctgaccgag ctgggtgcta agaaagaaga tgtgccgctt cttgctgacc acgctctgaa
3300agatgcttgt gctctgacca acccgcgtca gggtgatcag aaagaagttg aagaactctt
3360cctgagcgct ttctaaaaga tgcgttataa ttttacaagc ctgttttttt aggaaacggg
3420cttataaaat ttttttattt ttttgccggg ttttttcttg tattaatatg tggaatatgt
3480taataatatt aagaagaaat tccgaattta actaaacaaa attatttttg ttatttaagc
3540caatctgtca tataattctt gacatgaggg ttattagtta gtataatagt ccttgtcggt
3600tttaagaggg atcctctaga gtcgacctgc aggcatgcaa gcttggcgta atcatggtca
3660tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat acgagccgga
3720agcataaagt gtaaagcctg gggtgcctaa tgagtgagct aactcacatt aattgcgttg
3780cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc agcccttcaa acttcccaaa
3840ggcgagccct agtgacatta gaaaaccgac tgtaaaaagt acagtcggca ttatctcata
3900ttataaaagc cagtcattag gcctatctga caattcctga atagagttca taaacaatcc
3960tgcatgataa ccatcacaaa cagaatgatg tacctgtaaa gatagcggta aatatattga
4020attaccttta ttaatgaatt ttcctgctgt aataatgggt agaaggtaat tactattatt
4080attgatattt aagttaaacc cagtaaatga agtccatgga ataatagaaa gagaaaaagc
4140attttcaggt ataggtgttt tgggaaacaa tttccccgaa ccattatatt tctctacatc
4200agaaaggtat aaatcataaa actctttgaa gtcattcttt acaggagtcc aaataccaga
4260gaatgtttta gatacaccat caaaaattgt ataaagtggc tctaacttat cccaataacc
4320taactctccg tcgctattgt aaccagttct aaaagctgta tttgagttta tcacccttgt
4380cactaagaaa ataaatgcag ggtaaaattt atatccttct tgttttatgt ttcggtataa
4440aacactaata tcaatttctg tggttatact aaaagtcgtt tgttggttca aataatgatt
4500aaatatctct tttctcttcc aattgtctaa atcaatttta ttaaagttca tttgatatgc
4560ctcctaaatt tttatctaaa gtgaatttag gaggcttact tgtctgcttt cttcattaga
4620atcaatcctt ttttaaaagt caatcccgtt tgttgaacta ctctttaata aaataatttt
4680tccgttccca attccacatt gcaataatag aaaatccatc ttcatcggct ttttcgtcat
4740catctgtatg aatcaaatcg ccttcttctg tgtcatcaag gtttaatttt ttatgtattt
4800cttttaacaa accaccatag gagattaacc ttttacggtg taaaccttcc tccaaatcag
4860acaaacgttt caaattcttt tcttcatcat cggtcataaa atccgtatcc tttacaggat
4920attttgcagt ttcgtcaatt gccgattgta tatccgattt atatttattt ttcggtcgaa
4980tcatttgaac ttttacattt ggatcatagt ctaatttcat tgcctttttc caaaattgaa
5040tccattgttt ttgattcacg tagttttctg tattcttaaa ataagttggt tccacacata
5100ccaatacatg catgtgctga ttataagaat tatctttatt atttattgtc acttccgttg
5160cacgcataaa accaacaaga tttttattaa tttttttata ttgcatcatt cggcgaaatc
5220cttgagccat atctgacaaa ctcttattta attcttcgcc atcataaaca tttttaactg
5280ttaatgtgag aaacaaccaa cgaactgttg gcttttgttt aataacttca gcaacaacct
5340tttgtgactg aatgccatgt ttcattgctc tcctccagtt gcacattgga caaagcctgg
5400atttacaaaa ccacactcga tacaactttc tttcgcctgt ttcacgattt tgtttatact
5460ctaatatttc agcacaatct tttactcttt cagccttttt aaattcaaga atatgcagaa
5520gttcaaagta atcaacatta gcgattttct tttctctcca tggtctcact tttccacttt
5580ttgtcttgtc cactaaaacc cttgattttt catctgaata aatgctacta ttaggacaca
5640taatattaaa agaaaccccc atctatttag ttatttgttt ggtcacttat aactttaaca
5700gatggggttt ttctgtgcaa ccaattttaa gggttttcaa tactttaaaa cacatacata
5760ccaacacttc aacgcacctt tcagcaacta aaataaaaat gacgttattt ctatatgtat
5820caagaataga aagaactcgt ttttcgctac gctcaaaacg caaaaaaagc actcattcga
5880gtgctttttc ttatcgctcc aaatcatgcg attttttcct ctttgctttt ctttgctcac
5940gaagttctcg atcacgctgc aaaacatctt gaagcgaaaa agtattcttc ttttcttccg
6000atcgctcatg ctgacgcacg aaaagccctc taggcgcata ggaacaactc ctaaatgcat
6060gtgaggggtt ttctcgtcca tgtgaacagt cgcatacgca atattttgtt tcccatactg
6120cattaatgaa tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct
6180tcctcgctca ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac
6240tcaaaggcgg taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga
6300gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat
6360aggctccgcc cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac
6420ccgacaggac tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct
6480gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg
6540ctttctcata gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg
6600ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt
6660cttgagtcca acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg
6720attagcagag cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac
6780ggctacacta gaagaacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga
6840aaaagagttg gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt
6900gtttgcaagc agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt
6960tctacggggt ct
6972
SEQ ID NO: 76; length: 6936; type: DNA; organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
76gacgctcagt ggaacgaaaa ctcacgttaa
gggattttgg tcatgagatt atcaaaaagg 60atcttcacct agatcctttt aaattaaaaa
tgaagtttta aatcaatcta aagtatatat 120gagtaaactt ggtctgacag ttaccaatgc
ttaatcagtg aggcacctat ctcagcgatc 180tgtcaattcg agctcggtac ccggggatcc
ttaagaagac taataaaaag tttctaaaag 240catgaaatat cctgtatttt ggagttactt
gccgttattt atggctaaag ccgaaaaaaa 300gataaggagg gtgttgtata caaaaaacat
tttgtatata attagagttg cttggaaccc 360agtaaaatag ctgtttatag tgagtaggta
ttattacctg atgtataccg ttggtatgta 420cttggcagaa cgcctagccc agatcggcct
gaaacaccac tttgccgtgg ccggtgacta 480caacctggtg ttgcttgatc agctcctgct
gaacaaagac atggagcagg tctactgctg 540taacgaactt aactgcggct ttagcgccga
aggttacgct cgtgcacgtg gtgccgccgc 600tgccatcgtc acgttcagcg taggtgctat
ctctgcaatg aacgccatcg gtggcgccta 660tgcagaaaac ctgccggtca tcctgatctc
tggctcaccg aacaccaatg actacggcac 720aggccacatc ctgcaccaca ccattggtac
tactgactat aactatcagc tggaaatggt 780aaaacacgtt acctgcgcac gtgaaagcat
cgtttctgcc gaagaagcac cggcaaaaat 840cgaccacgtc atccgtacgg ctctacgtga
acgcaaaccg gcttatctgg aaatcgcatg 900caacgtcgct ggcgctgaat gtgttcgtcc
gggcccgatc aatagcctgc tgcgtgaact 960cgaagttgac cagaccagtg tcactgccgc
tgtagatgcc gccgtagaat ggctgcagga 1020ccgccagaac gtcgtcatgc tggtcggtag
caaactgcgt gccgctgccg ctgaaaaaca 1080ggctgttgcc ctagcggacc gcctgggctg
cgctgtcacg atcatggctg ccgaaaaagg 1140cttcttcccg gaagatcatc cgaacttccg
cggcctgtac tggggtgaag tcagctccga 1200aggtgcacag gaactggttg aaaacgccga
tgccatcctg tgtctggcac cggtattcaa 1260cgactatgct accgttggct ggaactcctg
gccgaaaggc gacaatgtca tggtcatgga 1320caccgaccgc gtcactttcg caggacagtc
cttcgaaggt ctgtcattga gcaccttcgc 1380cgcagcactg gctgagaaag caccttctcg
cccggcaacg actcaaggca ctcaagcacc 1440ggtactgggt attgaggccg cagagcccaa
tgcaccgctg accaatgacg aaatgacgcg 1500tcagatccag tcgctgatca cttccgacac
tactctgaca gcagaaacag gtgactcttg 1560gttcaacgct tctcgcatgc cgattcctgg
cggtgctcgt gtcgaactgg aaatgcaatg 1620gggtcatatc ggttggtccg taccttctgc
attcggtaac gccgttggtt ctccggagcg 1680tcgccacatc atgatggtcg gtgatggctc
tttccagctg actgctcaag aagttgctca 1740gatgatccgc tatgaaatcc cggtcatcat
cttcctgatc aacaaccgcg gttacgtcat 1800cgaaatcgct atccatgacg gcccttacaa
ctacatcaaa aactggaact acgctggcct 1860gatcgacgtc ttcaatgacg aagatggtca
tggcctgggt ctgaaagctt ctactggtgc 1920agaactagaa ggcgctatca agaaagcact
cgacaatcgt cgcggtccga cgctgatcga 1980atgtaacatc gctcaggacg actgcactga
aaccctgatt gcttggggta aacgtgtagc 2040agctaccaac tctcgcaaac cacaagcgta
aattctgtta aaaccggaca ttgaagaagg 2100tgttgcgcag cgtttaataa aaacatctgt
ttatcgaagt ttaggaatag gaaaattaaa 2160aaaaacaaga cgggagtgag tttttgaaat
ggcttcttca actttttata ttcctttcgt 2220caacgaaatg ggcgaaggtt cgcttgaaaa
agcaatcaag gatcttaacg gcagcggctt 2280taaaaatgcg ctgatcgttt ctgatgcttt
catgaacaaa tccggtgttg tgaagcaggt 2340tgctgacctg ttgaaagcac agggtattaa
ttctgctgtt tatgatggcg ttatgccgaa 2400cccgactgtt accgcagttc tggaaggcct
taagatcctg aaggataaca attcagactt 2460cgtcatctcc ctcggtggtg gttctcccca
tgactgcgcc aaagccatcg ctctggtcgc 2520aaccaatggt ggtgaagtca aagactacga
aggtatcgac aaatctaaga aacctgccct 2580gcctttgatg tcaatcaaca cgacggctgg
tacggcttct gaaatgacgc gtttctgcat 2640catcactgat gaagtccgtc acgttaagat
ggccattgtt gaccgtcacg ttaccccgat 2700ggtttccgtc aacgatcctc tgttgatggt
tggtatgcca aaaggcctga ccgccgccac 2760cggtatggat gctctgaccc acgcatttga
agcttattct tcaacggcag ctactccgat 2820caccgatgct tgcgctttga aagcagcttc
catgatcgct aagaatctga agaccgcttg 2880cgacaacggt aaggatatgc cggctcgtga
agctatggct tatgcccaat tcctcgctgg 2940tatggccttc aacaacgctt cgcttggtta
tgtccatgct atggctcacc agttgggcgg 3000ttactacaac ctgccgcatg gtgtctgcaa
cgctgttctg cttccgcatg ttctggctta 3060taacgcctct gtcgttgctg gtcgtctgaa
agacgttggt gttgctatgg gtctcgatat 3120cgccaatctc ggtgataaag aaggcgcaga
agccaccatt caggctgttc gcgatctggc 3180tgcttccatt ggtattccag caaacctgac
cgagctgggt gctaagaaag aagatgtgcc 3240gcttcttgct gaccacgctc tgaaagatgc
ttgtgctctg accaacccgc gtcagggtga 3300tcagaaagaa gttgaagaac tcttcctgag
cgctttctaa aagatgcgtt ataattttac 3360aagcctgttt ttttaggaaa cgggcttata
aaattttttt atttttttgc cgggtttttt 3420cttgtattaa tatgtggaat atgttaataa
tattaagaag aaattccgaa tttaactaaa 3480caaaattatt tttgttattt aagccaatct
gtcatataat tcttgacatg agggttatta 3540gttagtataa tagtccttgt cggttttaag
agggatcctc tagagtcgac ctgcaggcat 3600gcaagcttgg cgtaatcatg gtcatagctg
tttcctgtgt gaaattgtta tccgctcaca 3660attccacaca acatacgagc cggaagcata
aagtgtaaag cctggggtgc ctaatgagtg 3720agctaactca cattaattgc gttgcgctca
ctgcccgctt tccagtcggg aaacctgtcg 3780tgccagccct tcaaacttcc caaaggcgag
ccctagtgac attagaaaac cgactgtaaa 3840aagtacagtc ggcattatct catattataa
aagccagtca ttaggcctat ctgacaattc 3900ctgaatagag ttcataaaca atcctgcatg
ataaccatca caaacagaat gatgtacctg 3960taaagatagc ggtaaatata ttgaattacc
tttattaatg aattttcctg ctgtaataat 4020gggtagaagg taattactat tattattgat
atttaagtta aacccagtaa atgaagtcca 4080tggaataata gaaagagaaa aagcattttc
aggtataggt gttttgggaa acaatttccc 4140cgaaccatta tatttctcta catcagaaag
gtataaatca taaaactctt tgaagtcatt 4200ctttacagga gtccaaatac cagagaatgt
tttagataca ccatcaaaaa ttgtataaag 4260tggctctaac ttatcccaat aacctaactc
tccgtcgcta ttgtaaccag ttctaaaagc 4320tgtatttgag tttatcaccc ttgtcactaa
gaaaataaat gcagggtaaa atttatatcc 4380ttcttgtttt atgtttcggt ataaaacact
aatatcaatt tctgtggtta tactaaaagt 4440cgtttgttgg ttcaaataat gattaaatat
ctcttttctc ttccaattgt ctaaatcaat 4500tttattaaag ttcatttgat atgcctccta
aatttttatc taaagtgaat ttaggaggct 4560tacttgtctg ctttcttcat tagaatcaat
ccttttttaa aagtcaatcc cgtttgttga 4620actactcttt aataaaataa tttttccgtt
cccaattcca cattgcaata atagaaaatc 4680catcttcatc ggctttttcg tcatcatctg
tatgaatcaa atcgccttct tctgtgtcat 4740caaggtttaa ttttttatgt atttctttta
acaaaccacc ataggagatt aaccttttac 4800ggtgtaaacc ttcctccaaa tcagacaaac
gtttcaaatt cttttcttca tcatcggtca 4860taaaatccgt atcctttaca ggatattttg
cagtttcgtc aattgccgat tgtatatccg 4920atttatattt atttttcggt cgaatcattt
gaacttttac atttggatca tagtctaatt 4980tcattgcctt tttccaaaat tgaatccatt
gtttttgatt cacgtagttt tctgtattct 5040taaaataagt tggttccaca cataccaata
catgcatgtg ctgattataa gaattatctt 5100tattatttat tgtcacttcc gttgcacgca
taaaaccaac aagattttta ttaatttttt 5160tatattgcat cattcggcga aatccttgag
ccatatctga caaactctta tttaattctt 5220cgccatcata aacattttta actgttaatg
tgagaaacaa ccaacgaact gttggctttt 5280gtttaataac ttcagcaaca accttttgtg
actgaatgcc atgtttcatt gctctcctcc 5340agttgcacat tggacaaagc ctggatttac
aaaaccacac tcgatacaac tttctttcgc 5400ctgtttcacg attttgttta tactctaata
tttcagcaca atcttttact ctttcagcct 5460ttttaaattc aagaatatgc agaagttcaa
agtaatcaac attagcgatt ttcttttctc 5520tccatggtct cacttttcca ctttttgtct
tgtccactaa aacccttgat ttttcatctg 5580aataaatgct actattagga cacataatat
taaaagaaac ccccatctat ttagttattt 5640gtttggtcac ttataacttt aacagatggg
gtttttctgt gcaaccaatt ttaagggttt 5700tcaatacttt aaaacacata cataccaaca
cttcaacgca cctttcagca actaaaataa 5760aaatgacgtt atttctatat gtatcaagaa
tagaaagaac tcgtttttcg ctacgctcaa 5820aacgcaaaaa aagcactcat tcgagtgctt
tttcttatcg ctccaaatca tgcgattttt 5880tcctctttgc ttttctttgc tcacgaagtt
ctcgatcacg ctgcaaaaca tcttgaagcg 5940aaaaagtatt cttcttttct tccgatcgct
catgctgacg cacgaaaagc cctctaggcg 6000cataggaaca actcctaaat gcatgtgagg
ggttttctcg tccatgtgaa cagtcgcata 6060cgcaatattt tgtttcccat actgcattaa
tgaatcggcc aacgcgcggg gagaggcggt 6120ttgcgtattg ggcgctcttc cgcttcctcg
ctcactgact cgctgcgctc ggtcgttcgg 6180ctgcggcgag cggtatcagc tcactcaaag
gcggtaatac ggttatccac agaatcaggg 6240gataacgcag gaaagaacat gtgagcaaaa
ggccagcaaa aggccaggaa ccgtaaaaag 6300gccgcgttgc tggcgttttt ccataggctc
cgcccccctg acgagcatca caaaaatcga 6360cgctcaagtc agaggtggcg aaacccgaca
ggactataaa gataccaggc gtttccccct 6420ggaagctccc tcgtgcgctc tcctgttccg
accctgccgc ttaccggata cctgtccgcc 6480tttctccctt cgggaagcgt ggcgctttct
catagctcac gctgtaggta tctcagttcg 6540gtgtaggtcg ttcgctccaa gctgggctgt
gtgcacgaac cccccgttca gcccgaccgc 6600tgcgccttat ccggtaacta tcgtcttgag
tccaacccgg taagacacga cttatcgcca 6660ctggcagcag ccactggtaa caggattagc
agagcgaggt atgtaggcgg tgctacagag 6720ttcttgaagt ggtggcctaa ctacggctac
actagaagaa cagtatttgg tatctgcgct 6780ctgctgaagc cagttacctt cggaaaaaga
gttggtagct cttgatccgg caaacaaacc 6840accgctggta gcggtggttt ttttgtttgc
aagcagcaga ttacgcgcag aaaaaaagga 6900tctcaagaag atcctttgat cttttctacg
gggtct 6936
SEQ ID NO: 77; length: 1509; type: DNA; organism: Clostridium thermocellum
77tttgatcctg gctcaggacg aacgctggcg gcgtgcctaa cacatgcaag tcgagcgggg
60atatacggaa ggtttaccgg aagtatatcc tagcggcgga cgggtgagta acgcgtgggt
120aacctacctc atacaggggg ataacacagg gaaacctgtg ctaataccgc ataacggggc
180ggcatcgtcc tgttatcaaa ggagaaatcc ggtatgagat gggcccgcgt ccgattagct
240agttggtgag gtaacggctc accaaggcga cgatcggtag ccgaactgag aggttggtcg
300gccacattgg gactgagaca cggcccagac tcctacggga ggcagcagtg gggaatattg
360cgcaatgggg gaaaccctga cgcagcaacg ccgcgtgaag gaagaaggcc ttcgggttgt
420aaacttcttt gattggggac gaaggaagtg acggtaccca aagaacaagc cacggctaac
480tacgtgccag cagccgcggt aatacgtagg tggcgagcgt tgtccggaat tactgggtgt
540aaagggcgcg taggcgggat gcaagtcaga tgtgaaattc cggggcttaa ccccggggct
600gcatctgaaa ctgtatctct tgagtgctgg agaggaaagc ggaattccta gtgtagcggt
660gaaatgcgta gatattagga ggaacaccag tggcgaaggc ggctttctgg acagtaactg
720acgctgaggc gcgaaagcgt ggggagcaaa caggattaga taccctggta gtccacgccg
780taaacgatgg atactaggtg taggaggtat cgaccccttc tgtgccggag ttaacacaat
840aagtatccca cctggggagt acggccgcaa ggttgaaact caaaggaatt gacgggggcc
900cgcacaagca gtggagtatg tggtttaatt cgaagcaacg cgaagaacct taccagggct
960tgacatccct ctgacagctc tagagatagg gcttccttcg gggcagagga gacaggtggt
1020gcatggttgt cgtcagctcg tgtcgtgaga tgttgggtta agtcccgcaa cgagcgcaac
1080ccttgtcgtt agttgccagc acgttaaggt gggcactcta gcgagactgc cggcgacaag
1140tcggaggaag gtggggacga cgtcaaatca tcatgcccct tatgtcctgg gctacacacg
1200tactacaatg gctgctacaa agggaagcga taccgcgagg tggagcaaat ccccaaaagc
1260agtcccagtt cggattgcag gctgaaactc gcctgcatga agtcggaatt gctagtaatg
1320gcaggtcagc atactgccgt gaatacgttc ccgggccttg tacacaccgc ccgtcacacc
1380atgagagtct gcaacacccg aagtcatagt ctaaccgcaa ggagggcgct gccgaaggtg
1440gggcagatga ttggggtgaa gtcgtaacaa ggtagccgta tcggaaggtg cggctggatc
1500acctccttt
1509
SEQ ID NO: 78; length: 1642; type: DNA; organism: Clostridium cellulolyticum
modified_base (9)..(9): a, c, g or t
78tgatcctgng acaggncgag cgctgncggc gtgcctaaca catgcgagtc gagcggagtt
60acctttagcn ctgagtattc ttgganatga tgctgncccg acagcgtcat ccnnnaacaa
120ccttaatgaa atatttagtt ggagttttgc atcacgcgtt ttatcaaagt gtcaacacat
180aatagtagaa gagaatgttc agtgctgaag gtaacttagc ggcggacggg tgagtaacgc
240gtgggcaacc tgcctgttac agggggataa cacagggaaa cttgtgctaa taccgcataa
300cacaacgaag aagcatttcn ttgttgtcaa aggagcaatc cggtgacaga tgggcccgcg
360tccaattagc tagttggtga tgtaacggat caccaaggcg acgattggta gccgaactga
420gaggttgatc ggccacattg ggnctgagac acggcccaga ctcctacggg aggcagcagt
480ggggaatatt gcacaatggg ggaaaccctg atgcagcaac gccgcgtgaa ggatgaaggt
540tttcggattg taaacttctt tagtcaggga cgaagaaaat gacggtacct gaagaataag
600ccacggctaa ctacgtgcca gcagccgcgg taatacgtag gtggcaagcg ttgtccggaa
660ttactgggtg taaagggcgt gtaggcggga atgtaagtca gatgtgaaat cccagggctt
720aaccctggag ctgcatctga aactatgttt cttgagtgcc ggagaggaaa gcggaattcc
780tagtgtagcg gtgaaatgcg tagatattag gaggaacacc agtggcgaag gcggctttct
840ggacggtaac tgacgctgag gcgcgaaagc gtggggagca aacaggatta gataccctgg
900tagtccacgc tgtaaacgat ggatactagg tgtaggaggt atcgacccct tctgtgccgg
960agttaacaca ataagtatcc cacctgggga gtacggccgc aaggttgaaa ctcaaaggaa
1020ttgacggggg cccgcacaag cagtggagta tgtggtttaa ttcgaagcaa cgcgaagaac
1080cttaccaagg cttgacatat agcggaatnc ggcagagatg tcgtagtcct tcgggactgc
1140tatacacagg tggtgcatgg ttgtcgtcag ctcgtgtcgt gagatgttgg gttaagtccc
1200gcaacgagcg caacccctgt tgctagttga taacattaag atgatcactc tagcgagact
1260gccggtgaca aatcggagga aggtggggac gacgtcaaat catcatgccc cttatgtctt
1320gggctacaca cgtactacaa tggctataac agagggaagc taagctgcaa agtggagcaa
1380atccccaaaa atagtcccag ttcagatggt gggctgcaac ccgcccacat gaagtcggaa
1440ttgctagtaa tggtaggtca gtatactgtc gtgaatacgt tcccgggcct tgtacacacc
1500gcccgtcaca ccatgagagt ctgcaacacc cgaagtcgat agtctaaccg caaggaggac
1560gtcgccgaag gtggggccga tgattggtgt gaagtcgtaa caaggtagcc gtatcggaag
1620gtgcggctgg atcacctcct tt
1642
SEQ ID NO: 79; length: 1552; type: DNA; organism: Thermoanaerobacterium saccharolyticum
modified_base (64)..(64): a, c, g or t
79tttgatcctg
gctcaggacg aacgctggcg gcgtgcctaa cacatgcaag tcgagcgatc 60cggnactcaa
ttaagcgctt acagaaaaag agagagaaan tgagtaaacg caaagttgag 120tgccggatag
cggcggacgg gtgagtaacg cgtggacaat ctaccctgta gtttgggata 180acacctcgaa
aggggtgcta ataccggata atgtcaagaa gtggcatcac tttttgaaga 240aaggagaaat
ccgctatagg atgagtccgc gtcccattag ctagttggcg gggtaaaagc 300ccaccaaggc
gacgatgggt agccggcctg agagggtgaa cgnccacact ggaactgaga 360cacggtccag
actcctacgg gaggcagcag tggggaatat tgttcaatgg gggaaaccct 420gacacagcga
cgccgcgtga gcgaagaagg ccttcgggtc gtaaagctca atagtatggg 480aagatagtga
cggtaccata cgaaagcccc ggctaactac gtgccagcag ccgcggtaat 540acgtaggggg
cgagcgttgt ccggaattac tgggcgtaaa gagcacgtag gcggctgtaa 600aagtcagatg
tgaaaaacct gggctcaacc gagggtgtgc atctgaaact aaacagcttg 660agtcaaggag
aggagagcgg aattcctggt gtagcggtga aatgcgtaga gatcaggaag 720aataccagtg
gcgaaagcgg ctctctggac ttgaactgac gctgaggtgc gaaagcgtgg 780ggagcaaaca
ggattagata ccctggtagt ccacgccgta aacgatggat actaggtgtg 840ggtgaagcat
catccgtgcc ggagttaacg caataagtat cccgcctggg gagtacggcc 900gcaaggttga
aactcaaagg aattgacggg ggcccgcaca agcagcggag catgtggttt 960aattcgaagc
aacgcgaaga accttaccag ggcttgacat ccacagaatc aggtagaaat 1020accagagtgc
ctcgaaagag gagctgtgag acaggtggtg catggttgtc gtcagctcgt 1080gtcgtgagat
gttgggttaa gtcccgcaac gagcgcaacc cctgttggta gttaccagcg 1140taaagacggg
gactctaccg agactgccgt ggagaacacg gaggaaggcg gggatgacgt 1200caaatcatca
tgccctttat gccctgggct acacacgtgc tacaatggcc tgaacagagg 1260gcagcgaagg
agcgatccgg agcgaatccc agaaaacagg tcccagttca gattgcaggc 1320tgcaacccgc
ctgcatgaag acggagttgc tagtaatcgc ggatcagcat gccgcggtga 1380atacgttccc
gggccttgta cacaccgccc gtcacaccac gagagtttac aacacccgaa 1440gtcagtgacc
taaccgcaag ggaggagctg ccgaaggtgg ggtaaatgat tggggtgaag 1500tcgtaacaag
gtagccgtat cggaaggtgc ggctggatca cctcctttcc ct
1552
SEQ ID NO: 80; length: 1519; type: DNA; organism: Clostridium stercorarium
80tttgatcctg gctcaggacg aacgctggcg
gcgtgcctaa cacatgcaag tcgaacggga 60tccgtgttac ggaggtcttt ggaccgaagt
ggcatggtga gagtggcgga cgggcgagta 120acgcgtgagc aacctgccct atgctggggg
ataacaccgg gaaaccggtg ctaataccgc 180ataagaccac agtgacgcat gtacagtggt
aaagctgagg cggcatagga tgggctcgcg 240gtccattagc tagttggtag ggtaacggcc
taccaaggcg acgatcggta gccggactga 300gaggttggcc ggccgcattg ggactgagac
acggcccaga ctcctacggg aggcagcagt 360ggggaatatt gcgcaatggg ggaaaccctg
acgcagcgac gccgcgtgga ggaagaaggc 420ctttgggttg taaactcctt tgatcgggga
cgaagatgac ggtacccgaa gaacaagcca 480cggctaacta cgtgccagca gccgcggtaa
tacgtaggtg gcgagcgttg tccggaatta 540ctgggtgtaa agggcgtgta ggcggggtgc
caagtcaggt gtgaaatacc ggggcttaac 600ctcgggggtg catctgaaac tggtgctctt
gagtgccgga gaggaaagcg gaattcccag 660tgtagcggtg aaatgcgtag atattgggag
gaacaccagt ggcgaaggcg gctttctgga 720cggtaactga cgctgaggcg cgaaagcgtg
gggagcaaac aggattagat accctggtag 780tccacgctgt aaacgatgga tactaggtgt
aggaggtatc gaccccttct gtgccgtagt 840taacacaata agtatcccac ctggggagta
cggccgcaag gctgaaactc aaaggaattg 900acgggggccc gcacaagcag tggagcatgt
ggtttaattc gaagcaacgc gaagaacctt 960accagggctt gacatccccc tgacggatgt
agagatacat cttctccgca aggagcaggg 1020gagacaggtg gtgcatggtg cagctcagct
cgtgtcgtga gatgttgggt taagtcccgc 1080aacgagcgca acccttgtcg ttagttgcca
gcagtaagat gggcactcta acgagactgc 1140cggcgagaag tcggaggaag gtggggatga
cgtcaaatca tcatgcccct tatgtcctgg 1200gctacacacg tgctacaatg gcgactacag
agggaagcaa atccggcagg aggagcaaat 1260cccgaaaggt cgtcccagtt cggattgcag
gctcgaactc gcctgcatga agccggaatt 1320gctagtaatg gcaggtcagc atactgccgt
gaatacgttc ccgggccttg tacacaccgc 1380ccgtcacacc atgagagctg gcaacacccg
aagccgtagc ctaaccgaga ggggggcgcc 1440gtcgaaggtg gggcaggtga ttggggtgaa
gtcgtaacaa ggtagccgta tcggaaggtg 1500cggctggatc acctccttt
1519
SEQ ID NO: 81; length: 1500; type: DNA; organism: Clostridium stercorarium II
81cctggctcag gacgaacgct ggcggcgtgc ctaacacatg caagtcgaac gggatccgtg
60ttacggaggt cttcggaccg aagtggcatg gtgagagtgg cggacgggcg agtaacgcgt
120gagcaacctg ccctatgctg ggggataaca ccgggaaacc ggtgctaata ccgcataaga
180ccacagtgac gcatgtcaca gtggtaaaag ctgaggcggc ataggatggg ctcgcgtccg
240attagctagt tggtagggta acggcctacc aaggcgacga tcggtagccg gactgagagg
300ttggccggcc gcattgggac tgagacacgg cccagactcc tacgggaggc agcagtgggg
360aatattgcgc aatgggggaa accctgacgc agcgacgccg cgtggaggaa gaaggccttt
420gggttgtaaa ctcctttgat cggggacgaa gatgacggta cccgaagaac aagccacggc
480taactacgtg ccagcagccg cggtaatacg taggtggcga gcgttgtccg gaattactgg
540gtgtaaaggg cgtgtaggcg gggtgccaag tcaggtgtga aataccgggg cttaacctcg
600ggggtgcatc tgaaactggt gctcttgagt gccggagagg aaagcggaat tcccagtgta
660gcggtgaaat gcgtagatat tgggaggaac accagtggcg aaggcggctt tctggacggt
720aactgacgct gaggcgcgaa agcgtgggga gcaaacagga ttagataccc tggtagtcca
780cgctgtaaac gatggatact aggtgtagac cccttctgtg ccgtagttaa cacaataagt
840atcccacctg gggagtacga ggtatcgggc cgcaaggctg aaactcaaag gaattgacgg
900gggcccgcac aagcagtgga gcatgtggtt taattcgaag caacgcgaag aaccttacca
960gggcttgaca tccccctgac ggatgtagag atacatcttc tccgcaagga gcaggggaga
1020caggtggtgc atggttgtcg tcagctcgtg tcgtgagatg ttgggttaag tcccgcaacg
1080agcgcaaccc ttgtcgttag ttgccagcag taagatgggc actctaacga gactgccggc
1140gagaagtcgg aggaaggtgg ggatgacgtc aaatcatcat gccccttatg tcctgggcta
1200cacacgtgct acaatggcga ctacagaggg aagcaaatcc gcgaggagga gcaaatcccg
1260aaaggtcgtc ccagttcgga ttgcaggctg caactcgcct gcatgaagcc ggaattgcta
1320gtaatggcag gtcagcatac tgccgtgaat acgttcccgg gccttgtaca caccgcccgt
1380cacaccatga gagctggcaa cacccgaagc cggtagccta accgagaggg gggcgccgtc
1440gaaggtgggg cacccgaagc cggtagccta accgagaggg gggcgccgtc gaaggtgggg
1500
SEQ ID NO: 82; length: 1508; type: DNA; organism: Caldicellulosiruptor kristjanssonii
modified_base (213)..(213): a, c, g or t
82ggctcaggac
gaacgctggc ggcgtgccta acgcatgcaa gtcgagcgga gatggtagct 60gaaggtgatg
agctggaagc tatcatctta gcggcggacg ggtgagtaac acgtgagcaa 120cctaccctca
gcacggggat aacagctcga aagggctgct aatacccgat gggaccacgg 180catcgcatgg
tgctgtggtg aaagggtagc cgnagaggct atnccggctg gggatgggct 240cgcggcccat
cagctagttg gtggggtaac ggcctaccaa ggcgacgacg ggtagccggc 300ctgagagggt
gtacggccac agtgggactg agacacggcc cacactccta cgggaggcag 360cagcggggaa
tcttgcgcaa tgggcgaaag cctgacgcag cgacnccgcg tgagggaaga 420agcccttcgg
ggtgtaaacc tctttggacg gggagaagtg gaagatagta cccgtttaaa 480aagccacggc
taactacgtg ccagcagccg cggtaatacg taggtggcga gcgttgtccg 540gaattactgg
gcgtaaaggg tgcgtaggcg gcctggtaag ttgagcgtga aatttttggg 600ctcaacccaa
aaggagcgct caagactgcc gggcttgagt gcgggagagg acggcggaat 660tcccggtgta
gcggtgaaat gcgtagatat cgggaggaac accagtggcg aaggcggccg 720tctggaccgt
aactgacgct gaggcacgaa agcgtgggga gcaaacagga ttagataccc 780tggtagtcca
cgctgtaaac gatggatgct aggtgtgggg gagaagaact cttccgtgcc 840gtagttaaca
caataagcat cccgcctggg gagtacggtc gcaaggttga aactcaaagg 900aattgacggg
ggcccgcaca agcggtggag catgtggttt aattcgaagc aacgcgaaga 960accttaccag
ggcttgacat gccgggaacc ctgccgaaag gcgggggtgc ctgcttgtta 1020agagcaggag
cccggacaca ggtggtgcat ggttgtcgtc agctcgtgtc gtgagatgtt 1080gggttaagtc
ccgcaacgag cgcaacccct gcccttagtt gccagcggtt ttagccgggc 1140actctaaggg
gactgccgcc gatgaggcgg aggaaggtgg ggatgacgtc aaatcatcat 1200gccccttatg
ccctgggcta cacacgtgct acaatgggtg ctacagaggg cggcgaaggc 1260gcgagccgga
gcgaatccca aaaaagcacc cccagttcgg attgcaggct gcaactcgcc 1320tgcatgaagt
cggaatcgct agtaatcgcg gatcagcatg ccgcggtgaa tacgttcccg 1380ggccttgtac
acaccgcccg tcacaccatg agagtcagca acacctgaag acacaggata 1440tctgtgttga
aggtggggct gatgattggg gtgaagtcgt aacaaggtag ccgtacggga 1500acgtgcgg
1508
SEQ ID NO: 83; length: 1370; type: DNA; organism: Clostridium phytofermentans
modified_base (240)..(243): a, c, g or t
83cttagtggcg gacgggtgag taacgcgtgg gtaacctgcc tcatacaggg ggataacagt
60cggaaacgat tgctaaaacc gcataatata gcgaaaccgc atgattttgc tatcaaatat
120ttataggtat gagatgggcc cgcgtctgat tagctagttg gtggggtaat ggcctaccaa
180ggcgacgatc agtagccggc ttgagagagt gaccggccac attgggactg agacacggcn
240nnnactnctn cgggaggcag cagtggggaa tattggacaa tgggggaaac ccngatccag
300cgacgccgcg tgagtgaaga agtatttcgg tatgtaaagc tctatcagca gggaagataa
360tgacagtacc tgactaagaa gccccggcta actacgtgcc agcagccgcg gtnatacgta
420nnnnnnnagc gttatccgga tttactgggt gtaaagggag cgtaggtggt aggtcaagtc
480agatgtgaaa gnccagggct caaccctggn nctgcatttg aaactggctn actgagtgca
540ggagaggtaa gtggaattcc tagtgtagcg gtgaaatgcg tagatantag gaggaacacc
600agtggcgaag gcggcnnact ggactgtaac tgacactgag gctcgnnngc gtggggagca
660aacaggatta gatnccctgg tagtccncgc cgtaaacgat gaatactagc tgttcggggt
720cnnacagggc ttcggtggcg cacgtaacgc aataagtatt ccacctgggg ngtacgttcg
780caagaatgaa actcaaagga attgacgggg anncgcacaa gcggtggagc atgtggttta
840attcgaanna acgcgaagaa ccttaccaag tcttgacatc cctctgacaa ccgagtaacg
900tcggnnttct tcgggncaga ggngacaggt ggtgcatggt tgtcgtcagc tcgtgtcgtg
960agatgttggg ttaagtcccg caacgagcgc aacccctatc tttagtagcc agcagttcgg
1020ctgnncactc tagagagact gccagggata acctggagga aggcggggat gacgtnnaat
1080catcatgccc cnnatgattt gggctacaca cgtgctacaa tggtgactac aaagagaagc
1140aagcctgcnn gggggagcaa atctcaaaaa ggtcatccca gttcggattg tactctgcaa
1200ctcgagtaca tgaagctgga atcgctagta atcgcgaatc agaatgtcgc ggtgaatacg
1260ttcccgggtc ttgtacacac cgyycgtcac tccatgggag taggtaacgc ccgaagtcag
1320tgaccyaacc gtaaggaggg agctgccgaa ggcgggatct ataactgggg
1370
SEQ ID NO:84 (length: 57; type: DNA; organism: Artificial Sequence; description of artificial sequence: Synthetic oligonucleotide)
gaattcgagc tcggtacccg gggatcctct agagtcgacc tgcaggcatg caagctt 57
SEQ ID NO:85 (length: 1515; type: DNA; organism: Thermoanaerobacter pseudoethanolicus; feature: modified_base (69)..(69), a, c, g or t)
cctggctcag
gacgaacgct ggcggcgtgc ctaacacatg caagtcgagc ggtccggcag 60ccaacttang
ncgggagccg gatagcggcg gacgggtgag taacgcgtgg gcaacctacc 120cttaagaccg
ggataacacc tcgaaagggg tgctaatact ggataagctc cttgtagggc 180atggtatgag
gaggaaggta gcgggactac cgcttaagga tgggcccgcg tcccatcagc 240tagttggtag
ggtaacggcc taccaaggcg acgacgggta gccggcctga gagggtggtc 300ggccacactg
ggactgagac acggcccaga ctcctacggg aggcagcagt ggggaatctt 360gcgcaatggg
cgaaagcctg acgcagcgac gccgcgtgag cgaggaaggc cttcgggtcg 420taaagctcga
tagtgtggga agaagggatg acggtaccac acgaaagccc cggctaacta 480cgtgccagca
gcctcggtaa gacgtagggg gcgagcgttg tccggaatta ctgggcgtaa 540agggcgcgta
ggcggccgtt caagtcaggt gtaaaatacc cgggctcaac ccggggatag 600cacttgaaac
tgggcggcta gagggcagga gaggggagtg gaattcccgg tgtagcggtg 660aaatgcgtag
atatcgggag gaataccagt ggcgaaggcg actctctgga ctgaccctga 720cgctgaggcg
cgaaagcgtg gggagcaaac aggattagat accctggtag tccacgccgt 780aaacgatggg
tactaggtgt gggatgcgga agcattccgt gccgtagtta acgcaataag 840taccccgcct
ggggagtacg gccgcaaggt tgaaactcaa aggaattgac gggggcccgc 900acaagcggtg
gagcatgtgg tttaattcga agcaacgcga agaaccttac cagggcttga 960catgcaggta
gtagcgagcc gaaaggtgag cgaccttacc ttaaaggtga ggagcctgca 1020caggtggtgc
atggttgtcg tcagctcgtg tcgtgagatg ttgggttaag tcccgcaacg 1080agcgcaaccc
ctgcctctag ttgccagcgg gtgaagccgg gcacgctaga gggactgccg 1140tggacaacac
ggaggaaggt ggggatgacg tcaaatcatc atgccctata tgccctgggc 1200cacacacgtg
ctacaatggc cggtacagag ggaagcgaag ccgcgaggtg gagcgaaacc 1260caaaaagccg
gtccaagttc ggattgcagg ctgcaactcg cctgcatgaa gtcggaatcg 1320ctagtaatcg
cggatcagca tgccgcggtg aatacgttcc cgggccttgt acacaccgcc 1380cgtcacacca
cgagagtctg caacacccga agccgtgacc caaccgnaag gagggagccg 1440tcgaaggtgg
ggcagatgat tggggtgaag tcgtaacaag gtagccgtat cggaaggtgc 1500ggctggatca
cctcc
1515
SEQ ID NO:86 (length: 1395; type: DNA; organism: Thermoanaerobacter sp.)
ctacacatgc agtcgagcga agggagtact
acggtacgaa cttagcggcg gacgggtgag 60taacgcgtgg acaatctacc ctgtagaccg
ggataacacc tcgaaagggg tgctaatacc 120ggataatgtc gagaagcggc atcgcttttt
gaagaaagga gagaatccgc tataggagga 180gtccgcgtcc cattagctag ttggcgaggg
taaaagccca ccaaggcgac gatgggtagc 240cggcctgaga gggtgaacgg ccacactgga
actgagacac ggtccagact cctacgggag 300gcagcagtgg ggaatattgt gcaatggggg
aaaccctgac acagcgacgc cgcgtgagtg 360aagaaggcct tcgggtcgta aagctcaata
gtatgggaag aaagaaatga cggtaccata 420cgaaagcccc ggctaactac gtgccagcag
ccgcggtaat acgtaggggg cgagcgttgt 480ccggaattac tgggcgtaaa gagcacgtag
gcggctataa aagtcagatg tgaaaaacct 540gggctcaacc gagggtatgc atctgaaact
aaatagcttg agtcaaggag aggagagcgg 600aattcctggt gtagcggtga aatgcgtaga
gatcaggaag aataccagtg gcgaaagcgg 660ctctctggac ttgaactgac gctgaggtgc
gaaagcgtgg ggagcaaaca ggattagata 720ccctggtagt ccacgccgta aacgatggat
actaggtgtg ggttagatat aatccgtgcc 780ggagttaacg caataagtat cccgcctggg
gagtacggcc gcaaggttga aactcaaagg 840aattgacggg ggcccgcaca agcagcggag
catgtggttt aattcgaagc aacgcgaaga 900accttaccag ggcttgacat ccacagaatc
gagtagaaat acttgagtgc ctcgtaagag 960gagctgtgag acaggtggtg catggttgtc
gtcagctcgt gtcgtgagat gttgggttaa 1020gtcccgcaac gagcgcaacc cctgttggta
gttaccagcg taaagacggg gactctaccg 1080agactgccgt ggataacacg gaggaaggcg
gggatgacgt caaatcatca tgccctttat 1140gccctgggct acacacgtgc tacaatggcc
tgaacagagg gcagcgaagg agcgatccgg 1200agcgaatccc agaaaacagg tcccagttca
gattgcaggc tgcaacccgc ctgcatgaag 1260acggagttgc tagtaatcgc ggatcagcat
gccgcggtga atacgttccc gggccttgta 1320cacaccgccc gtcacaccac gagagtttac
aacacccgaa gtcagtgacc taaccgcaag 1380ggaggagctg ccgaa
1395
SEQ ID NO:87 (length: 1552; type: DNA; organism: Thermoanaerobacterium saccharolyticum; feature: modified_base (64)..(64), a, c, g or t)
tttgatcctg
gctcaggacg aacgctggcg gcgtgcctaa cacatgcaag tcgagcgatc 60cggnactcaa
ttaagcgctt acagaaaaag agagagaaan tgagtaaacg caaagttgag 120tgccggatag
cggcggacgg gtgagtaacg cgtggacaat ctaccctgta gtttgggata 180acacctcgaa
aggggtgcta ataccggata atgtcaagaa gtggcatcac tttttgaaga 240aaggagaaat
ccgctatagg atgagtccgc gtcccattag ctagttggcg gggtaaaagc 300ccaccaaggc
gacgatgggt agccggcctg agagggtgaa cgnccacact ggaactgaga 360cacggtccag
actcctacgg gaggcagcag tggggaatat tgttcaatgg gggaaaccct 420gacacagcga
cgccgcgtga gcgaagaagg ccttcgggtc gtaaagctca atagtatggg 480aagatagtga
cggtaccata cgaaagcccc ggctaactac gtgccagcag ccgcggtaat 540acgtaggggg
cgagcgttgt ccggaattac tgggcgtaaa gagcacgtag gcggctgtaa 600aagtcagatg
tgaaaaacct gggctcaacc gagggtgtgc atctgaaact aaacagcttg 660agtcaaggag
aggagagcgg aattcctggt gtagcggtga aatgcgtaga gatcaggaag 720aataccagtg
gcgaaagcgg ctctctggac ttgaactgac gctgaggtgc gaaagcgtgg 780ggagcaaaca
ggattagata ccctggtagt ccacgccgta aacgatggat actaggtgtg 840ggtgaagcat
catccgtgcc ggagttaacg caataagtat cccgcctggg gagtacggcc 900gcaaggttga
aactcaaagg aattgacggg ggcccgcaca agcagcggag catgtggttt 960aattcgaagc
aacgcgaaga accttaccag ggcttgacat ccacagaatc aggtagaaat 1020accagagtgc
ctcgaaagag gagctgtgag acaggtggtg catggttgtc gtcagctcgt 1080gtcgtgagat
gttgggttaa gtcccgcaac gagcgcaacc cctgttggta gttaccagcg 1140taaagacggg
gactctaccg agactgccgt ggagaacacg gaggaaggcg gggatgacgt 1200caaatcatca
tgccctttat gccctgggct acacacgtgc tacaatggcc tgaacagagg 1260gcagcgaagg
agcgatccgg agcgaatccc agaaaacagg tcccagttca gattgcaggc 1320tgcaacccgc
ctgcatgaag acggagttgc tagtaatcgc ggatcagcat gccgcggtga 1380atacgttccc
gggccttgta cacaccgccc gtcacaccac gagagtttac aacacccgaa 1440gtcagtgacc
taaccgcaag ggaggagctg ccgaaggtgg ggtaaatgat tggggtgaag 1500tcgtaacaag
gtagccgtat cggaaggtgc ggctggatca cctcctttcc ct
1552
SEQ ID NO:88 (length: 1553; type: DNA; organism: Thermoanaerobacterium saccharolyticum)
tttgatcctg
gctcaggacg aacgctggcg gcgtgcctaa cacatgcaag tcgagcgatc 60cggcactcaa
ctaagcgctt acagaaaaag agagagaaaa tgagtaaacg caaagttgag 120tgccggatag
cggcggacgg gtgagtaacg cgtggacaat ctaccctgta gtttgggata 180acacctcgaa
aggggtgcta ataccggata atgtcaagaa gtggcatcac tttttgaaga 240aaggagaaat
ccgctatagg atgagtccgc gtcccattag ctagttggcg gggtaaaagc 300ccaccaaggc
gacgatgggt agccggcctg agagggtgaa cggccacact ggaactgaga 360cacggtccag
actcctacgg gaggcagcag tggggaatat tgtgcaatgg gggaaaccct 420gacacagcga
cgccgcgtga gcgaagaagg ccttcgggtc gtaaagctca atagtatggg 480aagatagtga
cggtaccata cgaaagcccc gggctactac gtgccagcag ccgcggtaat 540acgtaggggg
cgagcgttgt ccggaattac tgggcgtaaa gagcacgtag gcggctgtaa 600aagtcagatg
tgaaaaacct gggctcaacc gagggtgtgc atctgaaact aaacagcttg 660agtcaaggag
aggagagcgg aattcctggt gtagcggtga aatgcgtaga gatcaggaag 720aataccagtg
gcgaaagcgg ctctctggac ttgaactgac gctgaggtgc gaaagcgtgg 780ggagcaaaca
ggattagata ccctggtagt ccacgccgta aacgatggat actaggtgtg 840ggtgaagcat
catccgtgcc ggagttaacg caataagtat cccgcctggg gagtacggcc 900gcaaggttga
aactcaaagg aattgacggg ggcccgcaca agcagcggag catgtggttt 960aattcgaagc
aacgcgaaga accttaccag ggcttgacat ccacagaatc tggtagaaat 1020accggagtgc
ctcgaaagag gagctgtgag acaggtggtg catggttgtc gtcagctcgt 1080gtcgtgagat
gttgggttaa gtcccgcaac gagcgcaacc cctgttggta gttaccagcg 1140taaagacggg
gactctaccg agactgccgt ggagaacacg gaggaaggcg gggatgacgt 1200caaatcatca
tgccctttat gccctgggct acacacgtgc tacaatggcc tgaacagagg 1260gcagcgaagg
agcgatccgg agcgaatccc agaaaacagg tcccagttca gattgcaggc 1320tgcaacccgc
ctgcatgaag acggagttgc tagtaatcgc ggatcagcat gccgcggtga 1380atacgtttcc
cgggccttgt acacaccgcc cgtcacacca cgagagttta caacacccga 1440agtcagtgac
ctaaccgaaa ggaaggagct gccgaaggtg gggtaaatga ttggggtgaa 1500gtcgtaacaa
ggtagccgta tcggaaggtg cggctggatc acctcctttc taa
1553
SEQ ID NO:89 (length: 1569; type: DNA; organism: Unknown; description of unknown organism: Consensus Sequence)
tttgatcctg gctcaggacg aacgctggcg gcgtgcctaa cacatgcaag
tcgagcgatc 60cggcactcaa ntaagcgctt acagaaaaag angagcgaaa ntgagtaaac
gctaagttga 120gtgccggata gcggcggacg ggtgagtaac gcgtggacaa tctaccctgt
agtttgggat 180aacacctcga aaggggtgct aataccggat aatgtcaaga agtggcatcg
ctttttgaag 240aaaggagagn naatnccgct ataggatgag tccgcgtccc attagctagt
tggcgngggt 300aaaagcccac caaggcgacg atgggtagcc ggcctgagag ggtgaacggc
cacactggaa 360ctgagacacg gtccagactc ctacgggagg cagcagtggg gaatattgtg
caatggggga 420aaccctgaca cagcgacgcc gcgtgagcga agaaggcctt cgggtcgtaa
agctcaatag 480tatgggaaga tagnantgac ggtaccatac gaaagccccg gctaactacg
tgccagcagc 540cgcggtaata cgtagggggc gagcgttgtc cggaattact gggcgtaaag
agcacgtagg 600cggctgtaaa agtcagatgt gaaaaacctg ggctcaaccg agggtgtgca
tctgaaacta 660aacagcttga gtcaaggaga ggagagcgga attcctggtg tagcggtgaa
atgcgtagag 720atcaggaaga ataccagtgg cgaaagcggc tctctggact tgaactgacg
ctgaggtgcg 780aaagcgtggg gagcaaacag gattagatac cctggtagtc cacgccgtaa
acgatggata 840ctaggtgtgg gntgaggcat catnccgtgc cggagttaac gcaataagta
tcccgcctgg 900ggagtacggc cgcaaggttg aaactcaaag gaattgacgg gggcccgcac
aagcagcgga 960gcatgtggtt taattcgaag caacgcgaag aaccttacca gggcttgaca
tccacnnaga 1020atcgggtaga aataccagag tgcctcgnnn aaagaggagc tgtgagnaca
ggtggtgcat 1080ggttgtcgtc agctcgtgtc gtgagatgtt gggttaagtc ccgcaacgag
cgcaacccct 1140gttggtagtt accagcgnnt aaagacgggg actctaccga gactgccgtg
gagaacacgg 1200aggaaggcgg ggatgacgtc aaatcatcat gccctttatg ccctgggcta
cacacgtgct 1260acaatggcct gaacagaggg cagcgaagga gcgatccgga gcgaatccca
gaaaacaggt 1320cccagttcag attgcaggct gcaacccgcc tgcatgaaga cggagttgct
agtaatcgcg 1380gatcagcatg ccgcggtgaa tacgttnccc gggccttgta cacaccgccc
gtcacaccac 1440gagagtttac aacacccgaa gtcagtgacc taaccgcaag ggaggagctg
ccgaaggtgg 1500ggtaaatgat tggggtgaag tcgtaacaag gtagccgtat cggaaggtgc
ggctggatca 1560cctcctttc
1569
SEQ ID NO:90 (length: 1061; type: DNA; organism: Unknown; description of unknown organism: Consensus Sequence)
gtgtatacaa tatatttctt ctttttagta agaggaatgt ataaaaataa
atattttaaa 60ggaagggacg atcttatgag cattattcaa aacatcattg aaaaagctaa
aagtgataaa 120aagaaaattg ttctgccaga aggtgcagaa cccaggacat taaaagctgc
tgaaatagtt 180ttaaaagaag gaattgcaga tttggtgctt cttggaaatg aagatgagat
aagaaatgct 240gcaaaagact tggacatatc caaagctgaa atcattgacc ctgtaaagtc
tgaaatgttt 300gataggtatg ctaatgattt ttatgagtta aggaagagca aaggaatcac
gttggaaaaa 360gccagagaaa caatcaagga taatatctat tttggatgta tgatggttaa
agaaggttat 420gctgatggat tggtatctgg cgctattcat gctactgcag atttattaag
acctgcattt 480cagataatta aaacggctcc aggagcaaag atagtatcaa gcttttttat
aatggaagtg 540cctaattgtg aatatggtga aaatggtgta ttcttgtttg ctgattgtgc
ggtcaatcca 600tcgcctaatg cagaagaact tgcttctatt gctgtacaat ctgctaatac
tgcaaangaa 660tttgttgggc tttgaaccaa aagttgctat gctatcattt tctacaaaag
gtagtgcatc 720acatgaatta gtagataagg taagaaaagc gacagagata gcaaaagaat
tgatgccaga 780tgttgctant cgatggtgaa ttgcaattgg atgctgctct tgttaaagaa
gttgcagagc 840taaaagcgcc gggaagcaaa gttgcgggat gtgcaaatgt gcttatattc
cctgatttac 900aagctggtaa tataggatat aagcttgtac agagattagc taaggcaaat
gcaattggac 960ctataacaca aggaatgggt gcaccggtta atgatttatc aagaggatgc
agctatagag 1020atattgttga cgtaatagca acaacagctg tgcaggctca a
1061
SEQ ID NO:91 (length: 1213; type: DNA; organism: Unknown; description of unknown organism: Consensus Sequence)
atgaaaatta tgaaaattct ggttattaat tgtggaagtt cttcactaaa
antatcaatt 60gattgaatca antgatggaa atgtgctggc aaaaggcctt gctgaaagaa
tcggcataaa 120tgattccctg ttgacncata atgctaacgg nnnagaaaaa atcaagataa
aaaaagacat 180gaaagatcac aaagacgcaa taaaattggt tttagatgct ttggtaagta
gtgactacgg 240cgttataaag gatatgtctg agatagatgc tgtaggacat agagttgttc
acggaggaga 300gtcttttaca tcatcagttc tcataaatga tgaagtgtta aaagcgataa
cagattgtat 360agaattagct ccactgcaca atcctgctaa tatagaagga attaaagctt
gccagcaaat 420catgccaaac gttccaatgg tggcggtatt tgatacagcc tttcatcaga
caatgcctga 480ttatgcatat ctttatccaa taccttatga atactacaca aagtacagga
tcagaagata 540tggatttcat ggcacatcgc ataaatatgt ttcaaatagg gctgcagaga
ttttgaataa 600acctattgaa gatttgaaaa tcataacttg tcatcttgga aatggctcca
gtattgctgc 660tgtcaaatat ggtaaatcaa ttgacacaag catgggattt acaccattag
aaggtttggc 720tatgggtaca cgatctggaa gtatagaccc atccattatt tcttatctta
tggaaaaaga 780aaatataagt gctgaagagg tagtaaatat attaaataaa aaatctggtg
tttacggtat 840ttcaggaata agcagcgatt ttagagattt agaagatgcc gcctttaaaa
atggagatga 900aagagctcag ttggctttaa atgtgtttgc atatcgagta aagaagacga
ttggcgctta 960tgcagcagct atgggaggcg ttgatgtcat tgtatttaca gcaggtgttg
gtgaaaatgg 1020tcctgagata cgagaattta tacttgatgg attagagttt ttagggttca
gcttggataa 1080agaaaaaaat aaagtcagag gaaaagaaac tattatatct acgccgaatt
caaaagttag 1140cgtgatggtt gtgcctacta atgaagaata tatgattgct aaagatactg
aaaagattgt 1200aaagagtata aaa
1213
SEQ ID NO:92 (length: 935; type: DNA; organism: Unknown; description of unknown organism: Consensus Sequence)
atgagcaaag tagcnataat aggttctgga tttgtaggtg ctacatctgc
atttacactg 60gctttaagtg ggactgtgac agatattgtn ttagtagatt taaacaagga
caaggcnata 120ggcgatgcac tggatataag ccatggcata ccgtttatac agcctgtaaa
tgtgtatgca 180ggtgactaca aagatgttga aggcgcagat gtaatagttg tgacagcagg
tgctgctcaa 240aagccgggag agacnaggct tgaccttgtg aagaaaaata cagctatatt
taagtccatg 300atacctgagc ttnttaaagt acaatgacaa ggctatatat ttgattgtna
caaatcctgt 360agatatactg acgtacgtta catacaagat atctggactt ccatggggca
gagttttcgg 420ttctggcact gttcttgaca gttcaaggtt taggtatctt ttaagcaagc
attgcaatat 480agatnccgag aaatgtccac ggaaggataa ttggcgagca tggtgataca
gagtttgcag 540catggagcat aacaaacata tcaggaatat catttaatga gtactgcagt
ttatgcggac 600gcgtctgtaa cacaaatttc agaaaggaag tagaagatga agttgtaaat
gctgcttata 660agataataga caaaaagggt gctacatatt atgctgtggc tgttgcagta
agaaggattg 720tggagtgtat cttaagagat gaaaattcca ttctnacagt ntcatctcca
ttaaatggnc 780aatacggtgt nanagatgtn tctttaagct tgccatcnat tgtnggcaga
aatggngttg 840caaggattct gganttgcct ttntctgang aagaagttga gaagtttaga
cattcagcaa 900gngttatggc agatgtnata aaacagttng atata
935