The invention relates to polypeptides having epoxide hydrolase activity, polynucleotides encoding those polypeptides, antibodies that bind to those polypeptides, and methods of making and using those polynucleotides and polypeptides. Epoxide hydrolases catalyze the hydrolysis of epoxides and arene oxides to their corresponding diols.
REFERENCES TO RELATED APPLICATIONS
This application claims the benefit of priority under 35 U.S.C. § 119(e) of U.S. Provisional Application Ser. No. 60/309,478, filed Aug. 3, 2001, and U.S. Provisional Application Ser. No. 60/393,978, filed Jul. 3, 2002. Each of the prior applications is expressly incorporated herein by reference in its entirety and for all purposes.
TECHNICAL FIELD
The subject invention relates to molecular and cellular biology and biochemistry. In particular, the invention relates to polypeptides having epoxide hydrolase activity, polynucleotides encoding the polypeptides, and methods of making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used as epoxide hydrolases to catalyze the hydrolysis of epoxides and arene oxides to their respective diols.
BACKGROUND
Epoxide hydrolases (EH) catalyze the hydrolysis of epoxides and arene oxides to their corresponding diols. Epoxide hydrolases from microbial sources are highly versatile biocatalysts for the asymmetric hydrolysis of epoxides on a preparative scale. Besides kinetic resolution, which furnishes the corresponding vicinal diol and the remaining unhydrolyzed epoxide in non-racemic form, enantioconvergent processes are possible. These are very attractive because they lead to the formation of a single enantiomeric diol from a racemic oxirane; see, e.g., Steinreiber (2001) Curr. Opin. Biotechnol. 12:552-558.
Microsomal epoxide hydrolases are biotransformation enzymes that catalyze the conversion of a wide range of xenobiotic epoxide substrates into more polar diol metabolites; see, e.g., Omiecinski (2000) Toxicol. Lett. 112-113:365-370. Microsomal epoxide hydrolases catalyze the addition of water to epoxides in a two-step reaction: the active-site carboxylate first attacks the oxirane to form an ester intermediate, which is then hydrolyzed. Soluble epoxide hydrolase plays a role in the biosynthesis of inflammatory mediators; see, e.g., Morisseau (1999) Proc. Natl. Acad. Sci. USA 96:8849-8854.
Chiral molecules including alcohols, α-hydroxy acids, and epoxides are important in the synthesis of pharmaceuticals, agrochemicals, and many fine chemicals. The main challenge of modern organic chemistry is the creation of such compounds in high yield, with high stereo- and regioselectivity. Enantiomeric epoxides are versatile synthons for the synthesis of many pharmaceuticals, agrochemicals and other high value compounds.
Currently available methods have drawbacks that limit their use in industrial applications. Recent studies have shown epoxide hydrolases (hereafter "EHs") to be promising biocatalysts for the production of chiral epoxides and vicinal diols. They show high enantioselectivity towards their substrates and can be used effectively for the resolution of chemically prepared racemic epoxides. As shown in Fig. 1, selective hydrolysis of a racemic epoxide can generate both the corresponding diol and the unreacted epoxide with high enantiomeric excess (ee) values. However, to fully exploit the potential of EHs in industrial applications, the following significant limitations must be overcome: (1) the number of available enzymes is small; and (2) the substrate range is limited.
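As an illustration of how enantiomeric excess and enantioselectivity are commonly quantified in a kinetic resolution, the following is a minimal sketch. It is not taken from the specification: the function names and the numerical inputs are hypothetical, and the selectivity factor uses the standard textbook relation E = ln[(1 - c)(1 - ee_s)] / ln[(1 - c)(1 + ee_s)], where c is the conversion and ee_s is the ee of the remaining substrate, which assumes an irreversible reaction without product inhibition.

```python
import math


def enantiomeric_excess(major: float, minor: float) -> float:
    """Return ee as a fraction: |major - minor| / (major + minor)."""
    total = major + minor
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    return abs(major - minor) / total


def selectivity_factor(conversion: float, ee_substrate: float) -> float:
    """Estimate the enantiomeric ratio E of a kinetic resolution.

    conversion   -- fraction of racemic epoxide hydrolyzed (0 < c < 1)
    ee_substrate -- ee of the remaining epoxide as a fraction (0 < ee < 1)
    """
    if not 0 < conversion < 1:
        raise ValueError("conversion must lie strictly between 0 and 1")
    if not 0 < ee_substrate < 1:
        raise ValueError("ee_substrate must lie strictly between 0 and 1")
    numerator = math.log((1 - conversion) * (1 - ee_substrate))
    denominator = math.log((1 - conversion) * (1 + ee_substrate))
    if denominator >= 0:
        # ee_s cannot exceed c / (1 - c); at that limit E is unbounded.
        raise ValueError("ee_substrate is not attainable at this conversion")
    return numerator / denominator


# Hypothetical measurements, for illustration only.
ee = enantiomeric_excess(major=99.0, minor=1.0)
e_value = selectivity_factor(conversion=0.5, ee_substrate=0.9)
print(f"ee = {ee:.2%}, E = {e_value:.1f}")
```

Under these assumptions, a larger E means that a high ee of the remaining epoxide is reached at a conversion closer to 50%, which is why a highly enantioselective EH is needed for an efficient resolution.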
Of the available enzymes, many are selective for only one enantiomer, limiting access to both enantiomers of a given target. In current synthetic applications, high enzyme concentrations and low substrate concentrations are required due to low catalytic efficiency, especially at high substrate/product concentrations.
As noted above, biotechnology and the chemical industry currently need molecules (e.g., enzymes) that can optimally carry out biological or chemical processes. For example, molecules and compounds used in established and emerging chemical, pharmaceutical, textile, food and feed, and detergent markets must meet strict economic and environmental standards. The synthesis of polymers, drugs, natural products, and agrochemicals is often hindered by expensive processes that produce harmful byproducts and suffer from poor or inefficient catalysis. Enzymes have several significant advantages that can overcome these problems in catalysis: they act on individual functional groups, distinguish between similar functional groups on a single molecule, and distinguish between enantiomers. In addition, they are biodegradable and act at very low mole fractions in reaction mixtures. Because of their chemo-, regio- and stereospecificity, enzymes offer a unique opportunity to achieve desired selective transformations optimally. Such transformations are often extremely difficult to replicate chemically, especially in single-step reactions. The elimination of protecting groups, high selectivity, the ability to carry out multi-step transformations in a single reaction vessel, and the accompanying reduction in environmental burden have led to increased demand for enzymes in the chemical and pharmaceutical industries.
Enzyme-based processes are gradually replacing many conventional chemical methods. Current limitations to wider industrial use are primarily due to the relatively small number of enzymes available on the market. Only about 300 enzymes (excluding DNA-modifying enzymes) are currently commercially available out of the >3000 non-DNA-modifying enzyme activities reported so far.
The use of enzymes for technological applications may also require performance in demanding industrial environments. This includes activities in environments or on substrates for which the currently known arsenal of enzymes has not been evolutionarily selected. However, the natural environment provides extreme conditions, including, for example, extremes of temperature and pH. Many organisms have adapted to these conditions, in part by selecting polypeptides that can withstand these extremes.
Enzymes have evolved under selective pressure to perform very specific biological functions within the milieu of a living organism, under particular conditions of temperature, pH and salt concentration. For the most part, the non-DNA-modifying enzyme activities identified so far have been isolated from mesophilic organisms, which represent a very small fraction of the available phylogenetic diversity. The dynamic field of biocatalysis takes on a new dimension with enzymes isolated from microorganisms that thrive in extreme environments. For example, such enzymes must function at temperatures above 100°C in terrestrial hot springs and deep-sea thermal vents, at temperatures below 0°C in arctic waters, in the saturated salt environment of the Dead Sea, at pH values around 0 in coal deposits and sulfur-rich geothermal springs, or at pH values above 11 in sewage sludge. Environmental samples obtained, for example, from extreme conditions and containing organisms, polynucleotides or polypeptides (e.g., enzymes) open a new field in biocatalysis. By rapidly screening for polynucleotides encoding polypeptides of interest, the invention provides not only a source of materials for the development of biologics, therapeutics and enzymes for industrial applications, but also new materials for further processing, for example by directed evolution and mutagenesis, to develop molecules or polypeptides modified for a particular activity, specificity or condition.
With the need for new enzymes for industrial applications, there has been a dramatic increase in the demand for new bioactive compounds. This demand is largely due to demographic changes worldwide along with a clear and growing trend in the number of pathogenic organisms resistant to currently available antibiotics. For example, while emerging countries with young populations have seen an increase in demand for antibacterial drugs, countries with aging populations such as the United States require a growing repertoire of drugs against cancer, diabetes, arthritis and other debilitating diseases. Mortality from infectious diseases increased by 58% between 1980 and 1992, and the emergence of antibiotic-resistant microbes is estimated to have increased health care costs by more than $30 billion annually in the United States alone. (Adams et al., Chemical and Engineering News, 1995; Amann et al., Microbiological Reviews, 59, 1995). In response to this trend, pharmaceutical companies have greatly intensified the screening of microbial diversity for compounds with unique actions or specificities. Accordingly, the invention can be used to obtain and identify polynucleotides and related sequence-specific information of, for example, infectious microorganisms present in the environment, such as, for example, in the guts of various macroorganisms.
One solution to this problem is to identify new enzymes in environmental samples. By rapidly identifying polypeptides of interest and the polynucleotides encoding them, the invention provides methods, compositions, and resources for the development of biologics, diagnostics, therapeutics, and compositions for industrial applications.
Chiral epoxides and diols are key building blocks for drug synthesis. The epoxide group is readily converted into a wide range of derivatives by acid- or base-catalyzed ring-opening reactions, while diols can be similarly converted into a wide range of structures. Epoxides are widely used in areas such as anticancer agents, beta-blockers, beta-agonists, antivirals, antifungals, and antibacterials. Opportunities for chiral epoxides exist both in the area of small synthons, including C-3 and C-4 units, and in the advanced chemical intermediates for pharmaceuticals.
C-3 synthons are of great importance because they are used in the processes of many pharmaceutical products and can also lead to a wide variety of end products. Glycidols (S-(1) and R-(2)) are the leading chiral epoxides among the representative C-3 synthons shown in Fig. 2. For example, R-glycidol is used as a component of atenolol (an antihypertensive drug), and S-glycidol leads to R-glycidyl butyrate (7), an important synthon in the synthesis of oxazolidinone antibiotics. Oxazolidinones represent a relatively new class of antibiotics and there are currently more than 40 of them in various stages of clinical development. Demand for R- and S-epichlorohydrin is also increasing (3, 4). Among the C-4 synthons, 3,4-epoxy-1-butene (8) is a small molecule with great potential for the chemical industry. Epoxide 8 leads to the production of more than 30 other chiral epoxides that are not readily available. Epoxide 10 is used in the production of saquinavir, an antiviral drug, and its diastereoisomer 11 is used in the synthesis of amprenavir, another antiviral drug (Figure 3). A mixture of the two compounds can be made from phenylalanine via an alkene intermediate. Another epoxide, 12, is a building block for the synthesis of two anticancer drugs, docetaxel and paclitaxel (Figure 4).
Asymmetric chemical synthesis of epoxides and diols
Currently available chemical methods for the asymmetric epoxidation of alkenes are the Sharpless asymmetric epoxidation, the Jacobsen epoxidation, and the method developed by Yian Shi. The Sharpless method uses titanium-based catalysts to epoxidize a wide range of allylic alcohols, often with optical yields greater than 90%. (Johnson, R. A.; Sharpless, K. B. Catalytic Asymmetric Epoxidation of Allylic Alcohols. In Catalytic Asymmetric Synthesis; Ojima, I. Ed.; VCH: New York, 1993; pp. 103-158.) This methodology is compatible with a wide range of functional groups, which has led to its widespread use in synthetic chemistry. However, the Sharpless approach has a significant drawback: the alkene must carry a hydroxyl group in the allylic position. Unlike the Sharpless reaction, the asymmetric epoxidation methodology developed by Jacobsen and Katsuki (Jacobsen, E. N. Asymmetric Catalytic Epoxidation of Unfunctionalized Olefins. In Catalytic Asymmetric Synthesis; Ojima, I. Ed.; VCH: New York, 1993; pp. 159-202; and Katsuki, T. Coord. Chem. Rev. 1995, 140, 189-214), which uses optically active (salen)manganese(III) complexes, does not require allylic alcohols. However, the scope of the reaction is somewhat limited by the steric and electronic nature of the catalyst, and the best substrates are cis-alkenes conjugated with aryl, acetylenic or alkenyl groups. This substrate requirement significantly limits the applicability of the method. Shi's asymmetric epoxidation method, which uses dioxiranes generated from Oxone and chiral ketones, is effective for trans-disubstituted and trisubstituted olefins. (Zhi-Xian Wang et al., "An Efficient Catalytic Asymmetric Epoxidation Method", J. Am. Chem. Soc. 1997, 119, 11224-11235.) However, the use of Oxone and the limited catalytic efficiency are two obstacles that hinder its industrial application.
Where diols are the desired product, an alternative to epoxidation followed by hydrolysis is direct asymmetric dihydroxylation of alkenes. The most successful method of catalytic asymmetric dihydroxylation (AD) of alkenes for the production of vicinal diols was developed by Sharpless. (Johnson, R.A.; Sharpless, K.B. Catalytic asymmetric dihydroxylation. In Catalytic Asymmetric Synthesis; Ojima, I. Ed.; VCH: New York, 1993; pp. 227-272). It uses osmium-based catalysts and can be used in a wide range of alkenes. However, this method is not effective for some cis-alkenes. More importantly, the use of osmium, which is highly toxic, prohibits its use in pharmaceutical production.
Another strategy for the preparation of chiral epoxides and diols is the hydrolytic kinetic resolution of racemic epoxides. The method currently used in industry, based on the (salen)cobalt catalysts developed by Jacobsen, is quite effective on terminal epoxides. (Tokunaga, M.; Larrow, J. F.; Kakiuchi, F.; Jacobsen, E. N. Science 1997, 277, 936.) However, it is ineffective with internal epoxides. In addition, it is not applicable to many substrates containing heteroatoms (e.g., pyridyl-type epoxides) because these atoms interfere with the metal catalysts.
All of the methods discussed above are limited in their application to process-scale chiral synthesis by problematic characteristics that include the use of expensive metal catalysts, low substrate/catalyst ratios, and limited yields and productivity at varying degrees of enantioselectivity. Attention has therefore turned to biocatalysts as a way to overcome these obstacles. (Besse, P.; Veschambre, H. Tetrahedron 1994, 50, 8885-8927.) Direct stereospecific epoxidation of alkenes by monooxygenases (e.g., cytochrome P450s or other monooxygenases) has been reported. (Archelas, A.; Furstoss, R. Top. Curr. Chem. 1999, 200, 159-191.) These enzyme-catalyzed reactions often give high enantiomeric excesses, but in low yields. Epoxides can also be produced indirectly from alkenes by haloperoxidases, with initial halohydrin formation followed by ring closure. (Besse, P.; Veschambre, H. Tetrahedron 1994, 50, 8885-8927.) Although these enzymes have great potential for use in the synthesis of enantiomerically pure epoxides, there are serious limitations to their industrial application: they all require cofactors, have complex, multicomponent structures, and are generally not very stable. These limitations represent a major challenge both for the discovery of these enzymes and for the development of large-scale industrial biocatalytic applications.
The clear potential shown by microbial EHs has encouraged scientists to investigate their use in the synthesis of epoxides and diols on a preparative scale. Scheme 8 shows representative examples in which multigram quantities of epoxides and/or diols with high ee values were obtained. (Choi et al., Appl. Microbiol. Biotechnol. 1999, 53, 7-11; Guerard et al., Eur. J. Org. Chem. 1999, 3399-3402; Goswami et al., Tetrahedron: Asymmetry 1999, 10, 3167-3175; Cleij, M.; Archelas, A.; Furstoss, R. Tetrahedron: Asymmetry 1998, 9, 1839-1842; and Genzel, Y.; Archelas, A.; Broxterman, Q. B.; Furstoss, R. Tetrahedron: Asymmetry 2000, 11, 3041-3044.) However, several hurdles need to be overcome before a broad industrial platform for EH-catalyzed synthesis of epoxides and diols can be realized. First, the number of available enzymes is still small, and those that show promise for synthetic applications are even rarer. The discovery of new EHs by screening available strains is hampered by limited culture collections and the lack of powerful screening assays. Second, the available enzymes have a limited substrate range and are selective for only one enantiomer as a substrate. For example, A. niger EH prefers styrene oxide substrates and hydrolyzes the R enantiomers in all transformations in FIG. 5. Finally, in most of these preparations, high concentrations of enzyme (either whole cells or crude extract) and relatively low concentrations of substrate had to be used because of the low catalytic efficiency of the enzyme.
New EHs offering complementary enantioselectivity (for example, those that recognize S-enantiomers) need to be discovered, as do EHs suitable for large-scale production of various types of epoxides. Equally important is improving the stereoselectivity and activity of existing and new EHs through protein engineering.
SUMMARY
The invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the nucleic acid encodes at least one polypeptide having epoxide hydrolase activity, and the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection.
In alternative aspects, the isolated or recombinant nucleic acids comprise a nucleic acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1200, 1300, 1400 or more residues; a nucleic acid sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1200, 1300, 1400 or more residues; or a nucleic acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1200, 1300, 1400 or more residues.
In one aspect, the isolated or recombinant nucleic acid comprises a nucleic acid sequence having at least 99% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues.
In one aspect, the isolated or recombinant nucleic acid comprises a nucleic acid having a sequence as set forth in SEQ ID NO:1, a nucleic acid having a sequence as set forth in SEQ ID NO:3, a nucleic acid having a sequence as set forth in SEQ ID NO:5, a nucleic acid having a sequence as set forth in SEQ ID NO:7, a nucleic acid having a sequence as set forth in SEQ ID NO:9, a nucleic acid having a sequence as set forth in SEQ ID NO:11, a nucleic acid having a sequence as set forth in SEQ ID NO:13, a nucleic acid having a sequence as set forth in SEQ ID NO:15, a nucleic acid having a sequence as set forth in SEQ ID NO:17, a nucleic acid having a sequence as set forth in SEQ ID NO:19, a nucleic acid having a sequence as set forth in SEQ ID NO:21, a nucleic acid having a sequence as set forth in SEQ ID NO:23, a nucleic acid having a sequence as set forth in SEQ ID NO:25, a nucleic acid having a sequence as set forth in SEQ ID NO:27, a nucleic acid having a sequence as set forth in SEQ ID NO:29, a nucleic acid having a sequence as set forth in SEQ ID NO:31, a nucleic acid having a sequence as set forth in SEQ ID NO:33, a nucleic acid having a sequence as set forth in SEQ ID NO:35, a nucleic acid having a sequence as set forth in SEQ ID NO:37, a nucleic acid having a sequence as set forth in SEQ ID NO:39, a nucleic acid having a sequence as set forth in SEQ ID NO:41, a nucleic acid having a sequence as set forth in SEQ ID NO:43, a nucleic acid having a sequence as set forth in SEQ ID NO:45, a nucleic acid having a sequence as set forth in SEQ ID NO:47, a nucleic acid having a sequence as set forth in SEQ ID NO:49, a nucleic acid having a sequence as set forth in SEQ ID NO:51, a nucleic acid having a sequence as set forth in SEQ ID NO:53, a nucleic acid having a sequence as set forth in SEQ ID NO:55, a nucleic acid having a sequence as set forth in SEQ ID NO:57, a nucleic acid having a sequence as set forth in SEQ ID NO:59, a nucleic acid having a sequence as set forth in SEQ ID NO:61, a nucleic acid having a sequence as set forth in SEQ ID NO:63, a nucleic acid having a sequence as set forth in SEQ ID NO:65, a nucleic acid having a sequence as set forth in SEQ ID NO:67, a nucleic acid having a sequence as set forth in SEQ ID NO:69, a nucleic acid having a sequence as set forth in SEQ ID NO:71, a nucleic acid having a sequence as set forth in SEQ ID NO:73, a nucleic acid having a sequence as set forth in SEQ ID NO:75, a nucleic acid having a sequence as set forth in SEQ ID NO:77 or a nucleic acid having a sequence as set forth in SEQ ID NO:79.
In one aspect, the nucleic acid sequence encodes a polypeptide comprising a polypeptide having the sequence set forth in SEQ ID NO:2, a polypeptide having the sequence set forth in SEQ ID NO:4, a polypeptide having the sequence set forth in SEQ ID NO:6, a polypeptide having the sequence set forth in SEQ ID NO:8, a polypeptide having the sequence set forth in SEQ ID NO:10, a polypeptide having the sequence set forth in SEQ ID NO:12, a polypeptide having the sequence set forth in SEQ ID NO:14, a polypeptide having the sequence set forth in SEQ ID NO:16, a polypeptide having the sequence set forth in SEQ ID NO:18, a polypeptide having the sequence set forth in SEQ ID NO:20, a polypeptide having the sequence set forth in SEQ ID NO:22, a polypeptide having the sequence set forth in SEQ ID NO:24, a polypeptide having the sequence set forth in SEQ ID NO:26, a polypeptide having the sequence set forth in SEQ ID NO:28, a polypeptide having the sequence set forth in SEQ ID NO:30, a polypeptide having the sequence set forth in SEQ ID NO:32, a polypeptide having the sequence set forth in SEQ ID NO:34, a polypeptide having the sequence set forth in SEQ ID NO:36, a polypeptide having the sequence set forth in SEQ ID NO:38, a polypeptide having the sequence set forth in SEQ ID NO:40, a polypeptide having the sequence set forth in SEQ ID NO:42, a polypeptide having the sequence set forth in SEQ ID NO:44, a polypeptide having the sequence set forth in SEQ ID NO:46, a polypeptide having the sequence set forth in SEQ ID NO:48, a polypeptide having the sequence set forth in SEQ ID NO:50, a polypeptide having the sequence set forth in SEQ ID NO:52, a polypeptide having the sequence set forth in SEQ ID NO:54, a polypeptide having the sequence set forth in SEQ ID NO:56, a polypeptide having the sequence set forth in SEQ ID NO:58, a polypeptide having the sequence set forth in SEQ ID NO:60, a polypeptide having the sequence set forth in SEQ ID NO:62, a polypeptide having the sequence set forth in SEQ ID NO:64, a polypeptide having the sequence set forth in SEQ ID NO:66, a polypeptide having the sequence set forth in SEQ ID NO:68, a polypeptide having the sequence set forth in SEQ ID NO:70, a polypeptide having the sequence set forth in SEQ ID NO:72, a polypeptide having the sequence set forth in SEQ ID NO:74, a polypeptide having the sequence set forth in SEQ ID NO:76, a polypeptide having the sequence set forth in SEQ ID NO:78 or a polypeptide having the sequence set forth in SEQ ID NO:80.
In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 algorithm, wherein the filtering setting is set to blastall -p blastp -d "nr pataa" -F F and all other options are set to default values.
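To make the notion of "percent sequence identity over a region of at least about N residues" concrete, the following is a minimal sketch. It is not the BLAST algorithm named above and is not taken from the specification. It assumes that the two sequences have already been aligned (gaps written as "-") and that identity is counted over the gap-free aligned columns, which is only one of several conventions in use. The function name and the test strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, min_region: int = 100) -> float:
    """Percent identity between two pre-aligned sequences.

    Only columns in which both sequences carry a residue (no gap) are
    compared. A ValueError is raised if fewer than `min_region` such
    columns exist, mirroring a "region of at least N residues" condition.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if len(pairs) < min_region:
        raise ValueError("compared region is shorter than min_region")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical 100-residue sequences differing at 10 positions.
query = "A" * 100
subject = "A" * 90 + "C" * 10
print(percent_identity(query, subject))
```

In practice the alignment itself would come from a sequence comparison program such as the one recited above; this sketch covers only the final counting step.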
In one aspect, the epoxide hydrolase activity comprises catalyzing the addition of water to an oxirane compound. The epoxide hydrolase activity may further comprise formation of the corresponding diol. The epoxide hydrolase activity may further comprise formation of an enantiomerically enriched epoxide. The oxirane compound may comprise an epoxide or an arene oxide. The oxirane compound or the corresponding diol may be optically active. In one aspect, the oxirane compound or the corresponding diol is enantiomerically pure. The epoxide hydrolase activity can be enantioselective.
In one aspect, the epoxide hydrolase activity is thermostable. The polypeptide can retain epoxide hydrolase activity under conditions including a temperature range of about 37°C to about 70°C. In one aspect, the epoxide hydrolase activity is thermotolerant. The polypeptide can retain epoxide hydrolase activity when exposed to temperatures ranging from above 37°C to about 90°C. In one embodiment, the polypeptide retains epoxide hydrolase activity when exposed to temperatures in the range of greater than 37°C to about 50°C.
The invention provides an isolated or recombinant nucleic acid, wherein the nucleic acid comprises a sequence that hybridizes under stringent conditions to a nucleic acid comprising the sequence set forth in SEQ ID NO:1, the sequence set forth in SEQ ID NO:3, the sequence set forth in SEQ ID NO:5, the sequence set forth in SEQ ID NO:7, the sequence set forth in SEQ ID NO:9, the sequence set forth in SEQ ID NO:11, the sequence set forth in SEQ ID NO:13, the sequence set forth in SEQ ID NO:15, the sequence set forth in SEQ ID NO:17, the sequence set forth in SEQ ID NO:19, the sequence set forth in SEQ ID NO:21, the sequence set forth in SEQ ID NO:23, the sequence set forth in SEQ ID NO:25, the sequence set forth in SEQ ID NO:27, the sequence set forth in SEQ ID NO:29, the sequence set forth in SEQ ID NO:31, the sequence set forth in SEQ ID NO:33, the sequence set forth in SEQ ID NO:35, the sequence set forth in SEQ ID NO:37, the sequence set forth in SEQ ID NO:39, the sequence set forth in SEQ ID NO:41, the sequence set forth in SEQ ID NO:43, the sequence set forth in SEQ ID NO:45, the sequence set forth in SEQ ID NO:47, the sequence set forth in SEQ ID NO:49, the sequence set forth in SEQ ID NO:51, the sequence set forth in SEQ ID NO:53, the sequence set forth in SEQ ID NO:55, the sequence set forth in SEQ ID NO:57, the sequence set forth in SEQ ID NO:59, the sequence set forth in SEQ ID NO:61, the sequence set forth in SEQ ID NO:63, the sequence set forth in SEQ ID NO:65, the sequence set forth in SEQ ID NO:67, the sequence set forth in SEQ ID NO:69, the sequence set forth in SEQ ID NO:71, the sequence set forth in SEQ ID NO:73, the sequence set forth in SEQ ID NO:75, the sequence set forth in SEQ ID NO:77 or the sequence set forth in SEQ ID NO:79, wherein the nucleic acid encodes a polypeptide having epoxide hydrolase activity. In alternative aspects, the nucleic acid is at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1200, 1300, 1400 or more residues in length, or the full length of the gene or transcript.
In one embodiment, the stringent conditions include a washing step comprising washing in 0.2 x SSC at about 65°C for about 15 minutes.
The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having epoxide hydrolase activity, wherein the probe comprises at least 10 consecutive bases of the sequence set forth in SEQ ID NO:1, the sequence set forth in SEQ ID NO:3, the sequence set forth in SEQ ID NO:5, the sequence set forth in SEQ ID NO:7, the sequence set forth in SEQ ID NO:9, the sequence set forth in SEQ ID NO:11, the sequence set forth in SEQ ID NO:13, the sequence set forth in SEQ ID NO:15, the sequence set forth in SEQ ID NO:17, the sequence set forth in SEQ ID NO:19, the sequence set forth in SEQ ID NO:21, the sequence set forth in SEQ ID NO:23, the sequence set forth in SEQ ID NO:25, the sequence set forth in SEQ ID NO:27, the sequence set forth in SEQ ID NO:29, the sequence set forth in SEQ ID NO:31, the sequence set forth in SEQ ID NO:33, the sequence set forth in SEQ ID NO:35, the sequence set forth in SEQ ID NO:37, the sequence set forth in SEQ ID NO:39, the sequence set forth in SEQ ID NO:41, the sequence set forth in SEQ ID NO:43, the sequence set forth in SEQ ID NO:45, the sequence set forth in SEQ ID NO:47, the sequence set forth in SEQ ID NO:49, the sequence set forth in SEQ ID NO:51, the sequence set forth in SEQ ID NO:53, the sequence set forth in SEQ ID NO:55, the sequence set forth in SEQ ID NO:57, the sequence set forth in SEQ ID NO:59, the sequence set forth in SEQ ID NO:61, the sequence set forth in SEQ ID NO:63, the sequence set forth in SEQ ID NO:65, the sequence set forth in SEQ ID NO:67, the sequence set forth in SEQ ID NO:69, the sequence set forth in SEQ ID NO:71, the sequence set forth in SEQ ID NO:73, the sequence set forth in SEQ ID NO:75, the sequence set forth in SEQ ID NO:77 or the sequence set forth in SEQ ID NO:79, wherein the probe identifies the nucleic acid by binding or hybridization. In alternative aspects, the probe comprises an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a sequence of the invention.
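As a purely illustrative sketch of what "at least 10 consecutive bases" of a sequence means in practice, the following enumerates every contiguous window of a chosen length from a nucleotide sequence and gives the reverse complement to which such a probe would be expected to hybridize. The target sequence is a hypothetical placeholder and is not taken from the sequence listing. The function names are likewise assumptions, and nothing here models hybridization stringency.

```python
# Translation table for DNA base pairing (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(seq: str, length: int = 10) -> list[str]:
    """List every contiguous window of `length` bases in `seq`."""
    if length < 10:
        raise ValueError("probe length is assumed to be at least 10 bases")
    seq = seq.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Hypothetical 20-base target, for illustration only.
target = "ATGGCCAAGCTTGACGTACC"
probes = candidate_probes(target, length=10)
print(len(probes), probes[0], reverse_complement(probes[0]))
```

A sequence of n bases yields n - length + 1 such windows; choosing among them for specificity or melting behavior is outside the scope of this sketch.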
The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having epoxide hydrolase activity, wherein the probe can comprise a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection. In alternative aspects, the probe comprises an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a nucleic acid of the invention, e.g., a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
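The identity thresholds recited for the nucleic acids of the invention can be summarized as a lookup keyed by sequence identifier. The following Python sketch is illustrative only: the names `MIN_REGION_LENGTH`, `MIN_IDENTITY_PERCENT` and `min_identity_for` are hypothetical and are not part of the disclosure, and the grouping simply restates the 50%, 60% and 70% lists given in the text.

```python
# Illustrative only: restates the minimum sequence identity recited for each
# odd-numbered SEQ ID NO. All names in this block are hypothetical.
MIN_REGION_LENGTH = 100  # "a region of at least about 100 residues"

_GROUPS = {
    50: [1, 3, 5, 7, 11, 13, 15, 17, 19, 21, 31, 33, 35, 41, 43, 45, 47,
         53, 57, 59, 61, 63, 67, 69, 71, 73, 75, 77, 79],
    60: [9, 23, 27, 29, 39, 49, 51, 55, 65],
    70: [25, 37],
}

# Flatten the groups into {SEQ ID NO: minimum percent identity}.
MIN_IDENTITY_PERCENT = {
    seq_id: percent for percent, ids in _GROUPS.items() for seq_id in ids
}


def min_identity_for(seq_id: int) -> int:
    """Return the minimum percent identity recited for a given SEQ ID NO."""
    try:
        return MIN_IDENTITY_PERCENT[seq_id]
    except KeyError:
        raise ValueError(
            f"SEQ ID NO:{seq_id} is not one of the listed nucleic acid sequences"
        ) from None
```

Taken together, the three groups cover every odd-numbered identifier from SEQ ID NO:1 to SEQ ID NO:79 exactly once, which gives a quick consistency check on the lists.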
A probe can comprise a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a region of at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1200, 1300 or more residues of a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
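As a rough illustration of how a percent identity over a region might be checked, the following Python sketch scores two sequences that have already been aligned to equal length. It is an assumption-laden simplification: this passage does not specify the sequence comparison algorithm, the function names `percent_identity` and `meets_probe_threshold` are hypothetical, and counting gap columns (`-`) as mismatches is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumes the inputs are pre-aligned to equal length, with '-' marking gaps.
    Gap columns count toward the length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_probe_threshold(
    aligned_a: str,
    aligned_b: str,
    min_percent: float = 90.0,  # lowest identity value listed for a probe
    min_region: int = 50,       # shortest region length listed for a probe
) -> bool:
    """True if the aligned region is long enough and similar enough."""
    return (
        len(aligned_a) >= min_region
        and percent_identity(aligned_a, aligned_b) >= min_percent
    )
```

The defaults correspond to the lowest identity (90%) and shortest region (about 50 residues) listed for a probe; in practice the alignment itself would come from an established sequence comparison program rather than from this helper.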
The invention provides an amplification primer sequence pair for amplifying a nucleic acid encoding a polypeptide having epoxide hydrolase activity, wherein the primer pair is capable of amplifying a nucleic acid of the invention, e.g., a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences. In one embodiment, each member of the amplification primer sequence pair comprises an oligonucleotide comprising at least about 10 to 50 consecutive bases of the sequence.
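To make the notion of a primer pair concrete, the following Python sketch derives a naive pair from the two ends of a template: the forward primer copies the first bases of the template strand, and the reverse primer is the reverse complement of the last bases. This is an illustration under stated assumptions, not the method of the invention: the function names `reverse_complement` and `end_primer_pair` are hypothetical, the template is assumed to contain only A, C, G and T, and practical primer design also weighs factors such as melting temperature, GC content and self-complementarity, which are ignored here.

```python
# Complement table for unambiguous DNA bases (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers taken from the ends of the template.

    `length` is restricted to 10..50 bases, mirroring the "about 10 to 50
    consecutive bases" recited for each member of the primer pair.
    """
    if not 10 <= length <= 50:
        raise ValueError("primer length must be between 10 and 50 bases")
    if len(template) < 2 * length:
        raise ValueError("template is too short for non-overlapping primers")
    forward = template[:length]                        # same strand, 5' end
    reverse = reverse_complement(template[-length:])   # opposite strand, 3' end
    return forward, reverse
```

Both primers are written 5' to 3', so the reverse primer anneals to the template strand at its 3' end while the forward primer anneals to the complementary strand.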
The invention provides methods of amplifying a nucleic acid encoding a polypeptide having epoxide hydrolase activity, comprising amplifying a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid of the invention, e.g., a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
The invention provides expression cassettes comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
The invention provides vectors comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
The invention provides a cloning vehicle comprising a vector of the invention, wherein the cloning vehicle comprises a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome. The viral vector can comprise an adenovirus vector, a retroviral vector, or an adeno-associated viral vector. The cloning vehicle can comprise a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).
The invention provides transformed cells comprising a vector, wherein the vector comprises a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences.
The invention provides transformed cells comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences. In one aspect, the cell is a bacterial cell, a mammalian cell, a fungal cell, a yeast cell, an insect cell, or a plant cell.
The invention provides transgenic non-human animals comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences. The transgenic non-human animal can be a mouse or a rat.
The invention provides transgenic plants comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences. The plant can be a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant or a tobacco plant.
The invention provides transgenic seeds comprising a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence of any of these sequences. The transgenic seeds can be corn seeds, wheat seeds, oilseeds, canola seeds, soybean seeds, palm seeds, sunflower seeds, sesame seeds, peanuts or tobacco plant seeds.
The invention provides antisense oligonucleotides comprising a nucleic acid sequence that is complementary to, or capable of hybridizing under stringent conditions to, a nucleic acid of the invention, e.g., a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence thereof. An antisense oligonucleotide can be about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length.
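As a rough illustration of what "complementary" means for an antisense oligonucleotide of the lengths mentioned above, the following minimal sketch derives the reverse complement of a chosen target region. The function names and the example target sequence are hypothetical and are not taken from any SEQ ID NO of the specification.

```python
# Illustrative sketch only: names and the target sequence are invented.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_antisense(target: str, start: int, length: int) -> str:
    """Return an oligo complementary to target[start:start + length]."""
    # The text describes oligonucleotides of roughly 10 to 100 bases.
    if not 10 <= length <= 100:
        raise ValueError("length outside the 10 to 100 base range")
    region = target[start:start + length]
    if len(region) < length:
        raise ValueError("target region is shorter than the requested length")
    return reverse_complement(region)


target = "ATGGCTAAGCTTGACCGTTACGGA"  # invented sense-strand fragment
oligo = design_antisense(target, 0, 12)
print(oligo)
```

Taking the reverse complement of the oligo recovers the targeted region, which is a quick sanity check that the two strands can base-pair.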
The invention provides methods of inhibiting the translation of an epoxide hydrolase message in a cell, comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence that is complementary to, or capable of hybridizing under stringent conditions to, a nucleic acid of the invention, e.g., a nucleic acid comprising a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or a nucleic acid that hybridizes under stringent conditions to a nucleic acid comprising a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence thereof.
The invention provides isolated or recombinant polypeptides comprising an amino acid sequence having at least 50% identity to SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:54, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78 or SEQ ID NO:80 over a region of at least about 100 residues, an amino acid sequence having at least 60% identity to SEQ ID NO:10, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:40, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 or SEQ ID NO:66 over a region of at least about 100 residues, an amino acid sequence having at least 70% identity to SEQ ID NO:26 or SEQ ID NO:38 over a region of at least about 100 residues, or a polypeptide encoded by a nucleic acid of the invention, e.g., a nucleic acid comprising (i) a nucleic acid sequence having at least 50% sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79 over a region of at least about 100 residues, a nucleic acid sequence having at least 60% sequence identity to SEQ ID NO:9, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:39, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:55 or SEQ ID NO:65 over a region of at least about 100 residues, or a nucleic acid sequence having at least about 70% sequence identity to SEQ ID NO:25 or SEQ ID NO:37 over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection; or (ii) a nucleic acid that hybridizes under stringent conditions to a nucleic acid of the invention.
In one aspect, the polypeptide has epoxide hydrolase activity. Epoxide hydrolase activity may involve catalyzing the addition of water to an oxirane compound. Epoxide hydrolase activity may further involve the formation of the corresponding diol. Epoxide hydrolase activity may further involve the formation of an enantiomerically enriched epoxide. The oxirane compound may include an epoxide or an arene oxide. The oxirane compound or the corresponding diol may be optically active.
In one embodiment, the oxirane compound or corresponding diol is enantiomerically pure. Epoxide hydrolase activity can be enantioselective. Epoxide hydrolase activity may involve the hydrolysis of monosubstituted, 2,2-disubstituted, 2,3-disubstituted, trisubstituted epoxide or styrene oxide.
In one aspect, the epoxide hydrolase activity is thermostable. The polypeptide can retain epoxide hydrolase activity under conditions including a temperature range of about 37°C to about 70°C. Epoxide hydrolase activity can be thermotolerant. The polypeptide can retain epoxide hydrolase activity when exposed to temperatures ranging from above 37°C to about 90°C. The polypeptide can retain epoxide hydrolase activity when exposed to temperatures ranging from above 37°C to about 50°C.
In alternative aspects, the polypeptide comprises an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more identity to SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:54, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78 or SEQ ID NO:80 over a region of at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues; an amino acid sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more identity to SEQ ID NO:10, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:40, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 or SEQ ID NO:66 over a region of at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues; or an amino acid sequence having at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more identity to SEQ ID NO:26 or SEQ ID NO:38 over a region of at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues.
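The identity thresholds above are stated "over a region of at least about" a given number of residues. One way to picture such a test is sketched below. It assumes the two sequences have already been aligned by some sequence comparison algorithm (gaps written as "-"), counts gap columns as mismatches, and uses the window length as the denominator. These conventions differ between alignment tools, so they are assumptions of this sketch, as are the function names and the toy sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical residues (gaps never match)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal length, non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, min_identity: float, region: int) -> bool:
    """True if some window of `region` aligned columns reaches min_identity."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned and equal length")
    if len(a) < region:
        return False
    return any(
        percent_identity(a[i:i + region], b[i:i + region]) >= min_identity
        for i in range(len(a) - region + 1)
    )


# Toy pre-aligned sequences: 100 columns, the first 40 of which differ.
ref = "ACDEFGHIKL" * 10
query = "W" * 40 + ref[40:]
print(percent_identity(ref, query))
print(meets_threshold(ref, query, 50.0, 100))
```

With these toy inputs the pair passes a 50% test over 100 residues but would fail a 70% test over the same region.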
The invention provides isolated or recombinant polypeptides, wherein the polypeptide comprises an amino acid sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78 or SEQ ID NO:80, or a subsequence thereof.
In one embodiment, an isolated or recombinant polypeptide comprising a polypeptide of the invention lacks a signal sequence.
In one embodiment, the epoxide hydrolase activity includes a specific activity at about 37°C in the range of about 100 to about 1000 units per milligram of protein. In another embodiment, the epoxide hydrolase activity comprises a specific activity of about 500 to about 1200 units per milligram of protein. Alternatively, epoxide hydrolase activity includes a specific activity at 37°C ranging from about 500 to about 1000 units per milligram of protein. In one embodiment, the epoxide hydrolase activity comprises a specific activity at 37°C ranging from about 750 to about 1000 units per milligram of protein.
The invention provides an isolated or recombinant polypeptide wherein the thermotolerance includes retention of at least half of the specific epoxide hydrolase activity at 37°C when heated to an elevated temperature. In one aspect, thermotolerance includes maintaining a specific activity at 37°C in the range of about 500 to about 1200 units per milligram of protein when heated to an elevated temperature.
The invention provides a polypeptide of the invention, wherein the polypeptide contains at least one glycosylation site. In one embodiment, the glycosylation can be N-glycosylation. In one embodiment, the epoxide hydrolase is glycosylated when expressed in P. pastoris or S. pombe.
In one embodiment, the polypeptide can retain epoxide hydrolase activity under conditions including about pH 4.5 or pH 5. Alternatively, the polypeptide can retain epoxide hydrolase activity under conditions including about pH 9.0, pH 9.5, or pH 10.
The invention provides protein preparations containing the polypeptide of the invention, wherein the protein preparation comprises a liquid, solid or gel.
The invention provides heterodimers comprising a polypeptide of the invention and a second domain. In one embodiment, the second domain is a polypeptide and the heterodimer is a fusion protein. In one embodiment, the second domain can be an epitope or tag.
The invention provides an immobilized polypeptide having epoxide hydrolase activity, wherein the polypeptide comprises a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention or a polypeptide comprising a polypeptide of the invention and a second domain. The polypeptide can be immobilized on a cell, metal, resin, polymer, ceramic, glass, microelectrode, graphite particle, bead, gel, plate, chip, or capillary tube.
The invention provides arrays comprising an immobilized polypeptide, wherein the polypeptide comprises a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention or a polypeptide comprising a polypeptide of the invention and a second domain.
The invention provides arrays comprising an immobilized nucleic acid of the invention. The invention provides arrays comprising an antibody of the invention.
The invention provides isolated or recombinant antibodies that specifically bind to the polypeptide of the invention or to the polypeptide encoded by the nucleic acid of the invention. The antibody can be a monoclonal or polyclonal antibody. The invention provides hybridomas that contain an antibody that specifically binds to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention.
The invention provides methods for isolating or identifying a polypeptide having epoxide hydrolase activity comprising the steps of: (a) providing an antibody of the invention; (b) providing a sample containing the polypeptides; and (c) contacting the sample of step (b) with the antibody of step (a) under conditions in which the antibody can specifically bind to the polypeptide, thereby isolating or identifying a polypeptide having epoxide hydrolase activity.
The invention provides methods for producing antibodies against epoxide hydrolase comprising administering to a non-human animal an amount of a nucleic acid of the invention or a polypeptide of the invention sufficient to elicit a humoral immune response, thereby producing an anti-epoxide hydrolase antibody.
The invention provides methods for the production of a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a promoter; and (b) expressing the nucleic acid of step (a) under conditions that permit expression of the polypeptide, thereby producing the recombinant polypeptide. In one aspect, the method may further comprise transforming the host cell with the nucleic acid of step (a) and then expressing the nucleic acid of step (a), thereby producing the recombinant polypeptide in the transformed cell.
The invention provides methods for identifying a polypeptide having epoxide hydrolase activity comprising the steps of: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing an epoxide hydrolase substrate; and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the substrate of step (b) and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, wherein the decrease in the amount of the substrate or the increase in the amount of the reaction product detects a polypeptide having epoxide hydrolase activity. In one embodiment, the substrate may be an epoxide.
The invention provides methods for identifying epoxide hydrolase substrates comprising the steps of: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test substrate; and (c) contacting the polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, wherein the decrease in the amount of substrate or the increase in the amount of reaction product identifies the test substrate as an epoxide hydrolase substrate.
The invention provides methods for determining whether a test compound specifically binds to a polypeptide comprising the steps of: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions that permit translation of the nucleic acid into a polypeptide, wherein the nucleic acid comprises a nucleic acid of the invention, or providing a polypeptide of the invention; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.
The invention provides methods for identifying modulators of epoxide hydrolase activity comprising the steps of: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound; and (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring the epoxide hydrolase activity, wherein a change in the epoxide hydrolase activity measured in the presence of the test compound compared to the activity in the absence of the test compound indicates that the test compound modulates epoxide hydrolase activity. In one aspect, epoxide hydrolase activity is measured by providing an epoxide hydrolase substrate and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, or an increase in the amount of substrate or a decrease in the amount of reaction product. A decrease in the amount of substrate or an increase in the amount of reaction product with the test compound compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of epoxide hydrolase activity. An increase in the amount of substrate or a decrease in the amount of reaction product with the test compound compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of epoxide hydrolase activity.
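The activator and inhibitor decision described above can be pictured as a comparison of reaction product measured with and without the test compound. The sketch below is only an illustration: the function name, the 5% tolerance band used to ignore measurement noise, and the numeric readings are all assumptions and do not come from the specification.

```python
def classify_modulator(product_with: float, product_without: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from product formed with and without it.

    More product with the compound suggests an activator, less product an
    inhibitor. `tolerance` is a hypothetical relative noise band.
    """
    if product_without <= 0:
        raise ValueError("control reaction must yield a positive product amount")
    change = (product_with - product_without) / product_without
    if change > tolerance:
        return "activator"
    if change < -tolerance:
        return "inhibitor"
    return "no effect"


# Invented example readings (arbitrary units of diol product).
print(classify_modulator(150.0, 100.0))
print(classify_modulator(40.0, 100.0))
```

The same comparison could be made on remaining substrate instead of product, with the direction of the test reversed.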
The invention provides computer systems comprising a processor and a data storage device, wherein the data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide of the invention or a subsequence thereof and the nucleic acid comprises a nucleic acid of the invention or a subsequence thereof. In one aspect, the computer system may further include a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon. In one aspect, the sequence comparison algorithm comprises a computer program that indicates polymorphisms. In another aspect, the computer system may further include an identifier that identifies one or more features in said sequence.
The invention provides a computer-readable medium on which a polypeptide sequence or a nucleic acid sequence is stored, wherein the polypeptide sequence contains the polypeptide of the invention or its subsequence, and the nucleic acid contains the nucleic acid of the invention or its subsequence.
The invention provides methods for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program that identifies one or more features in the sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide of the invention or its polypeptide subsequence, and the nucleic acid comprises the nucleic acid of the invention or its subsequence; and (b) identifying one or more features in the sequence using a computer program.
The invention provides methods for comparing a first sequence with a second sequence comprising the steps of: (a) reading the first sequence and the second sequence using a computer program that compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a polypeptide of the invention or a subsequence thereof, and the nucleic acid comprises a nucleic acid of the invention or a subsequence thereof; and (b) determining differences between the first sequence and the second sequence using the computer program. In one aspect, the step of determining the differences between the first sequence and the second sequence further includes the step of identifying polymorphisms. In one aspect, the method may further include an identifier that identifies one or more features in a sequence. In another aspect, the method may further comprise reading the first sequence using a computer program and identifying one or more features in the sequence.
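A minimal sketch of the comparison step is given below. It assumes the two sequences are already aligned and of equal length, and it simply reports each position where they differ, which is one simple way a program could flag candidate polymorphisms. The function name and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_differences(first: str, second: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue in first, residue in second)
    for every aligned position at which the two sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (i + 1, x, y)
        for i, (x, y) in enumerate(zip(first, second))
        if x != y
    ]


# Invented sequences differing by a single substitution at position 4.
diffs = find_differences("ATGGCC", "ATGACC")
print(diffs)
```

A real comparison program would normally compute the alignment itself before listing differences; that step is omitted here.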
The invention provides methods of isolating or recovering a nucleic acid encoding an epoxide hydrolase polypeptide from an environmental sample, comprising the steps of: (a) providing an amplification primer sequence pair for amplifying a nucleic acid encoding an epoxide hydrolase polypeptide, wherein the primer pair is capable of amplifying SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, or subsequences thereof; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample so that nucleic acid in the sample is accessible for hybridization to the amplification primer pair; and (c) combining the nucleic acid of step (b) with the amplification primer pair of step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or recovering a nucleic acid encoding an epoxide hydrolase polypeptide from the environmental sample. In one aspect, one or each member of the amplification primer sequence pair comprises an oligonucleotide comprising at least about 10 to 50 contiguous bases of a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77 or SEQ ID NO:79, or a subsequence thereof.
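Whether a primer pair is "capable of amplifying" a given sequence can be checked in silico before any wet-lab work. The sketch below is a simplification under stated assumptions: it requires exact matches of both primers (real primers tolerate some mismatches and depend on annealing conditions), and the template, the primers and the function names are all invented for illustration.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the template region bounded by the primer pair, or None.

    The forward primer must match the given strand exactly, and the reverse
    primer must match the opposite strand downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented 30-base template with 10-base primers (the minimum length in the text).
template = "GGGATGGCTAAGCTTGACCGTTACGGATTT"
forward_primer = "ATGGCTAAGC"
reverse_primer = "ATCCGTAACG"
product = amplicon(template, forward_primer, reverse_primer)
print(product)
```

A return value of None means that at least one primer has no exact binding site in the orientation required for amplification.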
The invention provides methods for isolating or recovering a nucleic acid encoding an epoxide hydrolase polypeptide from an environmental sample, comprising the steps of: (a) providing a polynucleotide probe containing the nucleic acid of the invention or a subsequence thereof; (b) isolating the nucleic acid from the environmental sample or treating the environmental sample so that the nucleic acid in the sample is available for hybridization with the polynucleotide probe of step (a); (c) combining the isolated nucleic acid or processed environmental sample of step (b) with the polynucleotide probe of step (a); and (d) isolating a nucleic acid that specifically hybridizes to the polynucleotide probe of step (a), thereby isolating or recovering nucleic acid encoding the epoxide hydrolase polypeptide from the environmental sample. In one embodiment, the environmental sample includes a water sample, a liquid sample, a soil sample, an air sample, or a biological sample. A biological sample may be from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell, or a mammalian cell.
The invention provides methods for generating a variant of a nucleic acid encoding a polypeptide having epoxide hydrolase activity, comprising the steps of: (a) providing a template nucleic acid comprising a nucleic acid of the invention; and (b) modifying, deleting, or adding one or more nucleotides in the template sequence, or a combination thereof, to generate a variant of the template nucleic acid. In one embodiment, the method may further comprise expressing the variant nucleic acid to generate a variant epoxide hydrolase polypeptide.
In one embodiment, the modifications, additions or deletions are introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, gene site saturation mutagenesis (GSSM™), synthetic ligation reassembly (SLR) and combinations thereof. In another aspect, the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and combinations thereof.
In one embodiment, the method can be repeated iteratively until an epoxide hydrolase having an altered or different activity or an altered or different stability from that of the polypeptide encoded by the template nucleic acid is produced. In one embodiment, the variant epoxide hydrolase polypeptide may be thermotolerant and retain some activity after being exposed to an elevated temperature. In another aspect, the variant epoxide hydrolase polypeptide has increased glycosylation compared to the epoxide hydrolase encoded by the template nucleic acid. Alternatively, the variant epoxide hydrolase polypeptide has epoxide hydrolase activity at a high temperature, wherein the epoxide hydrolase encoded by the template nucleic acid is not active at the high temperature. In one embodiment, the method is repeated iteratively until an epoxide hydrolase coding sequence having altered codon usage relative to the template nucleic acid is produced. In another embodiment, the method is repeated iteratively until an epoxide hydrolase gene having a higher or lower level of message expression or stability than that of the template nucleic acid is produced.
The invention provides methods of modifying codons in a nucleic acid encoding an epoxide hydrolase polypeptide to increase its expression in a host cell, the method comprising the steps of: (a) providing a nucleic acid encoding an epoxide hydrolase polypeptide comprising the nucleic acid of the invention; and (b) identifying a non-preferred or less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon that is overrepresented in the coding sequences of genes in the host cell, and a non-preferred or less preferred codon is a codon that is underrepresented in the coding sequences of genes in the host cell, thereby modifying the nucleic acid to increase its expression in the host cell.
The invention provides methods for modifying codons in a nucleic acid encoding an epoxide hydrolase polypeptide, the method comprising the steps of: (a) providing a nucleic acid encoding an epoxide hydrolase polypeptide comprising the nucleic acid of the invention; and (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying the codons in the nucleic acid encoding the epoxide hydrolase.
The invention provides methods of modifying codons in a nucleic acid encoding a polypeptide having epoxide hydrolase activity to reduce its expression in a host cell, the method comprising the steps of: (a) providing a nucleic acid encoding an epoxide hydrolase polypeptide comprising the nucleic acid of the invention; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon that is overrepresented in the coding sequences of genes in the host cell, and a non-preferred or less preferred codon is a codon that is underrepresented in the coding sequences of genes in the host cell, thereby modifying the nucleic acid to reduce its expression in the host cell. In one embodiment, the host cell can be a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell, or a mammalian cell.
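The codon replacement steps described above can be illustrated with a minimal sketch. The codon usage values, the 0.15 cutoff separating preferred from less preferred codons, and the function name `recode` are all assumptions made for illustration; they are not taken from this specification, and real codon usage data for the chosen host cell would be substituted in practice.

```python
# Hypothetical relative codon usage for an illustrative host cell.
# The numbers are invented for this sketch and are NOT real usage data.
HOST_USAGE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "R": {"CGC": 0.40, "CGT": 0.38, "CGG": 0.10, "CGA": 0.06, "AGA": 0.04, "AGG": 0.02},
    "K": {"AAA": 0.76, "AAG": 0.24},
}


def recode(cds, usage=HOST_USAGE, threshold=0.15, increase=True):
    """Replace codons with synonymous codons according to host usage.

    increase=True  : codons used below `threshold` are replaced by the most
                     used synonymous codon (aims to raise expression).
    increase=False : codons used at or above `threshold` are replaced by the
                     least used synonymous codon (aims to lower expression).
    Codons absent from the usage table are left unchanged.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codon_to_aa = {c: aa for aa, codons in usage.items() for c in codons}
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = codon_to_aa.get(codon)
        if aa is not None:
            freqs = usage[aa]
            if increase and freqs[codon] < threshold:
                codon = max(freqs, key=freqs.get)
            elif not increase and freqs[codon] >= threshold:
                codon = min(freqs, key=freqs.get)
        out.append(codon)
    return "".join(out)


optimized = recode("ATGCTAAGGAAA")                     # raise expression
deoptimized = recode("ATGCTGCGCAAA", increase=False)   # lower expression
```

Because every replacement is synonymous, the encoded amino acid sequence is unchanged; only the codon choice differs between the input and the output.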
The invention provides methods for producing a library of nucleic acids encoding a plurality of modified epoxide hydrolase active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site, the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or a first substrate binding site, wherein the first nucleic acid comprises a sequence that hybridizes under stringent conditions to a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or a subsequence thereof, and the nucleic acid encodes an epoxide hydrolase active site or an epoxide hydrolase substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and (c) using the set of mutagenic oligonucleotides to generate a set of variant nucleic acids encoding active sites or substrate binding sites, the variants encoding a range of amino acid variations at each amino acid codon that was mutagenized, thereby producing a library of nucleic acids encoding a plurality of modified epoxide hydrolase active sites or substrate binding sites.
In one aspect, the method may further comprise mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, gene site saturation mutagenesis (GSSM), synthetic ligation reassembly (SLR), error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, or a combination thereof. In another aspect, the method may further comprise mutagenizing the first nucleic acid of step (a) or a variant thereof by a method comprising recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and combinations thereof.
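By way of illustration only, the library of steps (a) to (c) above can be enumerated in silico. The sketch below assumes one representative codon per amino acid (the table `CODON_FOR` is an arbitrary choice from the standard genetic code) and a hypothetical helper, `single_site_library`, that substitutes each of the twenty naturally occurring amino acids at each targeted codon, one position at a time. It models only the sequences such a set of mutagenic oligonucleotides could encode, not the wet-lab procedure.

```python
# One representative codon per amino acid (standard genetic code).
# The particular codon chosen for each amino acid is an arbitrary assumption.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def single_site_library(cds, target_codons, codon_for=CODON_FOR):
    """Return {(codon_index, amino_acid): variant_sequence}.

    `target_codons` holds 0-based codon indices. A substitution whose codon
    is identical to the starting codon is skipped; if the starting codon is
    a synonymous codon not listed in `codon_for`, a silent variant is kept.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    cds = cds.upper()
    n_codons = len(cds) // 3
    library = {}
    for pos in target_codons:
        if not 0 <= pos < n_codons:
            raise IndexError(f"codon index {pos} is outside the sequence")
        start = 3 * pos
        starting_codon = cds[start:start + 3]
        for aa, codon in codon_for.items():
            if codon == starting_codon:
                continue
            library[(pos, aa)] = cds[:start] + codon + cds[start + 3:]
    return library


# Toy first nucleic acid encoding Met-Asp-Leu-Lys; codons 1 and 2 are targeted.
library = single_site_library("ATGGATCTGAAA", [1, 2])
```

A combinatorial library, in which several targeted codons vary simultaneously, would grow as the product of the per-site variant counts rather than their sum.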
The invention provides methods for the production of a small molecule comprising the steps of: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying the small molecule, one of the enzymes comprising an epoxide hydrolase enzyme encoded by a nucleic acid comprising the nucleic acid of the invention; (b) providing a substrate for at least one of the enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate multiple biocatalytic reactions to produce the small molecule through a series of biocatalytic reactions.
The invention provides methods for modifying a small molecule comprising the steps of: (a) providing an epoxide hydrolase enzyme, wherein the enzyme comprises a polypeptide of the invention or is encoded by a nucleic acid of the invention; (b) providing a small molecule; and (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that promote an enzymatic reaction catalyzed by the epoxide hydrolase enzyme, thereby modifying the small molecule by the epoxide hydrolase enzyme reaction. In one embodiment, the method may further comprise providing a plurality of small molecule substrates for the enzyme of step (a), thereby creating a library of modified small molecules produced by at least one enzymatic reaction catalyzed by the epoxide hydrolase enzyme. In one embodiment, the method may further comprise providing a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes, to form a library of modified small molecules produced by the plurality of enzymatic reactions. In one embodiment, the method may further comprise the step of testing the library to determine whether a particular modified small molecule exhibiting a desired activity is present in the library. The step of testing the library may further comprise the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of modified small molecules in the library, by testing that portion for the presence or absence of the particular modified small molecule with the desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule with the desired activity.
The invention provides methods for determining a functional fragment of an epoxide hydrolase enzyme comprising the steps of: (a) providing an epoxide hydrolase enzyme, wherein the enzyme comprises a polypeptide of the invention or is encoded by a nucleic acid of the invention; and (b) removing multiple amino acid residues from the sequence of step (a) and testing the remaining subsequence for epoxide hydrolase activity, thereby determining a functional fragment of the epoxide hydrolase enzyme. In one aspect, epoxide hydrolase activity can be measured by providing an epoxide hydrolase substrate and detecting a decrease in the amount of substrate or an increase in the amount of reaction product.
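The deletion-and-test procedure of steps (a) and (b) can be sketched as a simple search over contiguous subsequences. In the sketch below, `is_active` is a hypothetical stand-in for the activity measurement (for example, detection of substrate decrease or product increase); the toy motif used in the example has no biological meaning and is an assumption made purely to exercise the code.

```python
def shortest_active_fragment(sequence, is_active, min_len=1):
    """Return the shortest contiguous subsequence for which is_active() is True.

    Residues are removed from the amino terminus, the carboxy terminus, or
    both, and every remaining subsequence of at least `min_len` residues is
    tested. Returns None if no tested subsequence is active. When several
    fragments share the shortest length, the first one found is returned.
    """
    best = None
    for start in range(len(sequence)):
        for end in range(len(sequence), start + min_len - 1, -1):
            fragment = sequence[start:end]
            if is_active(fragment) and (best is None or len(fragment) < len(best)):
                best = fragment
    return best


# Toy example: "activity" is assumed to require the invented motif HGDP.
def toy_assay(fragment):
    return "HGDP" in fragment


core = shortest_active_fragment("MKTAHGDPLLE", toy_assay)
```

An exhaustive scan is quadratic in sequence length, so in practice a coarser deletion series would typically be assayed first and refined around the boundaries of the active region.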
The invention provides methods for whole-cell engineering of new or modified phenotypes using real-time metabolic flux analysis, the method comprising the steps of: (a) producing a modified cell by modifying the genetic makeup of a cell, wherein the genetic makeup is modified by adding to the cell a nucleic acid of the invention; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b) in real time; and (d) analyzing the data of step (c) to determine whether the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis. In one embodiment, the genetic makeup of the cell is modified by a process comprising deleting or modifying a sequence in the cell, or knocking out the expression of a gene. In one embodiment, the method may further comprise selecting a cell comprising the newly engineered phenotype. In one embodiment, the method may further comprise culturing the selected cell, thereby generating a new cell strain comprising the newly engineered phenotype.
The invention provides methods for hydrolyzing an epoxide comprising the steps of: (a) providing a polypeptide having epoxide hydrolase activity, wherein the polypeptide comprises a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a composition comprising an epoxide; and (c) contacting the polypeptide of step (a) with the composition of step (b) under conditions in which the polypeptide hydrolyzes the epoxide. In one embodiment, the epoxide is a monosubstituted, 2,2-disubstituted, 2,3-disubstituted or trisubstituted epoxide, or a styrene oxide.
The invention provides processes for the production of a chiral diol comprising the following steps: (a) providing a polypeptide having epoxide hydrolase activity, wherein the polypeptide comprises a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a composition containing a chiral epoxide; and (c) contacting the polypeptide of step (a) with the composition of step (b) under conditions in which the polypeptide catalyzes the conversion of the chiral epoxide into a chiral diol.
The invention provides methods for the production of a chiral epoxide comprising the steps of: (a) providing a polypeptide having epoxide hydrolase activity, wherein the polypeptide is a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention, and wherein the epoxide hydrolase activity is enantioselective or enantiospecific; (b) providing a composition containing a racemic mixture of chiral epoxides; and (c) combining the polypeptide of step (a) with the composition of step (b) under conditions in which the enantioselective or enantiospecific polypeptide converts the epoxide substrate of the specified chirality to a diol, thereby accumulating the unreacted epoxide of the opposite chirality.
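The outcome of such a kinetic resolution is commonly summarized by the enantiomeric excess (ee) of the remaining epoxide and by the enantiomeric ratio E. The sketch below is illustrative only: it uses the standard textbook relations for an irreversible kinetic resolution, which are not recited in this specification, and the function names are assumptions. Fractions are expressed on a 0 to 1 scale.

```python
import math


def enantiomeric_excess(major, minor):
    """ee = (major - minor) / (major + minor), for amounts in any common unit."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return (major - minor) / total


def conversion(ee_substrate, ee_product):
    """Conversion c estimated from the ee of remaining substrate and of product."""
    return ee_substrate / (ee_substrate + ee_product)


def enantiomeric_ratio(c, ee_substrate):
    """E = ln[(1 - c)(1 - ee_s)] / ln[(1 - c)(1 + ee_s)].

    Valid for 0 < c < 1 and 0 <= ee_s < 1; ee_s = 1 would require log(0).
    """
    if not (0 < c < 1 and 0 <= ee_substrate < 1):
        raise ValueError("need 0 < c < 1 and 0 <= ee_substrate < 1")
    return (math.log((1 - c) * (1 - ee_substrate))
            / math.log((1 - c) * (1 + ee_substrate)))


# Remaining epoxide measured as 99.5 parts of one enantiomer to 0.5 of the other.
ee_s = enantiomeric_excess(99.5, 0.5)
# The same ee reached at 60% conversion implies a more modest selectivity
# than if it had been reached at 50% conversion.
e_at_60 = enantiomeric_ratio(0.60, ee_s)
e_at_50 = enantiomeric_ratio(0.50, ee_s)
```

The comparison at the end reflects the design consideration behind step (c): a more selective enzyme leaves highly enriched epoxide at a conversion closer to the theoretical 50%, and therefore in higher yield.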
The invention provides methods of increasing the thermotolerance or thermostability of the epoxide hydrolase polypeptide, and the method includes glycosylation of the epoxide hydrolase polypeptide, wherein the polypeptide contains at least thirty consecutive amino acids of the polypeptide of the invention or the polypeptide encoded by the nucleic acid of the invention, thereby increasing the thermotolerance or thermostability of the polypeptide epoxide hydrolase. In one aspect, the specific activity of the epoxide hydrolase is thermostable or thermotolerant at a temperature in the range of greater than about 37°C to about 90°C.
The invention provides methods for overexpressing a recombinant epoxide hydrolase polypeptide in a cell, comprising expressing a vector containing a nucleic acid sequence having at least 50% sequence identity to a nucleic acid of the invention over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, and wherein overexpression is achieved by use of a highly active promoter, a dicistronic vector or gene amplification of the vector.
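As an illustration of the identity criterion recited above, the following sketch computes percent identity for two sequences that are assumed to have been aligned already (equal length, with "-" marking gaps). The convention used here, identical columns divided by all columns in which at least one sequence has a residue, is only one of several in common use and is an assumption of this sketch, as are the function names. A sequence comparison algorithm would normally produce the alignment itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def meets_identity_criterion(aligned_a, aligned_b, min_identity=50.0, min_region=100):
    """True if the aligned region spans at least `min_region` columns and
    its percent identity is at least `min_identity`."""
    if len(aligned_a) < min_region:
        return False
    return percent_identity(aligned_a, aligned_b) >= min_identity


# Toy sequences: 120 aligned columns in which every fourth residue differs.
reference = "ACGT" * 30
candidate = "ACGA" * 30
identity = percent_identity(reference, candidate)
qualifies = meets_identity_criterion(reference, candidate)
```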
The invention provides growth-based methods for selecting a cell containing a nucleic acid encoding an epoxide hydrolase comprising the steps of: (a) providing a plurality of cells, wherein the cells lack a composition necessary for growth; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase to the composition necessary for cell growth; (c) growing the cells in a medium without the carbon source necessary for growth and adding the precursor or substrate of step (b); and (d) screening the cells for growth, wherein cells in a growing clone are identified as containing a nucleic acid encoding an epoxide hydrolase capable of converting the precursor or substrate to the composition necessary for growth, thereby selecting a cell containing a nucleic acid encoding an epoxide hydrolase.
The invention provides growth-based methods for selecting a nucleic acid encoding an epoxide hydrolase comprising the steps of: (a) providing a nucleic acid encoding a polypeptide; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase to a composition necessary for cell growth; (c) providing a plurality of cells, wherein the cells cannot produce the composition of step (b); (d) inserting the nucleic acid into the cells and growing the cells under conditions in which the nucleic acid is expressed and the polypeptide it encodes is translated, wherein the cells are grown in a medium without a carbon source necessary for growth, and adding the precursor or substrate of step (b); and (e) screening the cells for growth, wherein the nucleic acid in a growing clone is identified as encoding an epoxide hydrolase capable of converting the precursor or substrate to the composition necessary for growth, thereby selecting the nucleic acid encoding the epoxide hydrolase.
The invention provides methods for identifying a nucleic acid encoding an epoxide hydrolase comprising the steps of: (a) providing a library of nucleic acids; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase to a composition necessary for cell growth; (c) providing a plurality of cells wherein the cells cannot produce the composition of step (b); (d) inserting a member of the gene library into the cell and culturing the cells in a medium without the composition required for growth; (e) adding the precursor or substrate of step (b) to the culture; (f) selecting a growing cell and identifying the inserted library member of step (d), wherein the cell is capable of growth by enzymatically converting the precursor to a composition necessary for growth, and the enzyme is encoded by the library member, thereby identifying the nucleic acid encoding the epoxide hydrolase.
In one embodiment, the precursor or substrate comprises glycidol or propylene oxide. In one embodiment, the composition necessary for growth comprises glycerol or propanediol. In one embodiment, the precursor or substrate comprises a pure enantiomer or a racemic mixture. The composition necessary for growth may comprise a pure enantiomer or a racemic mixture. In one aspect, the nucleic acid is a member of a gene library. In one aspect, the library can be obtained from a mixed population of organisms. The mixed population of organisms can be derived from a soil sample, a water sample, or an air sample. In one embodiment, the cells comprise an E. coli mutant with a disruption of fucA.
The invention provides methods for identifying an epoxide hydrolase comprising the steps of: (a) providing a polypeptide; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase into a composition necessary for cell growth; (c) providing a plurality of cells, wherein the cells are unable to produce the composition of step (b); (d) inserting the polypeptide into the cells and growing the cells, wherein the cells are grown in a medium without the composition necessary for growth, and adding the precursor or substrate of step (b); and (e) screening the cells for growth, wherein the polypeptide in a growing clone is identified as an epoxide hydrolase capable of converting the precursor or substrate to the composition necessary for cell growth, thereby identifying the epoxide hydrolase.
The invention provides methods for identifying an epoxide hydrolase comprising the steps of: (a) providing a library of polypeptides; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase into a composition necessary for cell growth; (c) providing a plurality of cells, wherein the cells are unable to produce the composition of step (b); (d) inserting a member of the polypeptide library into the cell and culturing the cells in a medium without the composition necessary for growth; (e) adding the polypeptide library of step (a) and the precursor or substrate of step (b) to the cells of step (c); and (f) selecting a growing cell and identifying the inserted polypeptide of step (d), wherein the cell is capable of growth by enzymatically converting the precursor to the composition necessary for growth, thereby identifying the epoxide hydrolase. In one aspect, the library is obtained from a mixed population of organisms.
The invention provides direct activity assay methods for screening polypeptides having epoxide hydrolase activity, comprising the steps of: (a) providing a plurality of polypeptides; (b) providing a precursor or substrate covalently bound to a fluorophore, wherein the precursor or substrate can be converted by an epoxide hydrolase to a diol, and wherein the fluorophore can generate a fluorescent signal when free; (c) combining the polypeptides of step (a) with the precursor or substrate of step (b) under conditions wherein the polypeptides can convert the precursor or substrate into a diol attached to the fluorophore; (d) converting the diol attached to the fluorophore of step (c) to the free fluorophore; (e) measuring the fluorescence quantum yield; and (f) screening the polypeptides for epoxide hydrolase activity, wherein a polypeptide is identified as having epoxide hydrolase activity capable of converting the precursor or substrate to a diol as detected by an increase in fluorescence quantum yield due to the formation of the free fluorophore, thereby selecting a polypeptide having epoxide hydrolase activity. In one embodiment, the conversion of the fluorophore-bound diol to the free fluorophore further comprises the following steps: (a) subjecting the fluorophore-bound diol to a periodate oxidation step to provide a fluorophore-bound aldehyde; and (b) subjecting the aldehyde of step (a) to BSA-catalyzed β-elimination to yield the free fluorophore. In one aspect, the fluorophore can be umbelliferone.
The invention provides direct activity colorimetric methods for screening polypeptides having epoxide hydrolase activity, comprising the steps of: (a) providing a plurality of polypeptides; (b) providing a precursor or substrate, wherein the precursor or substrate can be converted by an epoxide hydrolase to a diol; (c) providing a chemical, wherein the chemical is capable of reacting with the precursor or substrate to form a product capable of absorbing at a visible wavelength, and wherein the chemical does not react with the diol; (d) combining the polypeptides of step (a) with the precursor or substrate of step (b) under conditions in which the polypeptides can convert the precursor or substrate to a diol; (e) measuring the decrease in light absorption at a wavelength characteristic of the absorption of the product formed from the chemical and the precursor or substrate; and (f) screening the polypeptides for epoxide hydrolase activity, wherein a polypeptide is identified as having epoxide hydrolase activity capable of converting the precursor or substrate to a diol as detected by a decrease in absorbance at the characteristic wavelength due to diol formation, thereby selecting a polypeptide having epoxide hydrolase activity. In one aspect, the chemical capable of reacting with the precursor or substrate to form a product capable of absorbing at a visible wavelength is 4-(p-nitrobenzyl)-pyridine.
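The two direct activity assays differ in the direction of the signal change: the fluorogenic assay reports activity as an increase in fluorescence, whereas the colorimetric assay reports activity as a decrease in absorbance. A minimal sketch of hit identification for either readout is given below. The cutoff of three standard deviations from the mean of enzyme-free control wells, the function name `call_hits`, and the example readings are assumptions made for illustration and are not part of the methods described above.

```python
from statistics import mean, stdev


def call_hits(wells, negative_controls, direction, k=3.0):
    """Return the sorted names of wells whose signal departs from the controls.

    wells             : dict mapping well name -> measured signal
    negative_controls : signals from wells without enzyme (at least two)
    direction         : "increase" (fluorogenic readout) or
                        "decrease" (colorimetric readout)
    k                 : number of control standard deviations used as cutoff
    """
    if direction not in ("increase", "decrease"):
        raise ValueError("direction must be 'increase' or 'decrease'")
    baseline = mean(negative_controls)
    spread = stdev(negative_controls)
    if direction == "increase":
        cutoff = baseline + k * spread
        return sorted(w for w, s in wells.items() if s > cutoff)
    cutoff = baseline - k * spread
    return sorted(w for w, s in wells.items() if s < cutoff)


controls = [100, 102, 98, 101, 99]          # invented control readings
fluor_hits = call_hits({"A1": 103, "A2": 250, "A3": 105}, controls, "increase")
color_hits = call_hits({"B1": 97, "B2": 40, "B3": 95}, controls, "decrease")
```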
The invention provides in vitro growth selection methods using an epoxide as a precursor for the discovery of nucleic acids encoding epoxide hydrolases that yield diol products, comprising the steps of: (a) providing a library of nucleic acids; (b) providing a precursor, wherein the precursor can be converted to a diol; (c) providing a diol-free in vitro transcription/translation system; (d) adding a nucleic acid library member to the in vitro transcription/translation system; (e) adding the precursor of step (b); and (f) selecting a diol-producing sample and identifying the added nucleic acid of step (d), wherein selecting the diol-containing sample selects the nucleic acid encoding the corresponding epoxide hydrolase.
The invention provides in vitro growth selection methods using an epoxide as a precursor for the discovery of diol-producing epoxide hydrolases, comprising the steps of: (a) providing a polypeptide library; (b) providing a precursor, wherein the precursor can be converted to a diol; (c) providing a diol-free in vitro transcription/translation system; (d) adding a polypeptide library member to the in vitro transcription/translation system; (e) adding the precursor of step (b); and (f) selecting a diol-producing sample and identifying the added polypeptide of step (d), wherein selecting the diol-containing sample selects the corresponding epoxide hydrolase.
DESCRIPTION OF DRAWINGS
Figure 1 is a schematic representation of the selective hydrolysis of a racemic epoxide to give the corresponding diol and unreacted epoxide with high enantiomeric excess (ee) values.
Figure 2 is a schematic representation of (S)-glycidol (1) and (R)-glycidol (2), the leading chiral epoxides among representative C-3 synthons.
Figure 3 is a schematic representation of the preparation of saquinavir, an antiviral drug, and the synthesis of amprenavir, another antiviral drug.
Figure 4 is a schematic representation of the synthesis of two anticancer drugs, docetaxel and paclitaxel.
Figure 5 is a schematic representation of the hydrolysis of styrene oxide substrates by A. niger epoxide hydrolase, which hydrolyzes the R-enantiomers in all transformations.
Figure 6 is a compilation of schematics of exemplary reactions that can be used with the epoxide hydrolases of the invention.
Figure 7 is a schematic representation of an exemplary reaction in which the epoxide hydrolase of the invention is used for the desymmetrization of a meso-epoxide.
Figure 8 shows a block diagram of a computer system.
Figure 9 is a flow diagram illustrating one aspect of a process for comparing a new nucleotide or protein sequence to a sequence database to determine levels of homology between the new sequence and sequences in the database.
Figure 10 is a flow diagram illustrating one aspect of a computer process for determining whether two sequences are homologous.
Figure 11 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence.
Figure 12 is an illustration of the mechanism of A. radiobacter epoxide hydrolase.
Figure 13 is an illustration of the types of epoxide substrates.
Figure 14 is an illustration of the enantioconvergent hydrolysis of cis-2,3-epoxyheptane to 2R,3R-2,3-dihydroxyheptane catalyzed by Nocardia EH1.
Figure 15 is an illustration of glycidol and propylene oxide used as selection substrates.
Figure 16 is an illustration of a high-throughput screening method based on a periodate-coupled fluorogenic assay for epoxide hydrolase.
Figure 17 is an illustration of substrate synthesis for the periodate-coupled fluorogenic assay for epoxide hydrolase.
Figure 18 is an illustration of fluorescence activated cell sorting (FACS) for ultra-high-throughput single-cell activity and sequence screening.
Figure 19 is an illustration of environmental library biopanning for sequence-based discovery.
Similar reference symbols in different drawings indicate similar elements.
DETAILED DESCRIPTION
The invention provides polypeptides having epoxide hydrolase activity, polynucleotides encoding the polypeptides, and methods for making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used as epoxide hydrolases to catalyze the hydrolysis of epoxides and arene oxides to their respective diols. The epoxide hydrolases of the invention may be hydrolytic enzymes that catalyze the opening of the epoxide ring to convert the substrate to the corresponding diol. The epoxide hydrolases of the invention can be highly regio- and enantioselective, enabling the production of pure enantiomers. The polypeptides of the invention can be used to hydrolyze hazardous epoxy compounds produced by peroxidation in living organisms and to remove highly chemically reactive epoxy compounds.
The invention provides epoxide hydrolases (EH) from a wide range of biodiversity sources, such as enzyme or gene libraries. The invention provides methods for rapid selection or screening of enzymes and genes to obtain suitable EHs. The invention provides methods of accessing untapped biological diversity and rapidly searching for sequences and activities of interest using recombinant DNA technology. This invention combines the advantages of being able to rapidly search for natural compounds with the flexibility and reproducibility afforded when working with the genetic material of organisms.
The invention provides a method of synthesizing useful chiral epoxides using the enzymes of the invention. The invention provides useful chiral epoxides and derivatives thereof prepared using the EH of this invention.
The epoxide hydrolases of the invention are highly versatile biocatalysts for the asymmetric hydrolysis of epoxides at the preparative level. In addition to kinetic resolution, which furnishes the corresponding vicinal diol and the remaining unhydrolyzed epoxide in non-racemic form, the epoxide hydrolases of the invention are used in enantioconvergent processes to produce a single enantiomeric diol from a racemic oxirane. The epoxide hydrolases of the invention can be used in the hydrolysis of highly substituted epoxides, e.g., highly substituted 2,2- and 2,3-disubstituted epoxides. The epoxide hydrolases of the invention can be used in any method known in the art, see, for example, Orru (1999) Curr. Opin. Chem. Biol. 3:16-21.
Polypeptides of the invention can be used as epoxide hydrolases in Sharpless epoxidation reactions, Katsuki-Jacobsen reactions, Shi epoxidation and Jacobsen hydrolytic kinetic resolution reactions (see Figure 6).
The invention provides methods of using epoxide hydrolases according to the invention to obtain stereospecific reaction products. Polypeptides of the invention can be used to desymmetrize meso-epoxides. In one embodiment the conversion of the substrate to the R,R or S,S product was greater than 97% ee, and in one embodiment the conversion was 99%. Figure 7 is a schematic diagram of an exemplary reaction in which the epoxide hydrolase of the invention is used for meso-epoxide desymmetrization.
In one aspect, the invention provides epoxide hydrolases for the production of styrene glycol and corresponding methods. Epoxide hydrolases react with styrene oxide to produce styrene glycols.
The invention provides methods for the enzymatic separation of epoxide-enantiomer mixtures. The invention provides methods of protecting a cell against oxidants, e.g., in an immunotoxic reaction, comprising introducing around or into the cell an antioxidant agent containing an epoxide hydrolase. The invention provides epoxide hydrolase inhibitors (e.g., an antisense or ribozyme nucleic acid or an antibody of the invention) for ameliorating an immunological disorder, e.g., a T cell-mediated disorder, and corresponding methods for ameliorating the immunological disorder. The invention provides epoxide hydrolases for the treatment of peroxisomal disorders and corresponding methods for ameliorating peroxisomal disorders. The invention provides epoxide hydrolases for treating respiratory dysfunction, impairment or disease and corresponding methods for ameliorating respiratory dysfunction, impairment or disease. The invention provides reagents for forensic analysis, e.g., as chromosome markers or tissue- or organ-specific markers, containing the epoxide hydrolases of the invention. The invention provides epoxide hydrolases for the development of new agents for controlling pests, e.g., insects, and compositions containing epoxide hydrolase inhibitors (e.g., an antisense or ribozyme nucleic acid or an antibody of the invention) for use in pest control.
The invention provides epoxide hydrolases for the hydrolysis of leukotrienes and corresponding methods, e.g., their use as anti-inflammatory reagents. Accordingly, the invention provides pharmaceutical compositions containing one or more epoxide hydrolases of the invention that act as anti-inflammatory reagents by hydrolyzing leukotrienes and other inflammatory compounds. Alternatively, inflammation can be treated by inhibiting epoxide hydrolases using compositions containing epoxide hydrolase inhibitors (e.g., an antisense or ribozyme nucleic acid or an antibody of the invention) to inhibit inflammation mediated by polyunsaturated lipid metabolites. The invention provides epoxide hydrolases and methods for assessing the cytotoxicity of a compound by measuring the expression of epoxide hydrolases in a cell.
The polypeptides of the invention can be made or used as epoxide hydrolases in any known method, protocol or industrial application, as described, for example, in U.S. Patent Nos. 6,387,668; 6,379,938; 6,372,469; 5,635,369; U.S. Patent No. 6,174,695, which describes the use of epoxide hydrolase inhibitors to inhibit inflammation mediated by polyunsaturated lipid metabolites; U.S. Patent No. 5,759,765, which describes epoxide hydrolases and methods for assessing the cytotoxicity of a compound by measuring the expression of epoxide hydrolases in a cell; WO 01/46476, which describes the use of epoxide hydrolases to produce stereospecific reaction products; WO 01/07623, WO 00/68394 and WO 00/37619, which describe methods for the enzymatic separation of epoxide-enantiomer mixtures; WO 99/06059, which describes a method of protecting a cell from immunotoxicity by introducing into the cell an antioxidant agent containing an epoxide hydrolase; WO 00/23060, which describes the use of epoxide hydrolase inhibitors to ameliorate an immune disorder, e.g., a T cell-mediated disorder; WO 00/29846, which describes the use of epoxide hydrolases in the treatment of peroxisomal disorders; WO 99/64627, which describes the use of epoxide hydrolases for the treatment of dysfunction, damage or disease of the respiratory system; WO 01/42451, which describes the use of epoxide hydrolases in forensic reagents, e.g., as chromosome markers or tissue- or organ-specific markers; U.S. Patent Nos. 6,153,397, 6,143,542 and 6,037,160 and WO 99/32153, which describe the use of epoxide hydrolase inhibitors in pest control; JP 20217597, which describes the use of epoxide hydrolases for the production of styrene glycol by reaction with styrene oxides; and WO 00/50577, which describes the use of epoxide hydrolases to hydrolyze leukotrienes and act as anti-inflammatory reagents.
Definitions
The term "epoxide hydrolase" includes enzymes that catalyze the cofactor-independent hydrolysis of oxirane compounds, for example epoxides, to their corresponding diols by the addition of a water molecule. The term also includes epoxide hydrolases capable of hydrolyzing epoxide bonds at high temperature, low temperature, alkaline pH, and acidic pH. Epoxide hydrolase activity includes regioselective epoxide hydrolase activity, i.e., a preference for attack at one of the two possible carbon atoms of the substrate's oxirane ring. Epoxide hydrolase activity also includes enantioselective epoxide hydrolase activity, i.e., the enzyme's preference for substrates of a certain chirality. Epoxide hydrolase activity further includes epoxide hydrolase activity that is not stereoselective.
A "variant epoxide hydrolase" has an amino acid sequence that is derived from the amino acid sequence of a "precursor epoxide hydrolase". Precursor epoxide hydrolases include naturally occurring epoxide hydrolases and recombinant epoxide hydrolases. The amino acid sequence of the variant epoxide hydrolase is "derived" from the amino acid sequence of the precursor epoxide hydrolase by substitution, deletion, or insertion of one or more amino acids of the precursor amino acid sequence. Such modification is made to the "precursor DNA sequence" that encodes the amino acid sequence of the precursor epoxide hydrolase, rather than by manipulation of the precursor epoxide hydrolase enzyme per se. Suitable methods for such manipulation of the precursor DNA sequence include those disclosed herein as well as methods known to those skilled in the art.
The term "antibody" includes a peptide or polypeptide derived from, modeled on, or substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, capable of specifically binding an antigen or epitope, see, e.g., Fundamental Immunology, Third Edition, W. E. Paul, ed., Raven Press, N.Y. (1993); Wilson (1994) J. Immunol. Methods 175:267-273; Yarmush (1992) J. Biochem. Biophys. Methods 25:85-97. The term antibody includes antigen-binding portions, i.e., "antigen-binding sites" (e.g., fragments, subsequences, complementarity-determining regions (CDRs)) that retain the capacity to bind antigen, including (i) a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; (ii) a F(ab')2 fragment, a bivalent fragment containing two Fab fragments joined by a disulfide bridge at the hinge region; (iii) an Fd fragment consisting of the VH and CH1 domains; (iv) an Fv fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb fragment (Ward et al. (1989) Nature 341:544-546), which consists of a VH domain; and (vi) an isolated complementarity-determining region (CDR). Single chain antibodies are also included by reference in the term "antibody".
As used herein, the terms "array" or "microarray" or "biochip" or "chip" mean a plurality of target elements, each target element containing a defined amount of one or more polypeptides (including antibodies) or nucleic acids immobilized on a defined area of a substrate surface, as discussed in more detail below.
As used herein, the terms "computer", "computer program" and "processor" are used in their broadest general context and include all such devices as detailed below.
As used herein, the term "expression cassette" refers to a nucleotide sequence capable of effecting the expression of a structural gene (i.e., a sequence encoding a protein, such as an epoxide hydrolase polypeptide of the invention) in a host compatible with such sequence. Expression cassettes include at least a promoter operably linked to the polypeptide coding sequence and, optionally, to other sequences, e.g., transcription termination signals. Additional factors necessary or helpful in effecting expression can also be used, e.g., enhancers. As used herein, the term "operably linked" refers to the linkage of a promoter upstream of a DNA sequence such that the promoter mediates transcription of the DNA sequence. Thus, expression cassettes also include plasmids, expression vectors, recombinant viruses, any form of recombinant "naked DNA" vector, and the like. A "vector" includes a nucleic acid capable of infecting, transfecting, or transiently or permanently transducing a cell. The vector can be a naked nucleic acid or a nucleic acid complexed with a protein or lipid. The vector optionally contains viral or bacterial nucleic acids and/or proteins and/or membranes (e.g., a cell membrane, a viral lipid envelope, etc.). Vectors include, but are not limited to, replicons (e.g., RNA replicons, bacteriophages) to which DNA fragments can be attached and become replicated. Vectors therefore include, but are not limited to, RNA and autonomously self-replicating circular or linear DNA or RNA (e.g., plasmids, viruses, and the like, see, e.g., U.S. Patent No. 5,217,879), and include both expression and non-expression plasmids. Where a recombinant microorganism or cell culture is described as hosting an "expression vector," this includes both extrachromosomal circular and linear DNA as well as DNA that has been incorporated into the host chromosome(s).
When the vector is maintained by the host cell, the vector can be stably replicated in cells during mitosis as an autonomous structure or can be incorporated into the host genome.
"Plasmids" may be commercially available, freely available to the public, or may be constructed from available plasmids according to published procedures. Equivalent plasmids to those described herein are known in the art and will be apparent to one of ordinary skill in the art.
The term "gene" means a nucleic acid sequence that contains a segment of DNA involved in producing a transcription product. Genes may include, but are not limited to, regions preceding and following the coding region, such as leader and trailer sequences, promoters and enhancers, as well as, where applicable, intervening sequences (introns) between individual coding segments (exons).
As used herein, the term "nucleic acid" or "nucleic acid sequence" refers to an oligonucleotide, nucleotide, polynucleotide, or fragment of any of these, to DNA or RNA (e.g., mRNA, rRNA, tRNA) of genomic or synthetic origin, which can be single-stranded or double-stranded and can represent a sense or antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material, natural or synthetic in origin, including, e.g., iRNA and ribonucleoproteins (e.g., iRNPs). The term includes nucleic acids, i.e., oligonucleotides, that contain known analogues of natural nucleotides. The term also includes nucleic acid-like structures with synthetic backbones, see, e.g., Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense Nucleic Acid Drug Dev 6:153-156.
As used herein, the term "amino acid" or "amino acid sequence" refers to an oligopeptide, peptide, polypeptide, or protein sequence, or to a fragment, portion, or subunit of any of these, and to naturally occurring or synthetic molecules.
As used herein, the terms "polypeptide" and "protein" refer to amino acids linked together by peptide bonds or modified peptide bonds. The term "polypeptide" also includes peptides and polypeptide fragments, motifs, and the like. The term also includes glycosylated polypeptides. The peptides and polypeptides of the invention also include all "mimetic" and "peptidomimetic" forms, as described in more detail below.
The term "isolated" as used herein means that the material has been removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the materials with which it coexists in nature, is isolated. Such polynucleotides may be part of a vector, and/or such polynucleotides or polypeptides may be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment. An isolated material or composition as used herein may also be a "purified" composition; the term does not require absolute purity and is rather a relative definition. Individual nucleic acids obtained from a library can be conventionally purified to electrophoretic homogeneity. In alternative aspects, the invention provides nucleic acids that have been purified from genomic DNA or from other sequences in a library or other environment by at least one, two, three, four, five or more orders of magnitude.
The term "recombinant" as used herein means that the nucleic acid is adjacent to a "backbone" nucleic acid to which it is not adjacent in its natural environment. In one aspect, the nucleic acids represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid "backbone molecules". "Backbone molecules" of the invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate the nucleic acid insert of interest. In one aspect, the enriched nucleic acids represent 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. "Recombinant" polypeptides or proteins refer to polypeptides or proteins produced by recombinant DNA techniques, e.g., produced from cells transformed with an exogenous DNA construct encoding the desired polypeptide or protein. "Synthetic" polypeptides or proteins are those produced by chemical synthesis, as described in more detail below.
A promoter sequence is "operably linked" to a coding sequence when RNA polymerase that initiates transcription at the promoter will transcribe the coding sequence into mRNA, as discussed below.
"Oligonucleotide" refers to a single-stranded polydeoxynucleotide or two complementary polydeoxynucleotide chains that can be chemically synthesized. Such synthetic oligonucleotides lack a 5' phosphate and therefore will not bind to another oligonucleotide without the addition of a phosphate from ATP in the presence of a kinase. The synthetic oligonucleotide will bind to the fragment that is not dephosphorylated.
"Hybridization" refers to the process by which a nucleic acid strand joins a complementary strand through base pairing. Hybridization reactions can be sensitive and selective, allowing identification of a particular sequence of interest even in samples where it is present in low concentrations. Stringent conditions can be determined, for example, by the salt or formamide concentration in the prehybridization and hybridization solutions, or by the hybridization temperature, and are well known in the art. For example, stringency can be increased by decreasing the salt concentration, increasing the formamide concentration, or increasing the hybridization temperature by changing the hybridization time, as detailed below. In alternative aspects, the nucleic acids of the invention are defined by their ability to hybridize under different stringency conditions (eg, high, medium, and low) as set forth herein.
The term "variant" refers to polynucleotides or polypeptides of the invention modified at one or more base pairs, codons, introns, exons, or amino acid residues (respectively), yet still retaining the biological activity of an epoxide hydrolase of the invention. Variants can be produced by any number of means, including methods such as, for example, error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, GSSM, and any combination thereof. Included herein are techniques for producing a variant epoxide hydrolase that has activity at, for example, a pH or temperature that differs from that of the wild-type epoxide hydrolase.
The term "saturation mutagenesis" or "GSSM" includes a method that uses degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as described in detail below.
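As a minimal sketch of how a degenerate primer introduces point mutations at a single codon, the example below expands one degenerate codon into the concrete codons it represents and substitutes each of them at a chosen position. The choice of the NNK codon (N = any base, K = G or T), the function names, and the toy coding sequence are assumptions for illustration; this passage does not specify which degenerate codon is used.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC symbols needed for this sketch (N = any base, K = G or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand_degenerate(codon):
    """List every concrete codon represented by a degenerate codon such as NNK."""
    choices = [IUPAC[symbol] for symbol in codon.upper()]
    return ["".join(bases) for bases in product(*choices)]


def saturate_position(coding_seq, codon_index, degenerate="NNK"):
    """Return all variants of coding_seq with one codon replaced (0-based index)."""
    seq = "".join(coding_seq.upper().split())
    start = 3 * codon_index
    if start + 3 > len(seq):
        raise ValueError("codon index lies outside the coding sequence")
    return [
        seq[:start] + codon + seq[start + 3:]
        for codon in expand_degenerate(degenerate)
    ]


if __name__ == "__main__":
    codons = expand_degenerate("NNK")
    encoded = {CODON_TABLE[c] for c in codons}
    print(len(codons), "codons;", len(encoded - {"*"}), "amino acids")
    # Hypothetical three-codon template; the second codon is saturated.
    print(saturate_position("ATGTCAAAC", 1)[:4])
```

Under the assumed NNK scheme, the 32 resulting codons cover all 20 amino acids while including only one stop codon (TAG), which is why degenerate codons of this kind are a common choice for saturating a position.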
The term "optimized directed evolution system" or "optimized directed evolution" includes a method for reassembling fragments of related nucleic acid sequences, e.g., related genes, and is explained in detail below.
The term "synthetic ligation reassembly" or "SLR" includes the method of ligation of oligonucleotide fragments in a non-stochastic manner and is explained in detail below.
Generation and manipulation of nucleic acids
The invention provides nucleic acids, including expression cassettes such as expression vectors, that encode the polypeptides of the invention. The invention also includes methods for discovering new epoxide hydrolase sequences using the nucleic acids of the invention. Also provided are methods of modifying the nucleic acids of the invention, for example, by synthetic ligation reassembly, an optimized directed evolution system, and/or saturation mutagenesis.
Nucleic acids of the invention can be produced, isolated and/or manipulated, for example, by cloning and expression of cDNA libraries, amplification of messages or genomic DNA by PCR, and the like. In practicing the methods of the invention, homologous genes can be modified by manipulating the template nucleic acid as described herein. The invention may be practiced in combination with any method or protocol or device known in the art and well described in the scientific and patent literature.
General techniques
Nucleic acids used in the practice of this invention, whether RNA, iRNA, antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, can be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant polypeptides produced from these nucleic acids can be individually isolated or cloned and tested for a desired activity. Any recombinant expression system can be used, including bacterial, mammalian, yeast, insect or plant cell expression systems.
Alternatively, these nucleic acids can be synthesized in vitro by well-known chemical synthesis techniques, as described, for example, in Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Patent No. 4,458,066.
Techniques for the manipulation of nucleic acids, such as subcloning, labeling of probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed., John Wiley & Sons, Inc., New York (1997); LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed., Elsevier, New York (1993).
Another useful means of obtaining and manipulating nucleic acids used to practice the methods of the invention is to clone from genomic samples and, if desired, to screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones. Sources of nucleic acid used in the methods of the invention include genomic or cDNA libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Patent Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. Genet. 15:333-335; yeast artificial chromosomes (YACs); bacterial artificial chromosomes (BACs); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant viruses, phages or plasmids.
In one aspect, a nucleic acid encoding a polypeptide of the invention is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof.
The invention provides fusion proteins and nucleic acids encoding them. A polypeptide of the invention can be fused to a heterologous peptide or polypeptide, such as an N-terminal identification peptide, that confers desirable characteristics, such as increased stability or simplified purification. Peptides and polypeptides of the invention can also be synthesized and expressed as fusion proteins with one or more additional domains fused thereto, e.g., for identifying and isolating antibody-expressing B cells, and the like. Domains that facilitate detection and purification include, for example, metal-chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain used in the FLAGS extension/affinity purification system (Immunex Corp., Seattle, Washington). The inclusion of cleavable linker sequences, such as Factor Xa or enterokinase (Invitrogen, San Diego CA), between the purification domain and the motif-containing peptide or polypeptide facilitates purification. For example, an expression vector may contain a nucleic acid sequence encoding an epitope fused to six histidine residues followed by thioredoxin and an enterokinase cleavage site (see, e.g., Williams (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues facilitate detection and purification, while the enterokinase cleavage site provides a means of purifying the epitope away from the remainder of the fusion protein. Technology pertaining to vectors encoding fusion proteins and the application of fusion proteins is well described in the scientific and patent literature, see, e.g., Kroll (1993) DNA Cell. Biol. 12:441-53.
Control sequences of transcription and translation
The invention provides nucleic acid (e.g., DNA) sequences of the invention operably linked to one or more expression (e.g., transcriptional or translational) control sequences, e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression. The expression control sequence can be contained in an expression vector. Examples of bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Examples of eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, retroviral LTRs, and mouse metallothionein I.
Promoters suitable for expressing a polypeptide in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoters, retroviral LTRs, and the mouse metallothionein-I promoter. Other promoters known to control the expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.
Expression vectors and cloning vehicles
The invention provides expression vectors and cloning vehicles containing the nucleic acids of the invention, e.g., sequences encoding the proteins of the invention. Expression vectors and cloning vehicles of the invention may include viral particles, baculovirus, phages, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA, artificial chromosome-based vectors, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for particular hosts of interest (such as Bacillus, Aspergillus and yeasts). Vectors of the invention may include chromosomal, non-chromosomal and synthetic DNA sequences. A large number of suitable vectors are known to those skilled in the art and are commercially available. Examples of vectors include: bacterial: pQE vectors (Qiagen), pBluescript plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). However, any other plasmid or vector can be used as long as it is replicable and viable in the host. Low or high copy number vectors can be used in this invention.
An expression vector may contain a promoter, a ribosome binding site for translation initiation, and a transcription terminator. The vector may also contain appropriate sequences for amplifying expression. Mammalian expression vectors may contain an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcription termination sequences, and 5' flanking non-transcribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites can be used to provide the required non-transcribed genetic elements.
In one aspect, the expression vectors contain one or more selectable marker genes that allow selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring resistance to neomycin in eukaryotic cell culture, genes conferring resistance to tetracycline or ampicillin in E. coli, and the S. cerevisiae TRP1 gene. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers.
Vectors for expressing the polypeptide or a fragment thereof in eukaryotic cells may also contain enhancers to increase expression levels. Enhancers are cis-acting DNA elements, typically about 10 to about 300 bp in length, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin, bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
The DNA sequence can be inserted into the vector by a variety of procedures. Generally, the DNA sequence is ligated into the desired position in the vector after the insert and the vector have been digested with appropriate restriction endonucleases. Alternatively, blunt ends on both the insert and the vector can be ligated. Many cloning techniques are known in the art, e.g., as described in Ausubel and Sambrook. These and other procedures are considered to be within the skill of those in the art.
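When selecting restriction endonucleases for the digestion step, it is usually helpful to check that the chosen enzymes do not also cut inside the insert. The following is a minimal sketch of such a check, assuming three common enzymes and their well-known recognition sequences; the enzyme selection, the function name find_sites, and the short test fragment are illustrative assumptions and are not prescribed by this specification.

```python
# Recognition sequences of three commonly used restriction endonucleases.
# The choice of enzymes is an assumption made for this illustration.
RECOGNITION_SITES = {
    "EcoRI": "gaattc",
    "BamHI": "ggatcc",
    "HindIII": "aagctt",
}


def find_sites(sequence, site):
    """Return 1-based start positions of every occurrence of site in sequence."""
    seq = "".join(sequence.lower().split())  # drop the spaces used in listings
    site = site.lower()
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index + 1)
        index = seq.find(site, index + 1)
    return positions


if __name__ == "__main__":
    # Short illustrative fragment written in the 10-base blocks of a listing.
    fragment = "gactgaattc gtatccgcaa"
    for enzyme, site in RECOGNITION_SITES.items():
        hits = find_sites(fragment, site)
        status = "cuts at %s" % hits if hits else "does not cut"
        print(enzyme, status)
```

An enzyme that reports no internal positions for the insert is a candidate for flanking cloning sites; an enzyme that does cut internally would fragment the insert during digestion.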
The vector can be in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, non-chromosomal and synthetic DNA sequences; derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA; and viral DNA such as vaccinia, adenovirus, fowl pox virus and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook.
Specific bacterial vectors that can be used include commercially available plasmids containing the genetic elements of the well-known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wisconsin, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene), PTRC9, PKK23-PKDD - 8 and pCM7. Specific eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG and pSVL (Pharmacia). However, any other vector can be used as long as it is replicable and viable in the host cell.
Host cells and transformed cells
The invention also provides a transformed cell comprising a nucleic acid sequence of the invention, e.g., a sequence encoding a polypeptide of the invention, or a vector of the invention. The host cell can be any host cell known to those skilled in the art, including prokaryotic cells and eukaryotic cells, such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells, or plant cells. Examples of bacterial cells include E. coli, Streptomyces, Bacillus subtilis, Salmonella typhimurium, and various species of the genera Pseudomonas, Streptomyces, and Staphylococcus. Examples of insect cells include Drosophila S2 and Spodoptera Sf9. Examples of animal cells include CHO, COS or Bowes melanoma, or any mouse or human cell line. The selection of an appropriate host is within the abilities of those skilled in the art.
The vector can be introduced into host cells using any of a number of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Specific methods include calcium phosphate transfection, DEAE-dextran mediated transfection, lipofection or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).
Where appropriate, the engineered host cells may be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the genes of the invention. After transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter can be induced by suitable means so that the cells produce the desired polypeptide or fragment thereof.
Cells can be harvested by centrifugation and disrupted by physical or chemical means, and the resulting crude extract retained for further purification. Microbial cells used for protein expression can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or the use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography, and lectin chromatography. Protein refolding steps can be used, as necessary, to complete the configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be used for the final purification steps.
A variety of mammalian cell culture systems can also be used to express the recombinant protein. Examples of mammalian expression systems include the COS-7 monkey kidney fibroblast line and other cell lines capable of expressing the protein from a compatible vector, such as C127, 3T3, CHO, HeLa and BHK cell lines.
The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending on the host used in the recombinant production procedure, polypeptides produced by host cells containing the vector may be glycosylated or non-glycosylated. The polypeptides of the invention may or may not also include an initial methionine amino acid residue.
Cell-free translation systems can also be used to produce polypeptides of the invention. Cell-free translation systems can use mRNA transcribed from a DNA construct containing a promoter operably linked to a nucleic acid encoding a polypeptide or fragment thereof. In some aspects, the DNA construct can be linearized prior to in vitro transcription reactions. The transcribed mRNA is then incubated with a suitable cell-free extract for translation, such as rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.
Expression vectors may contain one or more selectable marker genes that provide a phenotypic trait for the selection of transformed host cells, such as dihydrofolate reductase or neomycin resistance in eukaryotic cell culture, or tetracycline or ampicillin resistance in E. coli.
Nucleic acid amplification
In the practice of the invention, nucleic acids encoding polypeptides of the invention, or modified nucleic acids, can be reproduced, for example, by amplification. The invention provides amplification primer sequence pairs for amplifying nucleic acids encoding epoxide hydrolase polypeptides, wherein the primer pairs are capable of amplifying a nucleic acid sequence including the exemplary SEQ ID NO:1 or a subsequence thereof; the sequence set forth in SEQ ID NO:3 or a subsequence thereof; the sequence set forth in SEQ ID NO:5 or a subsequence thereof; the sequence set forth in SEQ ID NO:7 or a subsequence thereof; and the sequence set forth in SEQ ID NO:9 or a subsequence thereof. One skilled in the art can design amplification primer sequence pairs for any part of, or the full length of, these sequences; for example:
An example of SEQ ID NO:1 is
atgtcaaaca acgctcccca atcctcgtcg cgccgccatt tcgtcggcgt ggccgctgcg
60
gcgctcgcga caggctcgct gagccggctc gcctttgcca acgcattccc gactgtcggc
120
acgatcacgg aacccgccaa tggcgacaag cagcgctgc gcccgttccg cgttcacatt
180
cctgaagcgc agctcgtcga catgcggcgg cgcatcaagg cgacgcgctg gccggaccgc
240
gaaaccgtgc ccgacgaatc gcagggtatt cagctcgcca ccatccagggg actcgcccaa
300
taktgggcga ccggatacga ctggcgtaaa tgcgaggcgc gactgaattc gtatccgcaa
360
ttcatcacgg agatcgacgg actcgatatc catttcatcc atgtgcgctc gaagcacgcc
420
gacgccatgc cgttgatcgt cacgcatgga tggcccgggt cggtcatcga acagttcaag
480
atcatcgatc cgctcgtcaa tccgaccgcg tacggcgcgc cggcatcgga tgccttccat
540
ctcgtgattc cctctttgcc cggttacggc ttttcggcca gaccgaccac gacgggatgg
600
ggacggagc gcaccgcacg cgcgtgggtc accttgatga aacgcctcgg ctatgagcgt
660
tttgcttcgc agggcggcga tctcggcggg atcgtcacga acatcatggc caaacaggcg
720
ccgcccgaac tgatcggcat tcatgtgaac ttccctgcct ccgttccagc ggagagattctg
780
aagtcgctgg ctgccggtga atcgatgccc gccggattat cggacgagga aaagcacgcg
840
tatgagcagt tgagtgccaa cttcaagaag aagcgcggct acgcattcga aatgggcacg
900
cgcccgcaga cgctttacgg actcgccgac tcacccatcg cgctggcttc ctggctactc
960
gaccacggcg acggctacgg ccagcccgcg gctgcgctga gcgcggccgt ccttggtcac
1020
cccgtcaacg gtcactcagc aggcgcgctg acgcgagacg acatactcga cgacatcacg
1080
ctttactggc tgaccaacac cggtatctcg gcagcgcgtt tctactggga gtcgcatgcg
1140
aacttctttc tcgcagccga cgtcaatgtg cctgctgccg tgagcgcatt tcccggagaa
1200
aattaccagg cgccgaagag ctggacggaa aaggcctatc acaagctgat tacttcaac
1260
aagcccgaaa cgggcggcca cttcgcggca tgggaagagc cgatgatctt cgcgaatgaa
1320
gtgcgctcgg ggttaaggcc cttgcgcgcg tga
Accordingly, an exemplary pair of primer sequences for amplification are residues 1 to 21 of SEQ ID NO:1 and the complementary strand of the last 21 residues of SEQ ID NO:1.
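The primer-pair rule stated above (the first 21 residues of the sequence, plus the complementary strand of the last 21 residues) can be sketched as follows. The function names are hypothetical, the reverse primer is written 5' to 3' as the reverse complement of the 3' end, and the demonstration uses only the first and last lines of the SEQ ID NO:1 listing, with the interior omitted for brevity, since terminal primers depend only on the two ends.

```python
from typing import Tuple

# Complement mapping for unambiguous DNA bases (lower and upper case).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_primer_pair(template: str, length: int = 21) -> Tuple[str, str]:
    """Derive a forward and a reverse primer from the two ends of a template.

    forward primer: the first `length` residues of the template
    reverse primer: reverse complement of the last `length` residues
    """
    seq = "".join(template.split())  # drop the spaces used in sequence listings
    if len(seq) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    return seq[:length], reverse_complement(seq[-length:])


if __name__ == "__main__":
    # First and last lines of the SEQ ID NO:1 listing; interior omitted.
    five_prime_end = (
        "atgtcaaaca acgctcccca atcctcgtcg cgccgccatt tcgtcggcgt ggccgctgcg"
    )
    three_prime_end = "gtgcgctcgg ggttaaggcc cttgcgcgcg tga"
    forward, reverse = terminal_primer_pair(five_prime_end + three_prime_end)
    print("forward primer:", forward)
    print("reverse primer:", reverse)
```

In practice, primer length is usually adjusted so that both primers have similar melting temperatures; the fixed length of 21 here simply mirrors the example in the text.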
Example SEQ ID NO:1 encodes a polypeptide having this sequence
Met Ser Asn Asn Ala Pro Gln Ser Ser Ser Arg Arg His Phe Val Gly
(SEQ ID NO: 2)
Val Ala Ala Ala Ala Leu Ala Thr Gly Ser Leu Ser Arg Leu Ala Phe
Ala Asn Ala Phe Pro Thr Val Gly Thr Ile Thr Glu Pro Ala Asn Gly
Asp Lys Ala Ala Leu Arg Pro Phe Arg Val His Ile Pro Glu Ala Gln
Leu Val Asp Met Arg Arg Arg Ile Lys Ala Thr Arg Trp Pro Asp Arg
Glu Thr Val Pro Asp Glu Ser Gln Gly Ile Gln Leu Ala Thr Ile Gln
Gly Leu Ala Gln Tyr Trp Ala Thr Gly Tyr Asp Trp Arg Lys Cys Glu
Ala Arg Leu Asn Ser Tyr Pro Gln Phe Ile Thr Glu Ile Asp Gly Leu
Asp Ile His Phe Ile His Val Arg Ser Lys His Ala Asp Ala Met Pro
Leu Ile Val Thr His Gly Trp Pro Gly Ser Val Ile Glu Gln Phe Lys
Ile Ile Asp Pro Leu Val Asn Pro Thr Ala Tyr Gly Ala Pro Ala Ser
Asp Ala Phe His Leu Val Ile Pro Ser Leu Pro Gly Tyr Gly Phe Ser
Ala Arg Pro Thr Thr Gly Trp Gly Pro Glu Arg Thr Ala Arg Ala
Trp Val Thr Leu Met Lys Arg Leu Gly Tyr Glu Arg Phe Ala Ser Gln
Gly Gly Asp Leu Gly Gly Ile Val Thr Asn Ile Met Ala Lys Gln Ala
Pro Pro Glu Leu Ile Gly Ile His Val
Ala Gly Ile Leu Ser Leu
Leu Ser Asp Glu Glu Lys His Ala Tyr Glu Gln Leu Ser Ala Asn Phe
Lys Lys Lys Arg Gly Tyr Ala Phe Glu Met Gly Thr Arg Pro Gln Thr
Leu Tyr Gly Leu Ala Asp Ser Pro Ile Ala Leu Ala Ser Trp Leu Leu
Asp His Gly Asp Gly Tyr Gly Gln Pro Ala Ala Ala Leu Ser Ala Ala
Val Leu Gly His Pro Val Asn Gly His Ser Ala Gly Ala Leu Thr Arg
Asp Asp Ile Leu Asp Asp Ile Thr Leu Tyr Trp Leu Thr Asn Thr Gly
Ile Ser Ala Ala Arg Phe Tyr Trp Glu Ser His Ala Asn Phe Phe Leu
Ala Ala Asp Val Asn Val Pro Ala Ala Val Ser Ala Phe Pro Gly Glu
Asn Tyr Gln Ala Pro Lys Ser Trp Thr Glu Lys Ala Tyr His Lys Leu
Ile Tyr Phe Asn Lys Pro Glu Thr Gly Gly His Phe Ala Ala Trp Glu
Glu Pro Met Ile Phe Ala Asn Glu Val Arg Ser Gly Leu Arg Pro Leu
Arg Ala
The exemplary SEQ ID NO:3 is
atgcgggtgc agctgtccga ggtgaacctc gacgtcgagg tgagcgggga ggggccggcc
60
gtgctgctcg tgcacggctt ccccgacagc catcgtctgt ggcgtcatca ggtcgcggcg
120
ctgaacgacg ccggtttcac cacggtcgcg cccaccctgc ggggcttcgg cgcctcggac
180
cgccccgagg gcggccccgc ggcgtaccac ccgggcaggc acgtcgccga cctggtcgag
240
ctcctggcgc acctcgacct cgaccgggtc catctggtgg gccacgactg gggttcgggc
300
atcgcgcagg ccctgaccca gttctacccg gaccgggtgc ggagcctgag catcctgtcc
360
gtcggccatc tggcgtcgat ccggtcggcg ggctgggagc agaagcagcg gtcctggtac
420
atgcttctgt tccagctggc cggggtggcc gaggactggc tggcgcggga cgacttcgcg
480
aacatgcggg agatgctggg cgagcacccg gacgccgagt ccgcgatcga ggcgctgcgc
540
gcgcccggag cgctgacggc cgcgctggac atctaccgcg cgggcctgcc gcctgaggtg
600
ctgttcggcg cggacgcgcc ggcggtgccg ctgccggagt cggtcccggt gctgggcctg
660
tggtcgaccg gcgaccgttt cctcaccgag cgctcgatgg cggggacggc cgagtacgtc
720
gccgggccgt ggcgctacga gcgcgtcgag gacgcgggcc actggctgca gctcgaccag
780
ccggagaggg tcaacgaact gctgctctcc ttcctcaagg agaacggcta g
831
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:3 and the complementary strand of the last 21 residues of SEQ ID NO:3.
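As an illustration only, a primer pair of the kind just described can be derived from the two ends of a listed sequence. The following Python sketch is not part of the disclosed methods; the helper names `reverse_complement` and `primer_pair` are hypothetical, and only the first and last rows of SEQ ID NO:3 as listed are used, since the primers depend only on the termini.

```python
# Hypothetical helpers for illustration; not part of the disclosed methods.
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(seq: str, n: int = 21) -> tuple:
    """Forward primer: first n residues.
    Reverse primer: complementary strand of the last n residues."""
    seq = "".join(seq.split()).lower()  # drop the block spacing of the listing
    if len(seq) < n:
        raise ValueError("sequence is shorter than the primer length")
    return seq[:n], reverse_complement(seq[-n:])


# Terminal rows of SEQ ID NO:3 as listed (residues 1-60 and 781-831).
first_row = "atgcgggtgc agctgtccga ggtgaacctc gacgtcgagg tgagcgggga ggggccggcc"
last_row = "ccggagaggg tcaacgaact gctgctctcc ttcctcaagg agaacggcta g"

forward, reverse = primer_pair(first_row + last_row)
print("forward primer:", forward)
print("reverse primer:", reverse)
```

The same call can be applied to any of the other exemplary nucleotide sequences; the default length of 21 simply mirrors the residue ranges named in the text.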
The exemplary SEQ ID NO:3 encodes a polypeptide having the sequence
(SEQ ID NO: 4)
Met Arg Val Gln Leu Ser Glu Val Asn Leu Asp Val Glu Val Ser Gly
Glu Gly Pro Ala Val Leu Leu Val His Gly Phe Pro Asp Ser His Arg
Leu Trp Arg His Gln Val Ala Ala Leu Asn Asp Ala Gly Phe Thr Thr
Val Ala Pro Thr Leu Arg Gly Phe Gly Ala Ser Asp Arg Pro Glu Gly
Gly Pro Ala Ala Tyr His Pro Gly Arg His Val Ala Asp Leu Val Glu
Leu Leu Ala His Leu Asp Leu Asp Arg Val His Leu Val Gly His Asp
Trp Gly Ser Gly Ile Ala Gln Ala Leu Thr Gln Phe Tyr Pro Asp Arg
Val Arg Ser Leu Ser Ile Leu Ser Val Gly His Leu Ala Ser Ile Arg
Ser Ala Gly Trp Glu Gln Lys Gln Arg Ser Trp Tyr Met Leu Leu Phe
Gln Leu Ala Gly Val Ala Glu Asp Trp Leu Ala Arg Asp Asp Phe Ala
Asn Met Arg Glu Met Leu Gly Glu His Pro Asp Ala Glu Ser Ala Ile
Glu Ala Leu Arg Ala Pro Gly Ala Leu Thr Ala Ala Leu Asp Ile Tyr
Arg Ala Gly Leu Pro Pro Glu Val Leu Phe Gly Ala Asp Ala Pro Ala
Val Pro Leu Pro Glu Ser Val Pro Val Leu Gly Leu Trp Ser Thr Gly
Asp Arg Phe Leu Thr Glu Arg Ser Met Ala Gly Thr Ala Glu Tyr Val
Ala Gly Pro Trp Arg Tyr Glu Arg Val Glu Asp Ala Gly His Trp Leu
Gln Leu Asp Gln Pro Glu Arg Val Asn Glu Leu Leu Leu Ser Phe Leu
Lys Glu Asn Gly
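The correspondence between SEQ ID NO:3 and SEQ ID NO:4 can be spot-checked by translating the nucleotide listing codon by codon. The Python sketch below is illustrative only and assumes the standard genetic code; `translate` is a hypothetical helper name, and only the first row of SEQ ID NO:3 as listed is used as input.

```python
# Illustrative sketch; assumes the standard genetic code applies.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(seq: str) -> list:
    """Translate a nucleotide string into three-letter residues,
    stopping at the first stop codon."""
    seq = "".join(seq.split()).lower()  # drop the block spacing of the listing
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        amino = CODON_TABLE[seq[pos:pos + 3]]
        if amino == "*":
            break
        residues.append(THREE_LETTER[amino])
    return residues


# First row of SEQ ID NO:3 as listed (residues 1-60).
first_row = "atgcgggtgc agctgtccga ggtgaacctc gacgtcgagg tgagcgggga ggggccggcc"
print(" ".join(translate(first_row)))
```

The printed residues can be compared with the first twenty residues of SEQ ID NO:4. A listed length of 831 nucleotides corresponds to 277 codons, that is, 276 amino acids followed by one stop codon.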
The exemplary SEQ ID NO:5 is
atgaggccaa cctccacacc cgagggcccc ggctccgtct ccggggcacc caacctcccg
60
gaggggttcg ccgacacctt caccagcagg tacgtcgacg ccggtgagct gcgtctccat
120
gcagttaccg gcggcgaagg cccgcccctg ctcctcgtcc acgggtggcc cgagacctgg
180
tacgcctggc ggatggtgat gccggcgttg gccgagcact tcgaggtgat cgcggtcgac
240
cagcgcgggg tcgggctgtc cgacaagccc gagacggat acgacaggac aagcctcgcc
300
aacgacctcg tcggactgat ggacgcgctc ggccatgagc ggttcgcact gtatggaacc
360
gacactggaa tgccgatcgc ctatgcactg gctgcggacc agccggaccg aatcgaccgt
420
ttgatcgtct cggaggcccc gcttcccggc gtgactccct caccaccttt gctcctcccg
480
ccccaactca ctgccaagtt ctggcacctg atgttcaacc agctccccgc cgaggtgaac
540
gaggcgctcg tcagggggcg ggagacatc ttcttcgggg cggagttcga cgcctctgcc
600
gggacgaaga agctgccagc cgacatcgtg aggtactaca tcgatacggt cgcgaccgac
660
cccgaccatc tgcgcgggag cttcgggttc taccgggcga tcccgaccac gatcgcgcag
720
aacgagcagc ggaagacacg gcgtctgccc atgcccgttc tcgcgatcgg cggggaggag
780
agcggtggag aagggccggg gaacgcgatg aagctcgtcg cagacgacgt gcagaccctg
840
gtcctcgcgg gcagcggcca ctgggtcgcc gagcaggcgc ctcacgcgct gctggcggcg
900
ctgagcgagt tcctggctcc ctacctcgag gaagcgactg cacaggtagg agcggcccgc
960
tga
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:5 and the complementary strand of the last 21 residues of SEQ ID NO:5.
The exemplary SEQ ID NO:5 encodes a polypeptide having the sequence
(SEQ ID NO: 6)
Met Arg Pro Thr Ser Thr Pro Glu Gly Pro Gly Ser Val Ser Gly Ala Pro Asn Leu Pro Glu
Gly Phe Ala Asp Thr Phe Thr Ser Arg Tyr Val Asp Ala Gly Glu Leu Arg Leu His Ala
Val Thr Gly Gly Glu Gly Pro Leu Leu Leu Val His Gly Trp Pro Glu Thr Trp Tyr Ala
Trp Arg Met Val Met Pro Ala Leu Ala Glu His Phe Glu Val Ile Ala Val Asp
Gln Arg Gly Val Gly Leu Ser Asp Lys Pro Glu Asp Gly Tyr Asp Ser Thr Ser Leu Ala
Asn Asp Leu Val Gly Leu Met Asp Ala Leu Gly His Glu Arg Phe Ala Leu Tyr Gly Thr
Asp Thr Gly Met Pro Ile Ala Tyr Ala Leu Ala Ala Asp Gln Pro Asp Arg Ile Asp Arg Leu
Ile Val Ser Glu Ala Pro Leu Pro Gly Val Thr Pro Ser Pro Pro Leu Leu Pro Pro Gln
Leu Thr Ala Lys Phe Trp His Leu Met Phe Asn Gln Leu Pro Ala Glu Val Asn Glu Ala
Leu Val Arg Gly Arg Glu Asp Ile Phe Phe Gly Ala Glu Phe Asp Ala Ser Ala Gly Thr Lys
Lys Leu Pro Ala Asp Ile Val Arg Tyr Tyr Ile Asp Thr Val Ala Thr Asp Pro Asp His Leu
Arg Gly Ser Phe Gly Phe Tyr Arg Ala Ile Pro Thr Thr Ile Ala Gln Asn Glu Gln Arg Lys
Thr Arg Arg Leu Pro Met Pro Val Leu Ala Ile Gly Gly Glu Glu Ser Gly Gly Glu Gly Pro
Gly Asn Ala Met Lys Leu Val Ala Asp Asp Val Gln Thr Leu Val Leu Ala Gly Ser Gly
His Trp Val Ala Glu Gln Ala Pro His Ala Leu Leu Ala Leu Ser Glu Phe Leu Ala Pro
Tyr Leu Glu Glu Ala Thr Ala Gln Val Gly Ala Ala Arg
The exemplary SEQ ID NO:7 is
atgtcgcccc gttcgattcc tgctctggct ctactgctct gttcgactgt ctccgctttg
60
gccgccgatt tcgaatcgcg cgtgaagcat ggctacgccg actccaacgg cgtgaagatt
120
cactacgcca cgatcggcag cgggccgctg atcgtgatga tccacggctt ccccgacttc
180
tggtacacgt ggcgcaagca gatggagggt ttgtcggaca agtaccaatg cgtggccatc
240
gaccagcgcg gctataacct cagcgacaag ccgcagggcg tcgagaacta cgacatgagc
300
ctgctggtgg gcgacgtcat cgccgtgatc aagcacctgg gcaaagacaa ggccatcatc
360
gtcggtcacg actggggcgg ggcggtcgca tggcagctgg ctctgaacgc gccccagtat
420
gtcgaccgcc taatcattct taacctccca tacccgcgcg gcatcatgcg cgagctggct
480
cacaacccca agcaacaagc cgccagcgcc tacgcccgca attttcagac tgagggcgcg
540
gaagccatga tcaagccgga gcaactggcc ttctgggtca ccgatgccga ggccaagccg
600
aaatacgtgg aggcctttca gcgctcggac atcaaggcca tgctgaacta ctacaagcgc
660
aactacccgc gagagccgta tcaggaaaac acctcgccgg tggtgaagac gcagatgccc
720
gtgctcatgt tccacggtct caaagacacc gcgctgctct ccgacgcgct caacaacacc
780
tgggactgga tgggcaaaga cctcaccctg gtgaccatcc ctgattccgg ccacttcgtg
840
cagcaagatg cagccgacct ggtgacgcgg atgatgcggg cgtggctgga acgttga
897
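The nucleotide listings follow a fixed layout: each row holds blocks of ten residues, drawn from the letters a, c, g and t, followed by a cumulative residue count. As a hedged illustration, a row can be checked against that layout mechanically. The Python sketch below is not part of the disclosure; `check_row` is a hypothetical helper, and the sample rows are made-up inputs rather than listed sequences.

```python
# Hypothetical layout check for illustration; sample rows are not from the listing.
VALID = set("acgt")


def check_row(row: str, is_last_row: bool = False) -> list:
    """Return a list of layout problems found in one row of a listing."""
    problems = []
    blocks = row.split()
    for i, block in enumerate(blocks, start=1):
        bad = sorted(set(block) - VALID)
        if bad:
            problems.append(f"block {i}: unexpected characters {''.join(bad)}")
        # Only the final block of the final row may be shorter than ten residues.
        final_block = is_last_row and i == len(blocks)
        if len(block) != 10 and not final_block:
            problems.append(f"block {i}: length {len(block)}, expected 10")
    return problems


print(check_row("acgtacgtac acgtacgtac"))    # a well-formed row
print(check_row("acgtacgtaca acgtxcgtac"))   # one long block, one stray letter
```

A row that returns an empty list is consistent with the layout; any reported block is a candidate transcription error to be checked against the encoded polypeptide.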
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:7 and the complementary strand of the last 21 residues of SEQ ID NO:7.
The exemplary SEQ ID NO:7 encodes a polypeptide having the sequence
(SEQ ID NO: 8)
Met Ser Pro Arg Ser Ile Pro Ala Leu Ala Leu Leu Leu Cys Ser Thr Val Ser Ala Leu Ala
Ala Asp Phe Glu Ser Arg Val Lys His Gly Tyr Ala Asp Ser Asn Gly Val Lys Ile His Tyr
Ala Thr Ile Gly Ser Gly Pro Leu Ile Val Met Ile His Gly Phe Pro Asp Phe Trp Tyr Thr
Trp Arg Lys Gln Met Glu Gly Leu Ser Asp Lys Tyr Gln Cys Val Ala Ile Asp Gln Arg
Gly Tyr Asn Leu Ser Asp Lys Pro Gln Gly Val Glu Asn Tyr Asp Met Ser Leu Leu Val
Gly Asp Val Ile Ala Val Ile Lys His Leu Gly Lys Asp Lys Ala Ile Val Gly His Asp
Trp Gly Gly Ala Val Ala Trp Gln Leu Ala Leu Asn Ala Pro Gln Tyr Val Asp Arg Leu Ile
Ile Leu Asn Leu Pro Tyr Pro Arg Gly Ile Met Arg Glu Leu Ala His Asn Pro Lys Gln Gln
Ala Ala Ser Ala Tyr Ala Arg Asn Phe Gln Thr Glu Gly Ala Glu Ala Met Ile Lys Pro Glu
Gln Leu Ala Phe Trp Val Thr Asp Ala Glu Ala Lys Pro Lys Tyr Val Glu Ala Phe Gln
Arg Ser Asp Ile Lys Ala Met Leu Asn Tyr Tyr Lys Arg Asn Tyr Pro Arg Glu Pro Tyr
Gln Glu Asn Thr Ser Pro Val Val Lys Thr Gln Met Pro Val Leu Met Phe His Gly Leu
Lys Asp Thr Ala Leu Leu Ser Asp Ala Leu Asn Asn Thr Trp Asp Trp Met Gly Lys Asp
Leu Thr Leu Val Thr Ile Pro Asp Ser Gly His Phe Val Gln Gln Asp Ala Ala Asp Leu Val
Thr Arg Met Met Arg Ala Trp Leu Glu Arg
The exemplary SEQ ID NO:9 is
atgagtgtcg ttacagaaca cactgacaag accgctattc gtccgttcaa gatcaatgtg
60
ccggaggcgg acctgaagga tttgcacagg cgcatccagg cgaccaagtt tcccgaacgc
120
gagacggttc cggatgccac gcagggcgtg cagcttgcca cggttcaggc cctcgcgcag
180
tattgggcga aagactacaa ctggcacaag tgtgagtcga ggctgaatgc actgccgcag
240
ttcatgaccg agattgaggg gctcgacatt catttcattc acgttcgttc gaagcatccg
300
aacgcgctgc cggtcatcgt gacgcacggc tggccaggat cgatcgtcga gcagttgaag
360
atcatcgatc cgctgacgaa tccgacggcg catggtggaa gcgcatcgga cgccttcgac
420
gtggtggtcc cgtccatgcc cggctatgga tactccggca agcctaccgc cgccgggtgg
480
aatcccgttc gcatcgcgcg tgcctgggtt gtgctgatga agcgcctgggg ttacacgaag
540
ttcgtagccc aaggtggtga ctggggcgca gtcgtcgtcg acatgatggg gctacaagca
600
cctcctgagt tgctaggtat ccacaccaac atgcctggca tctttccgac cgacattgac
660
caggcggctt tcggcggcgc accgacgcca ggagggtttt cacccgacga gaaagttgct
720
tacgagcgtg tgcgcttcgt ctatcaaaag ggagtcgcct acggtttcca gatggggctt
780
cgaccgcaga cactgtacgc aatcggggac tcaccggttg ggctcgcggc ctatttcctt
840
gatcacgacg cccggagcta tgagctgatc gcacgcgtct ttcaaggaca ggccgaaggc
900
ctcacgcgcg atgacatcct ggacaacgtc acgatcacgt ggttgacgaa caccgccgtc
960
tctggcgctc gcctctattg ggagtattgg ggcaaagggt cgtacttcag cgccaagggc
1020
gtctccatcc cggttgccgt gagcgtgttc cctgacgaac tctatcccgc cccccagagc
1080
tggacagagc gcgcctatcc gaaactgatg tacttcaaga agcacaacaa gggcgggcac
1140
ttcgcggcat gggaacagcc acaactcttg tctgaggacc tgcgcgaggg cttccgatcg
1200
ttgcggtag
1209
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:9 and the complementary strand of the last 21 residues of SEQ ID NO:9.
The exemplary SEQ ID NO:9 encodes a polypeptide having the sequence
(SEQ ID NO: 10)
Met Ser Val Val Thr Glu His Thr Asp Lys Thr Ala Ile Arg Pro Phe Lys Ile Asn Val Pro
Glu Ala Asp Leu Lys Asp Leu His Arg Arg Ile Gln Ala Thr Lys Phe Pro Glu Arg Glu
Thr Val Pro Asp Ala Thr Gln Gly Val Gln Leu Ala Thr Val Gln Ala Leu Ala Gln Tyr Trp
Ala Lys Asp Tyr Asn Trp His Lys Cys Glu Ser Arg Leu Asn Ala Leu Pro Gln Phe Met
Thr Glu Ile Glu Gly Leu Asp Ile His Phe Ile His Val Arg Ser Lys His Pro Asn Ala Leu
Pro Val Ile Val Thr His Gly Trp Pro Gly Ser Ile Val Gln Gln Leu Lys Ile Ile Asp Pro Leu
Thr Asn Pro Thr Ala His Gly Gly Ser Ala Ser Asp Ala Phe Asp Val Val Val Pro Ser Met
Pro Gly Tyr Gly Tyr Ser Gly Lys Pro Thr Ala Ala Gly Trp Asn Pro Val Arg Ile Ala Arg
Ala Trp Val Val Leu Met Lys Arg Leu Gly Tyr Thr Lys Phe Val Ala Gln Gly Gly Asp
Trp Gly Ala Val Val Val Val Asp Met Met Gly Leu Gln Ala Pro Pro Gln Leu Leu Gly Ile His
Thr Asn Met Pro Gly Ile Phe Pro Thr Asp Ile Asp Gln Ala Ala Phe Gly Gly Ala Pro Thr
Pro Gly Gly Phe Ser Pro Asp Gln Lys Val Ala Tyr Gln Arg Val Arg Phe Val Tyr Gln Lys
Gly Val Ala Tyr Gly Phe Gln Met Gly Leu Arg Pro Gln Thr Leu Tyr Ala Ile Gly Asp Ser
Pro Val Gly Leu Ala Ala Tyr Phe Leu Asp His Asp Ala Arg Ser Tyr Glu Leu Ile Ala Arg
Val Phe Gln Gly Gln Ala Glu Gly Leu Thr Arg Asp Asp Ile Leu Asp Asn Val Thr Ile Thr
Trp Leu Thr Asn Thr Ala Val Ser Gly Ala Arg Leu Tyr Trp Gln Tyr Trp Gly Lys Gly Ser
Tyr Phe Ser Ala Lys Gly Val Ser Ile Pro Val Ala Val Ser Val Phe Pro Asp Gln Leu Tyr
Pro Ala Pro Gln Ser Trp Thr Gln Arg Ala Tyr Pro Lys Leu Met Tyr Phe Lys Lys His Asn
Lys Gly Gly His Phe Ala Ala Trp Glu Gln Pro Gln Leu Leu Ser Glu Asp Leu Arg Glu
Gly Phe Arg Ser Leu Arg
The exemplary SEQ ID NO:11 is
atgagcaaca cacacgtcgc cgccgggacg gagatccgcc ccttcaccgt cgaggtcgcc
60
caagacgagt tggacgacct cagccgtcgc atctcggcga cgcgctggcc cgaggaggag
120
accgtcgagg atcagtcgca gggcgtgccg ctggcgacga tgcaggagct cgtccgctac
180
tggggctccg agtacgactt cggaaggctg gaggcacggt tgaacgcctt ccctcagttc
240
atcaccgaga tcgacggcct cgacatccac ttcatccacg ttcgctcgcc ggagagaac
300
gcgctgccga tcatcctcac gcacggctgg ccgggctcgt tcatcgagat gctgaacgtg
360
atcgggccac tgtccgaccc gaccgcgcac ggcggcgacg cggaggacgc gttcgacgtc
420
gtggttccgt ccatcccggg ctacgggttc tcggggaagc cgagcgcgac cgggtgggac
480
ccggttcaca tcgcgcgcgc gtggatcgcc ctgatggagc gcctcggccc tgaccgctac
540
gtcgcgcagg gcggcgactg gggcgcgcag atcacggatg tgatgggtgc ggaggcgccg
600
ccggaactgg cggggatccc gggcttttac accaagacgg gcttcggcac gcaggtcgcc
660
gaagggaagg aagtgaaaga gttcgagggc gagcaatata tactcgagcg cgggattcgc
720
gccgacctct cgatcgtcaa gggatggaag gccgacgaga ccggcaatct catgttccgc
780
aagacaacgc gaaacttcaa cctgccggct gcgacctgcg ggaaggtgtg cctcgccgag
840
gtggaagaga tcgtcccggt cggctcgctt gatcccgact gcatccacct gccctcgatc
900
tatgtgaacc ggttgatcga tggctcgccc tacgagaaga agatcgagtt ccggaccgtc
960
cgtcagcacg aggcggcatg a
981
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:11 and the complementary strand of the last 21 residues of SEQ ID NO:11.
The exemplary SEQ ID NO:11 encodes a polypeptide having the sequence
(SEQ ID NO: 12)
Met Ser Asn Thr His Val Ala Ala Gly Thr Glu Ile Arg Pro Phe Thr Val Glu Val Ala
Gln Asp Glu Leu Asp Asp Leu Ser Arg Arg Ile Ser Ala Thr Arg Trp Pro Glu Glu Glu Thr
Val Glu Asp Gln Ser Gln Gly Val Pro Leu Ala Thr Met Gln Glu Leu Val Arg Tyr Trp
Gly Ser Glu Tyr Asp Phe Gly Arg Leu Glu Ala Arg Leu Asn Ala Phe Pro Gln Phe Ile Thr
Glu Ile Asp Gly Leu Asp Ile His Phe Ile His Val Arg Ser Pro Glu Glu Asn Ala Leu Pro
Ile Ile Leu Thr His Gly Trp Pro Gly Ser Phe Leu Glu Met Leu Asn Val Ile Gly Pro Leu
Ser Asp Pro Thr Ala His Gly Gly Asp Ala Glu Asp Ala Phe Asp Val Val Val Pro Ser Ile
Pro Gly Tyr Gly Phe Ser Gly Lys Pro Ser Ala Thr Gly Trp Asp Pro Val His Ile Ala Arg
Ala Trp Ile Ala Leu Met Glu Arg Leu Gly Pro Asp Arg Tyr Val Ala Gln Gly Gly Asp
Trp Gly Ala Gln Ile Thr Asp Val Met Gly Ala Glu Ala Pro Glu Leu Ala Gly Ile Pro
Gly Phe Tyr Thr Lys Thr Gly Phe Gly Thr Gln Val Ala Glu Gly Lys Glu Val Lys Glu
Phe Glu Gly Glu Gln Tyr Ile Leu Glu Arg Gly Ile Arg Ala Asp Leu Ser Ile Val Lys Gly
Trp Lys Ala Asp Glu Thr Gly Asn Leu Met Phe Arg Lys Thr Thr Arg Asn Phe Asn Leu
Pro Ala Ala Thr Cys Gly Lys Val Cys Leu Ala Glu Val Glu Glu Ile Val Pro Val Gly Ser
Leu Asp Pro Asp Cys Ile His Leu Pro Ser Ile Tyr Val Asn Arg Leu Ile Asp Gly Ser Pro
Tyr Glu Lys Lys Ile Glu Phe Arg Thr Val Arg Gln His Glu Ala Ala
The exemplary SEQ ID NO:13 is
atgatttcgc tcttcgcccc cggaatcctc gccatcgcgc tcggcagcgc gcaggcgccg
60
cgcgacgatg tgttcgatcg cgtgacgcac ggttacgcga cgtcggatgg cggcgtgaag
120
atccactacg cgtcgctcgg ccaggggccg ctcgtggtga tgatccacgg cttcccggat
180
ttctggtact cgtggcggcg ccagatgcaa gcgttgtcgg atcgctatca ggtggtcgcc
240
atcgatcagc gcggctacaa cctgagcgac aagcccaagg gcgtcgacgc ctacgacatg
300
cgcctgctcg tcggcgacgt cgccgctgtg atccgcagcc tcggcaaaga caaagccacg
360
atcgtcggcc acgactgggg cggcatcgtc gcatggaact tcgcgatgaa cctgccccag
420
atgaccgaga acctgatcat cctgaacctg ccgcatccga acggccttgc ccgggagctc
480
aagaacaatc ccgatcagat caagaacagt gagtacgcgc gcaacttcca gaccaagtcg
540
ccgtccgatc cgaccgtgtt cttcggcagg ccgatgacgg cggagaacct ggcgggctgg
600
gtccgcgatc ccgaggcgcg caagcggtac gtcgaggcgt tccagaagtc cgatttcgag
660
gcgatgctga actactacaa gcggaactac ccgcgcggcg cgggcgcgga cgcgccgacg
720
ccgccgccgc tcccgaaggt gaagatgccg gtgctgatgt ttcacgggct caacgacacc
780
gcgttgaacg cgtcgggact gaacgacacg tggcagtggc tggagaagga tctgacgctc
840
gtcacggttc cgggctcggg acacttcgtg cagcaggatg cggccgacct cgtcgccaac
900
acgatgaagt ggtggctcgc gatgcgttga
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:13 and the complementary strand of the last 21 residues of SEQ ID NO:13.
The exemplary SEQ ID NO:13 encodes a polypeptide having the sequence
(SEQ ID NO: 14)
Met Ile Ser Leu Phe Ala Pro Gly Ile Leu Ala Ile Ala Leu Gly Ser Ala Gln Ala Pro Arg
Asp Asp Val Phe Asp Arg Val Thr His Gly Tyr Ala Thr Ser Asp Gly Gly Val Lys Ile His
Tyr Ala Ser Leu Gly Gln Gly Pro Leu Val Val Met Ile His Gly Phe Pro Asp Phe Trp Tyr
Ser Trp Arg Arg Gln Met Gln Ala Leu Ser Asp Arg Tyr Gln Val Val Ala Ile Asp Gln Arg
Gly Tyr Asn Leu Ser Asp Lys Pro Lys Gly Val Asp Ala Tyr Asp Met Arg Leu Leu Val
Gly Asp Val Ala Ala Val Ile Arg Ser Leu Gly Lys Asp Lys Ala Thr Ile Val Gly His Asp
Trp Gly Gly Ile Val Ala Trp Asn Phe Ala Met Asn Leu Pro Gln Met Thr Glu Asn Leu Ile
Ile Leu Asn Pro His Pro Asn Gly Leu Ala Arg Glu Leu Lys Asn Asn Pro Asp Gln Ile
Lys Asn Ser Glu Tyr Ala Arg Asn Phe Gln Thr Lys Ser Pro Ser Asp Pro Thr Val Phe Phe
Gly Arg Pro Met Thr Ala Glu Asn Leu Ala Gly Trp Val Arg Asp Pro Glu Ala Arg Lys
Arg Tyr Val Glu Ala Phe Gln Lys Ser Asp Phe Glu Ala Met Leu Asn Tyr Tyr Lys Arg
Asn Tyr Pro Arg Gly Ala Gly Ala Asp Ala Pro Thr Pro Pro Pro Leu Pro Lys Val Lys Met
Pro Val Leu Met Phe His Gly Leu Asn Asp Thr Ala Leu Asn Ala Ser Gly Leu Asn Asp
Thr Trp Gln Trp Leu Glu Lys Asp Leu Thr Leu Val Thr Val Pro Gly Ser Gly His Phe Val
Gln Gln Asp Ala Ala Asp Leu Val Ala Asn Thr Met Lys Trp Leu Ala Met Arg
The exemplary SEQ ID NO:15 is
gtgagagcag gtagggttcg ggcgcgcggg atcgagttcg cgacgctgga ggagggcaac
60
ggtccgctcg tcctctgcct gcacgggttc cccgatcatc cccgctcgtt ccggcaccag
120
ctgccggcgc tcgcgaaggc cggattccgc gcggtcgcgc ccgcgctccg tggctacgcg
180
ccgaccgggc cggcccccga cggccgctat cagtcggcgg cgctcgccat ggatgccgtc
240
gcgctgatcg aggcactcgg ttacgacgac gcggtcgtct tcgggcacga ctggggcgcg
300
accgccgcct acggcgccgc gctcgccgca ccgcagcggg tccgcaagct cgtcaccgcc
360
gcggtgccgt acggcccgca ggtggtcggc tcgttcatga ccagctacga ccagcagcgc
420
cggtcctggt acatgttctt ctttcagacg ccgttcgccg acgccgccgt cgcgcacgac
480
gacttcgcgt tcctcgcgcg gctgtggcgc gattggtcgc cgggctggaa gtacccaccc
540
gaagagatgg ccgcgctcaa agagacgttc cgccagcccg gcgtgctgga ggccgcactc
600
ggctactacc gcgccgcctt caatccggcg ctgcaggacc ccgagctcgc ggcgttgcag
660
ggccggatga tgacggaccc gatcgaggtg ccgggcctga tgctgcacgg cgccgccgac
720
ggttgcatgg gcgctgagct cgtcgagggg atggcggcgc tcttcccgcg cggcctccgc
780
gtcgaaatcg tcccgggaac gggccacttc ctgcaccagg aagcccccga tcggatcaat
840
ccgatcgtcc tcgacttcct gcggtcgtag
870
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:15 and the complementary strand of the last 21 residues of SEQ ID NO:15.
The exemplary SEQ ID NO:15 encodes a polypeptide having the sequence
(SEQ ID NO: 16)
Met Arg Ala Gly Arg Val Arg Ala Arg Gly Ile Glu Phe Ala Thr Leu Glu Glu Gly Asn
Gly Pro Leu Val Leu Cys Leu His Gly Phe Pro Asp His Pro Arg Ser Phe Arg His Gln
Leu Pro Ala Leu Ala Lys Ala Gly Phe Arg Ala Val Ala Pro Ala Leu Arg Gly Tyr Ala Pro
Thr Gly Pro Ala Pro Asp Gly Arg Tyr Gln Ser Ala Ala Leu Ala Met Asp Ala Val Ala
Leu Ile Glu Ala Leu Gly Tyr Asp Ala Val Val Phe Gly His Asp Trp Gly Ala Thr Ala
Ala Tyr Gly Ala Ala Leu Ala Ala Pro Gln Arg Val Arg Lys Leu Val Thr Ala Ala Val Pro
Tyr Gly Pro Gln Val Val Gly Ser Phe Met Thr Ser Tyr Asp Gln Gln Arg Arg Ser Trp Tyr
Met Phe Phe Phe Gln Thr Pro Phe Ala Asp Ala Ala Val Ala His Asp Phe Ala Phe
Leu Glu Arg Leu Trp Arg Asp Trp Ser Pro Gly Trp Lys Tyr Pro Pro Glu Glu Met Ala
Ala Leu Lys Glu Thr Phe Arg Gln Pro Gly Val Leu Glu Ala Ala Leu Gly Tyr Tyr Arg
Ala Ala Phe Asn Pro Ala Leu Gln Asp Pro Glu Leu Ala Ala Leu Gln Gly Arg Met Met
Thr Asp Pro Ile Glu Val Pro Gly Leu Met Leu His Gly Ala Ala Asp Gly Cys Met Gly
Ala Glu Leu Val Glu Gly Met Ala Ala Leu Phe Pro Arg Gly Leu Arg Val Glu Ile Val Pro
Gly Thr Gly His Phe Leu His Gln Glu Ala Pro Asp Arg Ile Asn Pro Ile Val Leu Asp Phe
Leu Arg Ser
The exemplary SEQ ID NO:17 is
atggcgaggg tcaatcgacg gttgacggtt ttcggactcg tagtcgcgct gtcggtcgtg
60
ggcgcacggg cggctcagac ccagcgtgcg tcgaactcct tcgctgcagg cgcgggcgcg
120
aagactgcct caggcgaagc gatcgtgcct ttcaagatcc atgttcccga ctctgtcgtg
180
gccgacctga agcagcggct ccagcgcgcc cggtttgcgg acgagattcc cgaggtggga
240
tgggactatg gcacgaacct ggcctatctc areagctcg tgacgtactg gcgcgacaag
300
tacgactggc gggctcagga gcggcgcctc aaccagtacg accaattcaa gacgaacatc
360
gacgggctcg acatccactt cattcatcaa cgatcgaagg tgccgaacgc caagcccctc
420
ctgctgctga acgggtggcc gagctcgatc gaggagtaca cgaaggtcat cggtcctctc
480
actgacccgg ccgcccacgg cggccgcacc accgacgcct ttcacgtcgt catcccgtcg
540
atgccgggct acggcttctc ggacaaaccg cgcgagcgcg gctacaaccc cgagcgcatg
600
gcaagcgtat gggtgaagct gatggcgcgc ctcggataca cgcgttacct gacgcatggc
660
agcgattgggg gaatcgcggt agccacgcac ctcgccctga aagaccccggg gcatctggcg
720
gcgcttcatc ttgcgggctg cccgggcggc ctgatcgggc agtctccgtc acggcccgca
780
ggcgcgcccc cgccgccacc agccccccccg cctccagccg cgccagtctc cgcgaatctg
840
gggtatcagg aaatacaaac gaccaagccg cagacactcg gccacgggct gagtgattca
900
cccctggggc tcgcgtcgtg gattatcgac aagtggcagt cctggaccga tcacgatggc
960
gatctcgaga aggactacac caaagaccag ctgctgacga atgtcatgat tactggggtc
1020
accaactcag gggcgtcttc ggctcgcttg tactacgaga cgagacatgt ggatggacgg
1080
ctgctgccga cctttttcga gaactttctt ccgaagcttc ccgagggccg cgtcaacgtt
1140
ccaaccggat gcgggacgtt tccctcgcag tacgatcgcc gcgacattcc gatcagcatg
1200
aacactgcag cagcacgcac ggctgctgag gcccgctaca acgtggtcta tctgacgatt
1260
tcgccacag gaggccactt tccggcgctc gagcagccgc aggtctgggc cgacgacatt
1320
cgagcgttct tccgcgatcg gccactgtaa
1350
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:17 and the complementary strand of the last 21 residues of SEQ ID NO:17.
The exemplary SEQ ID NO:17 encodes a polypeptide having the sequence
(SEQ ID NO: 18)
Met Ala Arg Val Asn Arg Arg Leu Thr Val Phe Gly Leu Val Val Ala Leu Ser Val Val
Gly Ala Arg Ala Ala Gln Thr Gln Arg Ala Ser Asn Ser Phe Ala Ala Gly Ala Gly Ala Lys
Thr Ala Ser Gly Glu Ala Ile Val Pro Phe Lys Ile His Val Pro Asp Ser Val Val Ala Asp
Leu Lys Gln Arg Leu Gln Arg Ala Arg Phe Ala Asp Glu Ile Pro Glu Val Gly Trp Asp
Tyr Gly Thr Asn Leu Ala Tyr Leu Lys Glu Leu Val Thr Tyr Trp Arg Asp Lys Tyr Asp
Trp Arg Ala Gln Glu Arg Arg Leu Asn Gln Tyr Asp Gln Phe Lys Thr Asn Ile Asp Gly
Leu Asp Ile His Phe Ile His Gln Arg Ser Lys Val Pro Asn Ala Lys Pro Leu Leu Leu Leu
Asn Gly Trp Pro Ser Ser Ile Glu Glu Tyr Thr Lys Val Ile Gly Pro Leu Thr Asp Pro Ala
Ala His Gly Gly Arg Thr Thr Asp Ala Phe His Val Val Ile Pro Ser Met Pro Gly Tyr Gly
Phe Ser Asp Lys Pro Arg Glu Arg Gly Tyr Arg Pro Glu Arg Met Ala Ser Val Trp Val
Lys Leu Met Ala Arg Leu Gly Tyr Thr Arg Tyr Leu Thr His Gly Ser Asp Trp Gly Ile Ala
Val Ala Thr His Leu Ala Leu Lys Asp Pro Gly His Leu Ala Ala Leu His Leu Ala Gly
Cys Pro Gly Gly Leu Ile Gly Gln Ser Pro Ser Arg Pro Ala Gly Ala Pro Pro Pro Pro Pro
Ala Pro Pro Pro Ala Ala Pro Val Ser Ala Asn Leu Gly Tyr Gln Glu Ile Gln Thr Thr
Lys Pro Gln Thr Leu Gly His Gly Leu Ser Asp Ser Pro Leu Gly Leu Ala Ser Trp Ile Ile
Asp Lys Trp Gln Ser Trp Thr Asp His Asp Gly Asp Leu Glu Lys Val Tyr Thr Lys Asp
Gln Leu Leu Thr Asn Val Met Ile Tyr Trp Val Thr Asn Ser Gly Ala Ser Ser Ala Arg Leu
Tyr Tyr Glu Thr Arg His Val Asp Gly Arg Leu Leu Pro Thr Phe Phe Glu Asn Phe Leu
Pro Lys Leu Pro Glu Gly Arg Val Asn Val Pro Thr Gly Cys Gly Thr Phe Pro Ser Gln Tyr
Asp Arg Arg Asp Ile Pro Ile Ser Met Asn Thr Ala Ala Ala Arg Thr Ala Ala Glu Ala Arg
Tyr Asn Val Val Val Tyr Leu Thr Ile Ser Pro His Gly Gly His Phe Pro Ala Leu Glu Gln Pro
Gln Val Trp Ala Asp Asp Ile Arg Ala Phe Phe Arg Asp Arg Pro Leu
The exemplary SEQ ID NO:19 is
atgagcgaag taaaacatcg cgaggtagat acgaacggta tccgcatgca catcgctgaa
60
agcgggacgg gcccgttggt gttgctgtgc catggttttc ccgaatcttg gtattcgtgg
120
cgccaccagt tggatgcggt cgcagaagct ggattccacg tggttgcacc tgacatgcga
180
ggttatggcc taactgagag tccagaagaa atcgaccggt acaccctcct ccatttggtc
240
ggggatatgg tcggcctgct ggacgctctt ggggaggaga gggcggtgat tgctgggcac
300
gattggggtg ctccggtcgc gtggcacgcc gctcttctac gccccgatcg cttccgcggt
360
gtgatcggct tgagcgtgcc cttcacgccg cggcggcctg cacgccccac cagcatgatg
420
cctcagacgg aagacgcgtt gttctatcaa cttacttcc aatctccagg cgttgcggaa
480
gcggagttcg agcgcgacgt tcgtctaagc atccgaagcc tcctctactc cgcttccgggg
540
gatgctccac gttgggaaaa ccgtgaaggg gctcgagagg aagttggtat ggtaccgcgc
600
cgaggtggct tactttcgcg gttgatgaac cctgcctcgt tgccgccttg gatcaccgag
660
gcggacgtgg acttctacgt gagcgagttc acgcgcacgg gatttcgcgg gccactgaac
720
tggtaccgca atatagacag caactgggaa ctcctagcac ccatggcggc aacgacagtg
780
tcagtcccgg ggctgtacat cgcaggcgac cgcgatctcg ttttggcttt tcgtgggatg
840
gaccagatca tcgccagcct gtccaagttt gtaccgcggc ttcagggaac agtcgtgctc
900
ccaggttgcg gtcattggac ccagcaggaa cgggcccgag aggtcacgaa ggccatgatt
960
gacttcgccc ggcgacttta g
981
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:19 and the complementary strand of the last 21 residues of SEQ ID NO:19.
The exemplary SEQ ID NO:19 encodes a polypeptide having the sequence
(SEQ ID NO: 20)
Met Ser Glu Val Lys His Arg Glu Val Asp Thr Asn Gly Ile Arg Met His Ile Ala Glu
Ser Gly Thr Gly Pro Leu Val Leu Leu Cys His Gly Phe Pro Glu Ser Trp Tyr Ser Trp Arg
His Gln Leu Asp Ala Val Ala Glu Ala Gly Phe His Val Val Ala Pro Asp Met Arg Gly
Tyr Gly Leu Thr Glu Ser Pro Glu Glu Ile Asp Arg Tyr Thr Leu Leu His Leu Val Gly Asp
Met Val Gly Leu Leu Asp Ala Leu Gly Glu Glu Arg Ala Val Ile Ala Gly His Asp Trp
Gly Ala Pro Val Ala Trp His Ala Ala Leu Leu Arg Pro Asp Arg Phe Arg Gly Val Ile Gly
Leu Ser Val Pro Phe Thr Pro Arg Arg Pro Ala Arg Pro Thr Ser Met Met Pro Gln Thr Glu
Asp Ala Leu Phe Tyr Gln Leu Tyr Phe Gln Ser Pro Gly Val Ala Glu Ala Glu Phe Glu
Arg Asp Val Arg Leu Ser Ile Arg Ser Leu Leu Tyr Ser Ala Ser Gly Asp Ala Pro Arg Trp
Glu Asn Arg Glu Gly Ala Arg Glu Glu Val Gly Met Val Pro Arg Arg Gly Gly Leu Leu
Ser Arg Leu Met Asn Pro Ala Ser Leu Pro Pro Trp Ile Thr Glu Ala Asp Val Asp Phe Tyr
Val Ser Glu Phe Thr Arg Thr Gly Phe Arg Gly Pro Leu Asn Trp Tyr Arg Asn Ile Asp
Arg Asn Trp Glu Leu Leu Ala Pro Met Ala Ala Thr Thr Val Ser Val Pro Gly Leu Tyr Ile
Ala Gly Asp Arg Asp Leu Val Leu Ala Phe Arg Gly Met Asp Gln Ile Ala Ser Leu Ser
Lys Phe Val Pro Arg Leu Gln Gly Thr Val Val Leu Pro Gly Cys Gly His Trp Thr Gln
Gln Glu Arg Ala Arg Glu Val Thr Lys Ala Met Ile Asp Phe Ala Arg Arg Leu
The exemplary SEQ ID NO:21 is
gtgagagtag aggcagacgg cgtcgggatc tcgtacgagg tgaccggaca gggacggccg
60
gtgatcctgc tgcacggctt cccagactcg ggacggcttt ggcgcaacca ggtgccggct
120
ttggctgagg ccggcttcca ggtgatcgtc cctgacctgc gcgggtacgg gcagtccgat
180
aagccagagg ccgtcgatgc gtactccctt ccggccctgg ccggggacgt catggcggta
240
ctggctgatg cgggcgtcga tcgggcccac gtcgtgggcc acgactgggg tgcggcgctc
300
ggctgggtgc tggcctcgct cgtgcccgac cgggtcgatc acctcgccgt tctgtcggtc
360
ggccatcccg cgaccttccg caggacgctg gcacagaacg agaagtcctg gtacatgctt
420
ctcttccagt tcgcgggcat cgccgagcac tggctcagcg acaacgactg ggccaacttc
480
cgcgcctggg cgcggcaccc tgacaccgac cagtcatca gcgacctcga ggcgaccaag
540
tccctgacgc ctgcgctgaa ctggtatcgc gccaatgtcc cgcccgagtc ctggaccgcg
600
cctccgctgg ctcttcctgc cgtgcccgcg cccgtgatgg ggatctggag caccggcgac
660
atagccctga ccgagaagca gatgacggac tcgcaggaga acgtcagcgg cccgtggcgg
720
tacgagcgga tcgatggccc tggccactgg atgcagctcg aggctccgga gacgatcagc
780
cgcctgctcc tcgactttct ccctgcctag
810
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:21 and the complementary strand of the last 21 residues of SEQ ID NO:21.
The exemplary SEQ ID NO:21 encodes a polypeptide having the sequence
(SEQ ID NO: 22)
Met Arg Val Glu Ala Asp Gly Val Gly Ile Ser Tyr Glu Val Thr Gly Gln Gly Arg Pro
Val Ile Leu Leu His Gly Phe Pro Asp Ser Gly Arg Leu Trp Arg Asn Gln Val Pro Ala Leu
Ala Glu Ala Gly Phe Gln Val Ile Val Pro Asp Leu Arg Gly Tyr Gly Gln Ser Asp Lys Pro
Glu Ala Val Asp Ala Tyr Ser Leu Pro Ala Leu Ala Gly Asp Val Met Ala Val Leu Ala
Asp Ala Gly Val Asp Arg Ala His Val Val Gly His Asp Trp Gly Ala Ala Leu Gly Trp
Val Leu Ala Ser Leu Val Pro Asp Arg Val Asp His Leu Ala Val Leu Ser Val Gly His Pro
Ala Thr Phe Arg Arg Thr Leu Ala Gln Asn Glu Lys Ser Trp Tyr Met Leu Leu Phe Gln
Phe Ala Gly Ile Ala Glu His Trp Leu Ser Asp Asn Asp Trp Ala Asn Phe Arg Ala Trp Ala
Arg His Pro Asp Thr Asp Ala Val Ile Ser Asp Leu Glu Ala Thr Lys Ser Leu Thr Pro Ala
Leu Asn Trp Tyr Arg Ala Asn Val Pro Pro Glu Ser Trp Thr Ala Pro Pro Leu Ala Leu Pro
Ala Val Pro Ala Pro Val Met Gly Ile Trp Ser Thr Gly Asp Ile Ala Leu Thr Glu Lys Gln
Met Thr Asp Ser Gln Glu Asn Val Ser Gly Pro Trp Arg Tyr Glu Arg Ile Asp Gly Pro Gly
His Trp Met Gln Leu Glu Ala Pro Glu Thr Ile Ser Arg Leu Leu Leu Asp Phe Leu Pro Ala
The exemplary SEQ ID NO:23 is
atgaccccga ccgttgcgac aaaaaccagc gaccagcaga cagcggagaa gacagcgatt
60
cggccgtttc gcatcaacgt tcccgacgcg gaactgaccg acctgcgcag gcgcgtcagc
120
gcgacgaggt ggcccgaacg cgagacggtt ccggatcaaa cgcagggcgt gcagctcgcg
180
acggttcaac agcttgcgcg ttattgggcg accgagtacg actggcgtaa gtgcgaggcg
240
aggctgaatg ccctgccgca gttcatcacg gagatcgatg ggctggatat ccacttcatt
300
cacgtgcgct cgaagcacga tcgcgcgttg ccgctcatcg tcacgcacgg atggcctggc
360
tccatcgtcg agcagctgaa gatcatcgat ccgctcacca atcccacggc ccatggcggc
420
accgcgtccg acgccttcga cgtcgtgatc ccgtcgatgc ccggctacgg gtgttcaggc
480
cggccgtcga ccaccggctg ggacgtcgca cacatcgcgc gcgcgtgggt ggtgctcatg
540
aaacgcctcg gctactcgaa gttcgcggcg cagggtggcg attggggcgc gattgtggtc
600
gatcagatgg gcgtccaggc ggctccggaa ttgatcggca ttcacaccaa catgcctggt
660
atctttcccg cggacatcga tcaggcggcg tttgccggga agccggcgcc atcgggtctg
720
tcagccgacg agaaagttgc gtacgagcgc ttgctgttcg tgtatcaaaa gggaatcggg
780
tacggatatc agatgggact gcgaccgcag acgctgtacg gaatcgccga ttcacccgtc
840
ggcctggcgg cgtattttct cgatcacgac gcgcgcagtc tcgatctgat ctcgcgcgtc
900
ttcgcgggag cgtccgaggg cctctcacgc gatgacgtcc tcgacaacgt cacgatcgcc
960
tggttgacga acacggggt gtccggcggc cgtctctact gggagaacta tggcaagctc
1020
ggattcttca atgtcaaagg cgtatcgatc ccggtggccg tgagcgtgtt ccccgacgag
1080
ctctatccag cgccgcggag ctggacggag aaggcgtatc cgaaactgat ccacttcaac
1140
aaggtcgaca agggcggaca cttcgcggcc ttcgagcagc cgaagctctt gtccgacgag
1200
attcgcacgg gtctgaagtc tctgcgcacc tga
1233
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:23 and the complementary strand of the last 21 residues of SEQ ID NO:23.
The exemplary SEQ ID NO:23 encodes a polypeptide having the sequence
(SEQ ID NO: 24)
Met Thr Pro Thr Val Ala Thr Lys Thr Ser Asp Gln Gln Thr Ala Glu Lys Thr Ala Ile
Arg Pro Phe Arg Ile Asn Val Pro Asp Ala Glu Leu Thr Asp Leu Arg Arg Arg Val Ser
Ala Thr Arg Trp Pro Glu Arg Glu Thr Val Pro Asp Gln Thr Gln Gly Val Gln Leu Ala
Thr Val Gln Gln Leu Ala Arg Tyr Trp Ala Thr Glu Tyr Asp Trp Arg Lys Cys Glu Ala
Arg Leu Asn Ala Leu Pro Gln Phe Ile Thr Glu Ile Asp Gly Leu Asp Ile His Phe Ile His
Val Arg Ser Lys His Asp Arg Ala Leu Pro Leu Ile Val Thr His Gly Trp Pro Gly Ser Ile
Val Glu Gln Leu Lys Ile Asp Pro Leu Thr Asn Pro Thr Ala His Gly Gly Thr Ala Ser
Asp Ala Phe Asp Val Val Ile Pro Ser Met Pro Gly Tyr Gly Cys Ser Gly Arg Pro Ser Thr
Thr Gly Trp Asp Val Ala His Ile Ala Arg Ala Trp Val Val Leu Met Lys Arg Leu Gly Tyr
Ser Lys Phe Ala Ala Gln Gly Gly Asp Trp Gly Ala Ile Val Val Asp Gln Met Gly Val Gln
Ala Ala Pro Glu Leu Ile Gly Ile His Thr Asn Met Pro Gly Ile Phe Pro Ala Asp Ile Asp
Gln Ala Ala Phe Ala Gly Lys Pro Ala Pro Ser Gly Leu Ser Ala Asp Glu Lys Val Ala Tyr
Glu Arg Leu Leu Phe Val Tyr Gln Lys Gly Ile Gly Tyr Gly Tyr Gln Met Gly Leu Arg
Pro Gln Thr Leu Tyr Gly Ile Ala Asp Ser Pro Val Gly Leu Ala Ala Tyr Phe Leu Asp His
Asp Ala Arg Ser Leu Asp Leu Ile Ser Arg Val Phe Ala Gly Ala Ser Glu Gly Leu Ser Arg
Asp Asp Val Leu Asp Asn Val Thr Ile Ala Trp Leu Thr Asn Thr Gly Val Ser Gly Gly
Arg Leu Tyr Trp Glu Asn Tyr Gly Lys Leu Gly Phe Phe Asn Val Lys Gly Val Ser Ile Pro
Val Ala Val Ser Val Phe Pro Asp Glu Leu Tyr Pro Ala Pro Arg Ser Trp Thr Glu Lys Ala
Tyr Pro Lys Leu Ile His Phe Asn Lys Val Asp Lys Gly Gly His Phe Ala Ala Phe Glu Gln
Pro Lys Leu Leu Ser Asp Glu Ile Arg Thr Gly Leu Lys Ser Leu Arg Thr
The exemplary SEQ ID NO:25 is
atgtccgaac cctggaagca tcacgccaaa gttgtcaacg gctttcgtat gcactatgtc
60
attgccggtt ccggctaccc actcgtattt ctgcatggct ggccccagag ttggtatgag
120
tggcgaaaga tcattccggc actcgctgag aagttcacgg taattgcccc ggacctacgc
180
ggattgggag attctgaacg tcctctcaca gggtatgata aacgtaccct ggcctcagat
240
gtgtacgagt tggtgaaatc cctgggcttc agcaaaattg ggctcactgg ccatgactgg
300
ggtggtgccg tagcgttcta ctttgcttac gatcatccag agatggtcga acgcttgctg
360
attctcgaca tggtgccagg tcggggcgc aaaggtgggt caatggacct tcgccaagca
420
cagcgctatt ggcacgcgtt ctttcacggt ggcatgccag acttagctga aaagctggtc
480
agcgccaacg tcgaagccta cttaagccat ttctacactt cgaccacgta caactacagt
540
ccaaatgtgt tcagtgcaga agatatagcc gaatacgtgc gcgtatattc cgctccaggg
600
gcgatccgtg ccgggtttca atactatcgt gctgcgttgc aagaagacct tgacaacctc
660
agcagctgca cagaaaaact gaaaatgcct gtgctcgcat ggggaggcga agcattcatg
720
ggcaacgttg taccggtgtg gcagacggtc gccgagaacg tacaaggagg cgagctcaag
780
cagtgtggcc acttcatcgc ggagagaaa cctgagttcg ccactcaaca agcgctggaa
840
ttttcgcgc cgctccgggg agcaaagtag
870
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:25 and the complementary strand of the last 21 residues of SEQ ID NO:25.
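The primer-pair rule stated above (the first 21 residues of the coding strand, plus the complementary strand of the last 21 residues) can be sketched as follows. This is a minimal illustration only: the helper names `reverse_complement` and `primer_pair` are hypothetical, the reverse primer is written 5' to 3' by convention (an assumption, since the text says only "complementary strand"), and the short test sequence is a toy string, not one of the listed sequences.

```python
# Hypothetical helpers; not part of the specification.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(seq: str, length: int = 21) -> tuple[str, str]:
    """Derive a (forward, reverse) primer pair from a coding-strand sequence.

    Whitespace (such as the 10-residue grouping used in sequence listings)
    is removed before slicing.
    """
    cleaned = "".join(seq.split())
    if len(cleaned) < 2 * length:
        raise ValueError("sequence is shorter than two primer lengths")
    forward = cleaned[:length]                       # residues 1..length
    reverse = reverse_complement(cleaned[-length:])  # complement of last residues
    return forward, reverse


if __name__ == "__main__":
    # Toy sequence for illustration only.
    fwd, rev = primer_pair("atgcccgggt aaaccctttg ggtag", length=5)
    print(fwd, rev)
```

The same function applies unchanged to each exemplary nucleic acid sequence, since every primer-pair statement in this section follows the same 21-residue rule.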
Exemplary SEQ ID NO:25 encodes a polypeptide having the sequence
(SEQ ID NO:26)
Met Ser Glu Pro Trp Lys His His Ala Lys Val Val Asn Gly Phe Arg Met His Tyr Val
Ile Ala Gly Ser Gly Tyr Pro Leu Val Phe Leu His Gly Trp Pro Gln Ser Trp Tyr Glu Trp
Arg Lys Ile Pro Ala Leu Ala Glu Lys Phe Thr Val Ile Ala Pro Asp Leu Arg Gly Leu
Gly Asp Ser Glu Arg Pro Leu Thr Gly Tyr Asp Lys Arg Thr Leu Ala Ser Asp Val Tyr
Glu Leu Val Lys Ser Leu Gly Phe Ser Lys Ile Gly Leu Thr Gly His Asp Trp Gly Gly Ala
Val Ala Phe Tyr Phe Ala Tyr Asp His Pro Glu Met Val Glu Arg Leu Leu Ile Leu Asp
Met Val Pro Gly Tyr Gly Arg Lys Gly Gly Ser Met Asp Leu Arg Gln Ala Gln Arg Tyr
Trp His Ala Phe Phe His Gly Gly Met Pro Asp Leu Ala Glu Lys Leu Val Ser Ala Asn
Val Gln Ala Tyr Leu Ser His Phe Tyr Thr Ser Thr Thr Tyr Asn Tyr Ser Pro Asn Val Phe
Ser Ala Gln Asp Ile Ala Gln Tyr Val Arg Val Tyr Ser Ala Pro Gly Ala Ile Arg Ala Gly
Phe Gln Tyr Tyr Arg Ala Ala Leu Gln Glu Asp Leu Asp Asn Leu Ser Ser Cys Thr Gln
Lys Leu Lys Met Pro Val Leu Ala Trp Gly Gly Gln Ala Phe Met Gly Asn Val Val Pro
Val Trp Gln Thr Val Ala Gln Asn Val Gln Gly Gly Gln Leu Lys Gln Cys Gly His Phe Ile
Ala Glu Glu Lys Pro Glu Phe Ala Thr Gln Gln Ala Leu Gln Phe Phe Ala Pro Leu Arg
Gly Ala Lys
An example of SEQ ID NO: 27 is
atgacacgcg actcactcca actcgccgcc gtcgcgttgg ccatggtgct cgccggcgcc
60
ttcgcgattc ccgggtgggc gcaaaccacc gtcggcagcg atgcctcgat ccgtccctcc
120
aagatccaag tgccgcaagc ctcgctcgac gacctgcgcc ggcgtattgc ggcaacgcgc
180
tggcccgaca aggagaccgt cgacaacgca tcccagggcg cgcagcttgc gcagatgcag
240
gagctcgtga ggtactgggg cacgagctac gactggcgca aggccgaggc gaagctcaac
300
gcgttgccgc aattcacgac caacatcgac ggcgtcgaca ttcatttcat ccacgtgcgc
360
tcgcgtcatc ccaatgcgct gcccgtcatc attacgcacg gctggcccgg atcggtgatc
420
gagcagctca agctcatcga tccgctcacg gatccgaccg cgcacggcgg cagcgccgac
480
gacgcgttcg acgtcgtcat tccgtcggtg ccgggctacg ggtttttccgg caagccgacc
540
ggcaccgggt gggatccgga tcgcatcgcg cgcgcgtggg cggagctcat gaaacgcctc
600
ggctacacac gttatgtcgc gcaaggcggc gactggggct cgccgatctc gagcgcgatg
660
gcgcggcagg gagcgccggg gttgctcggt attcacatca acctgcctgc gacggtgccg
720
ccggaagcag ccgccgcgct cgggggtggc ccgctgccgg cagggctttc cgacaaggaa
780
cgcgccgcga tcgacacgct catggcttat gccaaggccg gcaacgcctc gtacttcacg
840
atgttgacgg cgcgcccgca aaccgtcggt tacggcgcga acgactcgcc gacgggcctt
900
gcggcctgga tcctcgtgca tccgggtttc aggcaatggt cgtacggcgt cgatccgacg
960
gagtcgccga gcaaggacga cgtgctcgac gacatcacgc tgtattggct caccgggacc
1020
gcgacctcgg ccggccggct gtactgggag aacggcgcgc gcggcagcgt catcgtcgcc
1080
gccgcgcaga agaccggcga gatctcgctt ccggtcgcga tcacggtgtt tcccgacgac
1140
gtctatcgcg cgccggagac ctgggcgcgg cgcgcgtacc gcaacctcgt ctacttccac
1200
gaagtggaca agggcggaca tttcgcagcg tgggaacagc ccgagctgtt cagcgccgag
1260
ctgcgcgctg cgttcaggcc gctgcgcgag gcgcactga
1299
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:27 and the complementary strand of the last 21 residues of SEQ ID NO:27.
Exemplary SEQ ID NO:27 encodes a polypeptide having the sequence
(SEQ ID NO:28)
Met Thr Arg Asp Ser Leu Gln Leu Ala Ala Val Ala Leu Ala Met Val Leu Ala Gly Ala
Phe Ala Ile Pro Gly Trp Ala Gln Thr Thr Val Gly Ser Asp Ala Ser Ile Arg Pro Phe Lys
Ile Gln Val Pro Gln Ala Ser Leu Asp Asp Leu Arg Arg Arg Ile Ala Ala Thr Arg Trp Pro
Asp Lys Glu Thr Val Asp Asn Ala Ser Gln Gly Ala Gln Leu Ala Gln Met Gln Glu Leu
Val Arg Tyr Trp Gly Thr Ser Tyr Asp Trp Arg Lys Ala Glu Ala Lys Leu Asn Ala Leu
Pro Gln Phe Thr Thr Asn Ile Asp Gly Val Asp Ile His Phe Ile His Val Arg Ser Arg His
Pro Asn Ala Leu Pro Val Ile Ile Thr His Gly Trp Pro Gly Ser Val Ile Glu Gln Leu Lys
Leu Ile Asp Pro Leu Thr Asp Pro Thr Ala His Gly Gly Ser Ala Asp Asp Ala Phe Asp Val
Val Ile Pro Ser Val Pro Gly Tyr Gly Phe Ser Gly Lys Pro Thr Gly Thr Gly Trp Asp Pro
Asp Arg Ile Ala Arg Ala Trp Ala Glu Leu Met Lys Arg Leu Gly Tyr Thr Arg Tyr Val
Ala Gln Gly Gly Asp Trp Gly Ser Pro Ile Ser Ser Ala Met Ala Arg Gln Gly Ala Pro Gly
Leu Leu Gly Ile His Ile Asn Leu Pro Ala Thr Val Pro Pro Glu Ala Ala Ala Ala Leu Gly
Gly Gly Pro Leu Pro Ala Gly Leu Ser Asp Lys Glu Arg Ala Ala Ile Asp Thr Leu Met Ala
Tyr Ala Lys Ala Gly Asn Ala Ser Tyr Phe Thr Met Leu Thr Ala Arg Pro Gln Thr Val Gly
Tyr Gly Ala Asn Asp Ser Pro Thr Gly Leu Ala Ala Trp Ile Leu Val His Pro Gly Phe Arg
Gln Trp Ser Tyr Gly Val Asp Pro Thr Glu Ser Pro Ser Lys Asp Asp Val Leu Asp Asp Ile
Thr Leu Tyr Trp Leu Thr Gly Thr Ala Thr Ser Ala Gly Arg Leu Tyr Trp Glu Asn Gly
Ala Arg Gly Ser Val Ile Val Ala Ala Ala Gln Lys Thr Gly Glu Ile Ser Leu Pro Val Ala
Ile Thr Val Phe Pro Asp Asp Val Tyr Arg Ala Pro Glu Thr Trp Ala Arg Arg Ala Tyr Arg
Asn Leu Val Tyr Phe His Glu Val Asp Lys Gly Gly His Phe Ala Ala Trp Glu Gln Pro
Glu Leu Phe Ser Ala Glu Leu Arg Ala Ala Phe Arg Pro Leu Arg Glu Ala His
An example of SEQ ID NO: 29 is
atgcatgaga taaagcatcg cgttgtcgaa acgaatggca tccgcatgca cgtcgctgag
60
tgcggggtgg gtccgcttgt gccctgtgt cacgggtttc ccgagtgttg gtattcgtgg
120
cgccatcagt tgccggccct cgcggaagct ggattccacg tcgtcgcgcc tgacatgcga
180
ggctacggcg agacagaccg gccacaggaa atcgaggagt acacgctcct gcatttagtt
240
ggtgacatga taggtctgct cgacguttg ggtgcagaaa gcgcggtgat cgccggccac
300
gattggggtg ccccggtggc gtggcattct gcgcttctac gcccagatcg gttccgcgcc
360
gtcatcggct tgagcgtacc gttcaggccg agactccccg tgcgcccgac tagcgtcatg
420
cctcagaccg acgacgcgct cttctaccag cttacttcc aaacttcagg catcgccgag
480
gcggagttcg agcgcgacgt ccggctgagc atccgcagcc tcctctattc ggcttcgggc
540
gatgcgccgc gtcgcgataa caccggaatg cctggtggcg aagtcggaat ggtgccacgc
600
caaggtggtt tcctctcgcg cctgataaat cccgcatcgc taccccactg gctcaccgac
660
gcggacgtag acttctacgt gaaggagttc acgcgcacag gatttcgcgg cggtctgaac
720
tggtaccgca acatcgaccg caattgggag ctcttggcgc ccttcactgc ggcgcgtgtg
780
tccgtccccg cactcttttgt cgccggcgac cgcgatctcg tagtcgcctt tcgtgggatg
840
gaccaactca tccccaatct ggcgaagttt gtcccgcagc tccttggcac cctcatgctc
900
ccaggctgcg gccactggac ccaacaggaa tgtccgcgcg aggtcaatga cgccatgctc
960
gatttccttc gtcggctgta g
981
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:29 and the complementary strand of the last 21 residues of SEQ ID NO:29.
Exemplary SEQ ID NO:29 encodes a polypeptide having the sequence
(SEQ ID NO:30)
Met His Glu Ile Lys His Arg Val Val Glu Thr Asn Gly Ile Arg Met His Val Ala Glu
Cys Gly Val Gly Pro Leu Val Leu Leu Cys His Gly Phe Pro Glu Cys Trp Tyr Ser Trp
Arg His Gln Leu Pro Ala Leu Ala Glu Ala Gly Phe His Val Val Ala Pro Asp Met Arg
Gly Tyr Gly Glu Thr Asp Arg Pro Gln Glu Ile Glu Glu Tyr Thr Leu Leu His Leu Val Gly
Asp Met Ile Gly Leu Leu Asp Val Leu Gly Ala Glu Ser Ala Val Ile Ala Gly His Asp Trp
Gly Ala Pro Val Ala Trp His Ser Ala Leu Leu Arg Pro Asp Arg Phe Arg Ala Val Ile Gly
Leu Ser Val Pro Phe Arg Pro Arg Leu Pro Val Arg Pro Thr Ser Val Met Pro Gln Thr Asp
Asp Ala Leu Phe Tyr Gln Leu Tyr Phe Gln Thr Ser Gly Ile Ala Glu Ala Glu Phe Glu Arg
Asp Val Arg Leu Ser Ile Arg Ser Leu Leu Tyr Ser Ala Ser Gly Asp Ala Pro Arg Arg Asp
Asn Thr Gly Met Pro Gly Gly Glu Val Gly Met Val Pro Arg Gln Gly Gly Phe Leu Ser
Arg Leu Ile Asn Pro Ala Ser Leu Pro His Trp Leu Thr Asp Ala Asp Val Asp Phe Tyr Val
Lys Glu Phe Thr Arg Thr Gly Phe Arg Gly Gly Leu Asn Trp Tyr Arg Asn Ile Asp Arg
Asn Trp Glu Leu Leu Ala Pro Phe Thr Ala Ala Arg Val Ser Val Pro Ala Leu Phe Val Ala
Gly Asp Arg Asp Leu Val Val Ala Phe Arg Gly Met Asp Gln Leu Ile Pro Asn Leu Ala
Lys Phe Val Pro Gln Leu Leu Gly Thr Leu Met Leu Pro Gly Cys Gly His Trp Thr Gln
Gln Glu Cys Pro Arg Glu Val Asn Asp Ala Met Leu Asp Phe Leu Arg Arg Leu
An example of SEQ ID NO: 31 is
atgaagcgta tggttctaaa aacagcaatc gccctgcttg cgtcggatgc agccgagggt
60
ggcgagttcg agtcgcgggt gacgcatggt tacgccgatt cttcgggggt aaaaatccac
120
tatgccagca tgggcaaggg tccactggta gtgatggtcc acggtttccc cgatttctgg
180
tacacctggc gggcacaaat ggaagcactt tccgattcgt tccaatgtgt tgccatcgac
240
caacgcggat acaatttgag cgacaagccc atcggcgtcg agaactacgg cgtccgcctg
300
ttggtcggag acgtttcggc ggtgataaaa aagctgggca aagaaaaggc gatcctggtt
360
ggacatgact ggggcgggct ggttgcctgg caattcgcgc tcacccaacc gcaaatgacc
420
gagcggctca tcattctgaa tttgccgcat cctcggggcc tgctgcgcga gttggcccag
480
aatccgcaac agaagagaa cagccagtat gcacgggact ttcagcaacc cgaggccgcc
540
tcgaaattga cggccgagca gcttgccttc tgggtgaaag atgcggaggc ccggaccaag
600
tacatcgaag cgttcaaacg ctccgatttt gaggcgatgc tcaactatta caagcgcaac
660
tacccgcgcg agccttacac cgaggatact tcgccagtgg taaaggtgca ggtgcctgtt
720
cttatgattc atgggttagg cgacacggct ttgctgcccg gcgcgctcaa caacacgtgg
780
gattggttgg agaaagattt gacgctggtc acgattcctg gcgccggcca cttcgttcaa
840
caggacgccg ctgaattggt gtcgcgctcg atgagagcat ggttgctgcg ctga
894
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:31 and the complementary strand of the last 21 residues of SEQ ID NO:31.
Exemplary SEQ ID NO:31 encodes a polypeptide having the sequence
(SEQ ID NO:32)
Met Lys Arg Met Val Leu Lys Thr Ala Ile Ala Leu Leu Ala Ser Asp Ala Ala Glu Gly
Gly Glu Phe Glu Ser Arg Val Thr His Gly Tyr Ala Asp Ser Ser Gly Val Lys Ile His Tyr
Ala Ser Met Gly Lys Gly Pro Leu Val Val Met Val His Gly Phe Pro Asp Phe Trp Tyr Thr
Trp Arg Ala Gln Met Glu Ala Leu Ser Asp Ser Phe Gln Cys Val Ala Ile Asp Gln Arg Gly
Tyr Asn Leu Ser Asp Lys Pro Ile Gly Val Glu Asn Tyr Gly Val Arg Leu Leu Val Gly
Asp Val Ser Ala Val Ile Lys Lys Leu Gly Lys Glu Lys Ala Ile Leu Val Gly His Asp Trp
Gly Gly Leu Val Ala Trp Gln Phe Ala Leu Thr Gln Pro Gln Met Thr Glu Arg Leu Ile Ile
Leu Asn Leu Pro His Pro Arg Gly Leu Leu Arg Glu Leu Ala Gln Asn Pro Gln Gln Gln
Lys Asn Ser Gln Tyr Ala Arg Asp Phe Gln Gln Pro Glu Ala Ala Ser Lys Leu Thr Ala
Glu Gln Leu Ala Phe Trp Val Lys Asp Ala Glu Ala Arg Thr Lys Tyr Ile Glu Ala Phe Lys
Arg Ser Asp Phe Glu Ala Met Leu Asn Tyr Tyr Lys Arg Asn Tyr Pro Arg Glu Pro Tyr
Thr Glu Asp Thr Ser Pro Val Val Lys Val Gln Val Pro Val Leu Met Ile His Gly Leu Gly
Asp Thr Ala Leu Leu Pro Gly Ala Leu Asn Asn Thr Trp Asp Trp Leu Glu Lys Asp Leu
Thr Leu Val Thr Ile Pro Gly Ala Gly His Phe Val Gln Gln Asp Ala Ala Glu Leu Val Ser
Arg Ser Met Arg Ala Trp Leu Leu Arg
An example of SEQ ID NO: 33 is
atgcagctcg aaaaagcgca gtacatgccc gccttagcgt catcgcacac ttggcgcagc
60
tttcttcgct acataacagt cgcgtgcttt ttgggcattt tcctgctcgg cgctcagagc
120
tacgcccaga ccggtaggac cgccatcgcg gaggcctccg tcagcagctc gcttcctgcg
180
aagccgcctg cagcgaccga agataaggcg atccgtcctt tccgcgtcca cgtcccacaa
240
gaggcgctcg acgacctcag ccgtcgctc gcggcgacgc gcttgcctga ccaggagacc
300
gtcaacgatc gatcgcaggg caatcagttg gcaacgatga aggaactcgt gcggtattgg
360
cagacaggct acgactggcg caaggcggag cagaaactga acgcattgcc gcagtttgtt
420
acgacgatag acggcctaga catccatttc atccacgtcc gctcgaaaca tcccaacgcg
480
atgccactca ttatcacgca cggctggcct ggatcgatat ttgaattact aaaggttatc
540
ggcccgctta ccgatccgac ggcgttcggc agcggcgcgg aagatgcctt cgacgtcgtg
600
atcccgtcga tgcctggcta tggcttctcc ggcaagccga cggacgccgg ttgggacccc
660
gaacacatcg cgcgagtctg ggcggagctg atgaagcgcc tcggatacac ccgctacgtc
720
gcccagggcg gccactgggg ctccccgtc tccagcgcga tggcgcgcca ggcgccggcg
780
ggactgctcg gcatccacgt caacttgccg gcggctatac cgcccgacgt gggcagggcg
840
ctcaacgccg gcgggcccgc gccggcggga ctctccgaga aggagcgcgc ggcgtttgac
900
gcgctcgtca cgttcaacac gaagaacagg gcctactcgg tgatgatggc cacgcggccg
960
cagacgatag gctacgcctt gacggattct ccggcggggc ttgcggcctg gatatatgac
1020
tacaacaacg gcgagcccga gcgctcactg accaaagacg agatgctgga cgacatcacg
1080
ctgtactggc tgacgaacag cgcgacctcg gcggcgcggc tgtactggga gaacagcgga
1140
cgaagccttc tttctgtggc cgcgcagaag accgccgaga tctcgctccc agtggccatc
1200
acggtatttc cgggagagat ctatcgagcc ccggagacgt gggcccggct cgcctatcgc
1260
aacctgatct actttcacga ggtcgacagg ggcggacact tcgcggcctg ggaagagccg
1320
gagcttttct ccgccgagtt gcgcgccgcc ttcagatcac ttcagaaaca gcaatga
1377
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:33 and the complementary strand of the last 21 residues of SEQ ID NO:33.
Exemplary SEQ ID NO:33 encodes a polypeptide having the sequence
(SEQ ID NO:34)
Met Gln Leu Glu Lys Ala Gln Tyr Met Pro Ala Leu Ala Ser His Thr Trp Arg Ser
Phe Leu Arg Tyr Ile Thr Val Ala Cys Phe Leu Gly Ile Phe Leu Leu Gly Ala Gln Ser Tyr
Ala Gln Thr Gly Arg Thr Ala Ile Ala Glu Ala Ser Val Ser Ser Ser Leu Pro Ala Lys Pro
Pro Ala Ala Thr Glu Asp Lys Ala Ile Arg Pro Phe Arg Val His Val Pro Gln Glu Ala Leu
Asp Asp Leu Ser Arg Arg Leu Ala Ala Thr Arg Leu Pro Asp Gln Glu Thr Val Asn Asp
Arg Ser Gln Gly Asn Gln Leu Ala Thr Met Lys Glu Leu Val Arg Tyr Trp Gln Thr Gly
Tyr Asp Trp Arg Lys Ala Glu Gln Lys Leu Asn Ala Leu Pro Gln Phe Val Thr Thr Ile
Asp Gly Leu Asp Ile His Phe Ile His Val Arg Ser Lys His Pro Asn Ala Met Pro Leu Ile
Ile Thr His Gly Trp Pro Gly Ser Ile Phe Glu Leu Leu Lys Val Ile Gly Pro Leu Thr Asp
Pro Thr Ala Phe Gly Ser Gly Ala Glu Asp Ala Phe Asp Val Val Ile Pro Ser Met Pro Gly
Tyr Gly Phe Ser Gly Lys Pro Thr Asp Ala Gly Trp Asp Pro Glu His Ile Ala Arg Val Trp
Ala Glu Leu Met Lys Arg Leu Gly Tyr Thr Arg Tyr Val Ala Gln Gly Gly Asp Trp Gly
Ser Pro Val Ser Ser Ala Met Ala Arg Gln Ala Pro Ala Gly Leu Leu Gly Ile His Val Asn
Leu Pro Ala Ala Ile Pro Asp Val Gly Arg Ala Leu Asn Ala Gly Pro Ala Pro Ala
Gly Leu Ser Glu Lys Glu Arg Ala Ala Phe Asp Ala Leu Val Thr Phe Asn Thr Lys Asn
Arg Ala Tyr Ser Val Met Ala Thr Arg Pro Gln Thr Ile Gly Tyr Ala Leu Thr Asp Ser
Pro Ala Gly Leu Ala Ala Trp Ile Tyr Asp Tyr Asn Asn Gly Glu Pro Glu Arg Ser Leu Thr
Lys Asp Glu Met Leu Asp Asp Ile Thr Leu Tyr Trp Leu Thr Asn Ser Ala Thr Ser Ala Ala
Arg Leu Tyr Trp Glu Asn Ser Gly Arg Ser Leu Leu Ser Val Ala Ala Gln Lys Thr Ala Glu
Ile Ser Leu Pro Val Ala Ile Thr Val Phe Pro Gly Glu Ile Tyr Arg Ala Pro Glu Thr Trp
Ala Arg Leu Ala Tyr Arg Asn Leu Ile Tyr Phe His Glu Val Asp Arg Gly Gly His Phe Ala
Ala Trp Glu Glu Pro Glu Leu Phe Ser Ala Glu Leu Arg Ala Ala Phe Arg Ser Leu Gln
Lys Gln Gln
An example of SEQ ID NO: 35 is
atgaacttca ataccgtcga ggtcacaggc cttaagatct tctaccgcga ggccgggaac
60
ccgtcaaagc cggccatcgt cctgctgcac gggttccctt cgtcctcgta ctcattccac
120
gatctcattc cgctcctgtc ggatcgtttt catgtcattg cgccggacta ccccggcatg
180
gggtacagcg aagcgccacc cacgggcgca atgcgcccga ctttcgacga tatggtgaag
240
gccatggaca catttatcgc ccaatgtgcc cctgggccgg tcatcttgta catgcatgac
300
atcggcggcc ccatcggctt gcgaatcgcg gcggcacacc cggagaggat cgcgggcctg
360
atctttcaga acttcacgat ttcgatggag ggttggaacc cggagcgtct caaggtctac
420
gagcggcttg gcggtccgga aaccccggag aatctggccg aaaccgagca attcgcaacc
480
gtagaacgca gtgcgtttct tcataagagg ggcgcgcatc ggcccgaggc cctgaatccg
540
gacagttggg cgattgatgc ctatgccttc tcgatcccgg ccagccgcgc ctttatgtcg
600
agcttgttta tgaatgtcac cagcaacatt ccgcactatc cggaatggca ggcatatctg
660
aaagaccggc agccgagatc gctgatcgtg tgggggcaaa atgacccggt tttctcgccg
720
gcagctccgg aaaccgtcaa gaggctcttg ccggcggcga gggttcattc tttcaacggc
780
ggacacttcg tgctcgacga atacgccgaa ccgatcgccg cggcgatcat cgagacgttt
840
gccggagaca agaaatga
858
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:35 and the complementary strand of the last 21 residues of SEQ ID NO:35.
Exemplary SEQ ID NO:35 encodes a polypeptide having the sequence
(SEQ ID NO:36)
Met Asn Phe Asn Thr Val Glu Val Thr Gly Leu Lys Ile Phe Tyr Arg Glu Ala Gly Asn
Pro Ser Lys Pro Ala Ile Val Leu Leu His Gly Phe Pro Ser Ser Ser Tyr Ser Phe His Asp
Leu Ile Pro Leu Leu Ser Asp Arg Phe His Val Ile Ala Pro Asp Tyr Pro Gly Met Gly Tyr
Ser Glu Ala Pro Thr Gly Ala Met Arg Pro Thr Phe Asp Asp Met Val Lys Ala Met
Asp Thr Phe Ile Ala Gln Cys Ala Pro Gly Pro Val Ile Leu Tyr Met His Asp Ile Gly Gly
Pro Ile Gly Leu Arg Ile Ala Ala Ala His Pro Glu Arg Ile Ala Gly Leu Ile Phe Gln Asn
Phe Thr Ile Ser Met Gln Gly Trp Asn Pro Glu Arg Leu Lys Val Tyr Gln Arg Leu Gly Gly
Pro Glu Thr Pro Gln Asn Leu Ala Glu Thr Gln Gln Phe Ala Thr Val Glu Arg Ser Ala Phe
Leu His Lys Arg Gly Ala His Arg Pro Gln Ala Leu Asn Pro Asp Ser Trp Ala Ile Asp Ala
Tyr Ala Phe Ser Ile Pro Ala Ser Arg Ala Phe Met Ser Ser Leu Phe Met Asn Val Thr Ser
Asn Ile Pro His Tyr Pro Glu Trp Gln Ala Tyr Leu Lys Asp Arg Gln Pro Arg Ser Leu Ile
Val Trp Gly Gln Asn Asp Pro Val Phe Ser Pro Ala Ala Pro Gln Thr Val Lys Arg Leu Leu
Pro Ala Ala Arg Val His Ser Phe Asn Gly Gly His Phe Val Leu Asp Glu Tyr Ala Glu Pro
Ile Ala Ala Ala Ile Gln Thr Phe Ala Gly Asp Lys Lys
An example of SEQ ID NO: 37 is
atgacccaga cgacaacccg ccctgccatc cgctccttcg aggtctcctt tcccgatgaa
60
gcactcgcgg acctccgccg gcgcttagca gcgacgcgct ggccggagaa agagaccgtc
120
gccgacaact cacaaggcgt cccgctggtc aacatgcagc agctggcccg ctactggggcg
180
gccgaatacg actggcgcaa gacggaggcg aagctcaacg ccttgcccca attcctgact
240
gaaatcgacg ggctggggcat tcacttcatt cacgtccgct cgcgccatga gaacgccctg
300
ccgatcatca tcacgcacgg ctggccgggc tcgattatcg agcagctcaa gatcatcgag
360
ccgctcacca acccgaccgc ctctggcggt agcgccgaag acgccttcca cgtggtcatc
420
ccttcgctgc ccggctatgg cttttccggc aagccggcgg cgccggggctg gaacccaatc
480
accatcgcaa ctgcctggac cacactgatg aaacgccttg gctactcccg cttcgtcgcc
540
cagggcggcg actggggcaa cgccgtatcg gagatcatgg ccttgcaggc tcctcccgaa
600
ctggtcggca tccacaccaa catggcggcc accgttccgg ccaacgtcgc gaaggcgctc
660
gcattccacg agggcccgcc ttccggcctt tcgcccgaag agtcctccgc ctggagccag
720
ctggactact tttacaagaa gggcctgggc tacgccctgg agatgaatac ccggccccag
780
accctgtacg ggctggcgga ttcgccggtt ggcctggccg cctggatgct cgaccacgac
840
attcgcagcc aggagctaat cgcccgcgtc tttgacggac agtcggaggg cctatctaaa
900
gaggacgtga tcgagaacgt caccctctac tggctgacga gcaccgcgat ttcctcggcg
960
cgcctctact ggataccgc tcaacttggc ggtggcgggt ttttcgacgt ccgaggtatc
1020
aagattccgg tcgccgtcag cgccttcccg gatgagatct acacgccgcc ccgcagttgg
1080
gccgaggcgg cctacccgaa gctcatccat tacaaccggc tcgacaaagg cggccacttc
1140
gccgcctggg aacaaccgca gctcttctcg tccgagctgc gcgcagcatt tagactttg
1200
cgctag
1206
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:37 and the complementary strand of the last 21 residues of SEQ ID NO:37.
Exemplary SEQ ID NO:37 encodes a polypeptide having the sequence
(SEQ ID NO:38)
Met Thr Gln Thr Thr Thr Arg Pro Ala Ile Arg Ser Phe Glu Val Ser Phe Pro Asp Glu
Ala Leu Ala Asp Leu Arg Arg Arg Leu Ala Ala Thr Arg Trp Pro Glu Lys Gln Thr Val
Ala Asp Asn Ser Gln Gly Val Pro Leu Val Asn Met Gln Gln Leu Ala Arg Tyr Trp Ala
Ala Glu Tyr Asp Trp Arg Lys Thr Glu Ala Lys Leu Asn Ala Leu Pro Gln Phe Leu Thr
Glu Ile Asp Gly Leu Gly Ile His Phe Ile His Val Arg Ser Arg His Gln Asn Ala Leu Pro
Ile Ile Ile Thr His Gly Trp Pro Gly Ser Ile Glu Gln Leu Lys Ile Glu Pro Leu Thr
Asn Pro Thr Ala Ser Gly Gly Ser Ala Glu Asp Ala Phe His Val Val Ile Pro Ser Leu Pro
Gly Tyr Gly Phe Ser Gly Lys Pro Ala Ala Pro Gly Trp Asn Pro Ile Thr Ile Ala Thr Ala
Trp Thr Thr Leu Met Lys Arg Leu Gly Tyr Ser Arg Phe Val Ala Gln Gly Gly Asp Trp
Gly Asn Ala Val Ser Glu Ile Met Ala Leu Gln Ala Pro Gln Leu Val Gly Ile His Thr
Asn Met Ala Ala Thr Val Pro Ala Asn Val Ala Lys Ala Leu Ala Phe His Gln Gly Pro Pro
Ser Gly Leu Ser Pro Gln Gln Ser Ser Ala Trp Ser Gln Leu Asp Tyr Phe Tyr Lys Lys Gly
Leu Gly Tyr Ala Leu Gln Met Asn Thr Arg Pro Gln Thr Leu Tyr Gly Leu Ala Asp Ser
Pro Val Gly Leu Ala Ala Trp Met Leu Asp His Asp Ile Arg Ser Gln Gln Leu Ile Ala Arg
Val Phe Asp Gly Gln Ser Gln Gly Leu Ser Lys Gln Asp Val Ile Gln Asp Val Thr Leu Tyr
Trp Leu Thr Ser Thr Ala Ile Ser Ser Ala Arg Leu Tyr Trp Asp Thr Ala Gln Leu Gly Gly
Gly Gly Phe Phe Asp Val Arg Gly Ile Lys Ile Pro Val Ala Val Ser Ala Phe Pro Asp Glu
Ile Tyr Thr Pro Arg Ser Trp Ala Glu Ala Ala Tyr Pro Lys Leu Ile His Tyr Asn Arg
Leu Asp Lys Gly Gly His Phe Ala Ala Trp Gln Gln Pro Gln Leu Phe Ser Ser Gln Leu
Arg Ala Ala Phe Arg Thr Leu Arg
An example of SEQ ID NO: 39 is
atgacctcag agaaactgca gtacccggcg agaactcaaa cgacccgcct tagcgccgcc
60
gcggcggccg ggcttgcctc gggacttctc gtcttctctt gcccgaatta cggccagacc
120
accaccgatc gtgggagcgc gatcgtcgcc caggcgtctg cgcagcgcgc ggcagcggaa
180
gatccatcga tccgcccctt caaggtgcaa atacgcaag ccgcgctcga cgacctgcgc
240
cggcgcatca acgccacgcg ctggcccgac areagaccg tcgccgacga gtcgcagggt
300
gcgcagttgg cgaggctcca ggagctggtt cgctactggg gcagcggcta cgactggcgc
360
aagctgggaag cgaagctgaa tgccctgccg caattcacga cgaccatcga cggtgtcgag
420
attcacttca tccacgtccg ctctcgtcac aagaatgcgc tcccggtgat cgtcacccac
480
gggtggccgg gatccgtcgt cgagcaactc aagatcatcg gcccgctcac ggatccaacc
540
gcccatggcg gcagcgccga ggatgctttc gacgtcgtga tcccgtccct gccaggttac
600
ggcttctccg gcaagccaac cggtaccggc tgggaccccg accgaatcgc gcgagcctgg
660
gcggagctga tgaagcgcct cgggtacacc cgctacgtcg cccagggcgg cgactggggt
720
gcccccatca cgagcgcgat gcccgcggtg aaagcggcgg gattgcaggg tatccacgtc
780
aacctgcccg caacgctgcc gcccgaggtg actgcagcgc tcggcaccgg cgggcctgcg
840
ccggcgggac tctccgagaa ggaaagcgca gtgttcgagg cactgaagaa gtacggcatg
900
acggggaact cggcctactt cacgatgatg acggcgcggc cgcagacggt cggctatggc
960
gcgacggact caccggccgg cctcgcggca tggatcctcg tgcatccagg cttcgcccag
1020
tggagatacg gcgccgatcc aaagcagtcg ccgactaagg acgacgtgct cgacgacatc
1080
acgctgtact ggctgacgaa caccgcggcg tcggcggcgc ggctgtactg ggagaacggc
1140
gcacgaggca gcgtcattgc cgccgcgccg cagaaaacct ccgaaatctc gctgcccgtg
1200
gccattacgg ttttccccgga cgacgtctat cgagccccgg agtcatgggc ccggcgggca
1260
taccccaacc tgacctattt ccacgaggtc gacaagggcg gacatttcgc cgcgtgggag
1320
cagccggaac tcttcgcggc cgagctgcgc gccgcgttca agccacttcg gggggtgcaa
1380
tga
1383
Accordingly, an exemplary amplification primer sequence pair is residues 1 to 21 of SEQ ID NO:39 and the complementary strand of the last 21 residues of SEQ ID NO:39.
Exemplary SEQ ID NO:39 encodes a polypeptide having the sequence
(SEQ ID NO:40)
Met Thr Ser Glu Lys Leu Gln Tyr Pro Ala Arg Thr Gln Thr Thr Arg Leu Ser Ala Ala
Ala Ala Ala Gly Leu Ala Ser Gly Leu Leu Val Phe Ser Cys Pro Asn Tyr Gly Gln Thr Thr
Thr Asp Arg Gly Ser Ala Ile Val Ala Gln Ala Ser Ala Gln Arg Ala Ala Ala Glu Asp Pro
Ser Ile Arg Pro Phe Lys Val Gln Ile Pro Gln Ala Ala Leu Asp Asp Leu Arg Arg Arg Ile
Asn Ala Thr Arg Trp Pro Asp Lys Glu Thr Val Ala Asp Glu Ser Gln Gly Ala Gln Leu
Ala Arg Leu Gln Glu Leu Val Arg Tyr Trp Gly Ser Gly Tyr Asp Trp Arg Lys Leu Glu
Ala Lys Leu Asn Ala Leu Pro Gln Phe Thr Thr Thr Ile Asp Gly Val Glu Ile His Phe Ile
His Val Arg Ser Arg His Lys Asn Ala Leu Pro Val Ile Val Thr His Gly Trp Pro Gly Ser
Val Val Glu Gln Leu Lys Ile Gly Pro Leu Thr Asp Pro Thr Ala His Gly Gly Ser Ala
Glu Asp Ala Phe Asp Val Val Ile Pro Ser Leu Pro Gly Tyr Gly Phe Ser Gly Lys Pro Thr
Gly Thr Gly Trp Asp Pro Asp Arg Ile Ala Arg Ala Trp Ala Glu Leu Met Lys Arg Leu
Gly Tyr Thr Arg Tyr Val Ala Gln Gly Gly Asp Trp Gly Ala Pro Ile Thr Ser Ala Met Ala
Arg Gln Lys Ala Ala Gly Leu Gln Gly Ile His Val Asn Leu Pro Ala Thr Leu Pro Pro Glu
Val Thr Ala Ala Leu Gly Thr Gly Gly Pro Ala Pro Ala Gly Leu Ser Glu Lys Glu Ser Ala
Val Phe Glu Ala Leu Lys Lys Tyr Gly Met Thr Gly Asn Ser Ala Tyr Phe Thr Met Met
Thr Ala Arg Pro Gln Thr Val Gly Tyr Gly Ala Thr Asp Ser Pro Ala Gly Leu Ala Ala Trp
Ile Leu Val His Pro Gly Phe Ala Gln Trp Arg Tyr Gly Ala Asp Pro Lys Gln Ser Pro Thr
Lys Asp Asp Val Leu Asp Asp Ile Thr Leu Tyr Trp Leu Thr Asn Thr Ala Ala Ser Ala
Ala Arg Leu Tyr Trp Glu Asn Gly Ala Arg Gly Ser Val Ile Ala Ala Ala Pro Gln Lys Thr
Ser Glu Ile Ser Leu Pro Val Ala Ile Thr Val Phe Pro Asp Asp Val Tyr Arg Ala Pro Glu
Ser Trp Ala Arg Arg Ala Tyr Pro Asn Leu Thr Tyr Phe His Glu Val Asp Lys Gly Gly His
Phe Ala Ala Trp Glu Gln Pro Glu Leu Phe Ala Ala Glu Leu Arg Ala Ala Phe Lys Pro
Leu Arg Gly Val Gln
Determining the degree of sequence identity
The invention provides nucleic acids and polypeptides having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% sequence identity (homology) to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80. In alternative aspects, the sequence identity can be over a region of at least about 5, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more consecutive residues, or over the full length of the nucleic acid or polypeptide.
The extent of sequence identity (homology) may be determined using any computer program and associated parameters, including those described herein, such as BLAST 2.2.2 or FASTA version 3.0t78, with the default parameters.
Homologous sequences also include RNA sequences in which uridines replace thymines in nucleic acid sequences. Homologous sequences may be obtained by any of the procedures described herein or may result from sequencing error correction. It will be appreciated that nucleic acid sequences as shown herein may be represented in the traditional single character format (see, e.g., Stryer, Lubert. Biochemistry, 3rd ed., W.H. Freeman & Co., New York) or any other format that records the identity of the nucleotides in the sequence.
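The RNA form of a listed sequence, in which uridines replace thymines, can be derived mechanically from the single-character representation. The following is a minimal sketch; the function name `to_rna` is a hypothetical helper, and only the t/T characters are changed so that any other symbols in a listing pass through untouched.

```python
def to_rna(dna: str) -> str:
    """Replace thymine (t/T) with uridine (u/U), preserving case and spacing."""
    return dna.translate(str.maketrans("tT", "uU"))


if __name__ == "__main__":
    # Toy input for illustration only.
    print(to_rna("atgtccgaac cctggaagca"))
```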
A variety of the sequence comparison programs identified herein are used in this aspect of the invention. Protein and/or nucleic acid sequence identities (homologies) may be evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are not limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85(8):2444-2448, 1988; Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Thompson et al., Nucleic Acids Res. 22(2):4673-4680, 1994; Higgins et al., Methods Enzymol. 266:383-402, 1996; Altschul et al., Nature Genetics 3:266-272, 1993).
Homology or identity can be measured using sequence analysis software (e.g., the Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wisconsin 53705). Such software matches similar sequences by assigning degrees of homology to various deletions, substitutions, and other modifications. The terms "homology" and "identity," in the context of two or more nucleic acid or polypeptide sequences, refer to two or more sequences or subsequences that are the same, or that have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using any number of sequence comparison algorithms or by manual alignment and visual inspection. For sequence comparison, one sequence can act as a reference sequence (an exemplary sequence SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80) to which test sequences are compared. When using a sequence comparison algorithm, the test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequences relative to the reference sequence, based on the program parameters.
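As an illustration of the final step described above, percent sequence identity can be computed from two sequences that have already been aligned. The sketch below is an assumption-laden simplification: the function name `percent_identity` is hypothetical, gaps are written as "-", and every alignment column that is not a gap in both sequences counts toward the denominator. Real programs differ in how they treat gapped columns, so their reported values may not match this one.

```python
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent of compared alignment columns holding identical residues.

    Both inputs must be rows of the same alignment (equal length).
    Columns that are gaps in both rows are ignored; a gap opposite a
    residue is counted as a non-identical column (an assumed convention).
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_test, aligned_ref):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned rows for illustration only.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

A value computed this way can then be checked against a chosen threshold, for example at least 50% or at least 95%.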
As used herein, a "comparison window" includes reference to a segment of any number of contiguous residues. For example, in alternative aspects of the invention, contiguous residues ranging anywhere from 20 to the full length of an exemplary sequence SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80 are compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. If the reference sequence has the requisite sequence identity, e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 90%, or 95% sequence identity, to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, that sequence is within the scope of the invention. In alternative embodiments, subsequences ranging from about 20 to 600, about 50 to 200, and about 100 to 150 residues are compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of aligning sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444, 1988, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wisconsin), or by manual alignment and visual inspection.
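The global (Needleman & Wunsch) and local (Smith & Waterman) alignment approaches named above share one dynamic-programming recurrence. The sketch below computes only the optimal alignment score, not the alignment itself. It rests on stated assumptions: the function name `alignment_score` is hypothetical, and the scoring scheme (match +1, mismatch -1, linear gap penalty -1) is chosen for illustration and is not the default of GAP, BESTFIT, or any other program mentioned in the text.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style), end gaps penalized.
    local=True:  local alignment (Smith-Waterman style), cell scores floored
                 at zero and the best cell anywhere in the table returned.
    Scoring values are illustrative assumptions.
    """
    cols = len(b) + 1
    # First row: all zeros for local, cumulative gap penalties for global.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            curr[j] = cell
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(alignment_score("GATTACA", "GATTAC"))
    print(alignment_score("TTGATTACATT", "CCGATTACACC", local=True))
```

Keeping only two rows of the table limits memory to the length of one sequence, which is sufficient when only the score is needed.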
Other algorithms for determining homology or identity include, for example, in addition to a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction & Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and WHAT-IF. Such alignment programs can also be used to screen genome databases to identify polynucleotide sequences having substantially identical sequences. A number of genome databases are available; for example, a substantial portion of the human genome is available as part of the Human Genome Sequencing Project (Gibbs, 1995). Several genomes have been sequenced, e.g., M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), yeast (S. cerevisiae) (Mewes et al., 1997), and D. melanogaster (Adams et al., 2000). Significant progress has also been made in sequencing the genomes of model organisms such as mouse, C. elegans, and Arabidopsis sp. Databases containing genomic information annotated with some functional information are maintained by various organizations and are accessible via the Internet.
The BLAST, BLAST 2.0 and BLAST 2.2.2 algorithms are also used to practice the invention. They are described, e.g., in Altschul (1997) Nuc. Acids Res. 25:3389-3402; Altschul (1990) J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul (1990) supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
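The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it uses exact word matches only (the neighborhood threshold T is not modeled), extends only to the right without gaps, and the function names and the X value of 20 are assumptions chosen for the example. The M=5 and N=-4 scores follow the BLASTN defaults quoted in the text.

```python
def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word of length W matches."""
    index = {}
    for s in range(len(subject) - word_size + 1):
        index.setdefault(subject[s:s + word_size], []).append(s)
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


def extend_right(query, subject, q_start, s_start, word_size,
                 match=5, mismatch=-4, x_drop=20):
    """Ungapped rightward extension of a seed; returns (best_score, best_length)."""
    score = word_size * match          # the seed itself is an exact match
    best_score, best_len = score, word_size
    q, s = q_start + word_size, s_start + word_size
    while q < len(query) and s < len(subject):   # stop at the end of either sequence
        score += match if query[q] == subject[s] else mismatch
        q += 1
        s += 1
        if score > best_score:
            best_score, best_len = score, q - q_start
        elif best_score - score >= x_drop or score <= 0:
            break                      # fell by X from the maximum, or dropped to zero
    return best_score, best_len


query, subject = "ACGTACGT", "TTACGTACGTTT"
seeds = find_seeds(query, subject, 4)
print(seeds[0], extend_right(query, subject, *seeds[0], 4))
```

A full implementation would also extend to the left, score amino acid pairs with a substitution matrix, and evaluate the statistical significance of each HSP.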
The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001. In one aspect, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST"). For example, five specific BLAST programs can be used to perform the following tasks: (1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database; (2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database; (3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database; (4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and (5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. The BLAST programs identify homologous sequences by identifying similar segments, referred to herein as "high-scoring segment pairs," between a query amino acid or nucleic acid sequence and a test sequence, which is preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art.
A preferred scoring matrix is the BLOSUM62 matrix (Gonnet et al., Science 256:1443-1445, 1992; Henikoff and Henikoff, Proteins 17:49-61, 1993). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation).
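The six-frame conceptual translation used by programs such as BLASTX, TBLASTN and TBLASTX can be illustrated with a short sketch. It assumes the standard genetic code and DNA input limited to the letters A, C, G and T; the function names and the frame labels (+1 to +3, -1 to -3) are illustrative conventions, not identifiers from any BLAST program.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Each translated frame can then be compared to a protein sequence with a protein scoring matrix such as BLOSUM62.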
In one aspect of the invention, to determine whether a nucleic acid has the requisite sequence identity to be within the scope of the invention, the NCBI BLAST 2.2.2 programs are used with the default options for blastp. There are about 38 setting options in the BLAST 2.2.2 program. In this exemplary aspect of the invention, all default values are used except for the default filtering setting (i.e., all parameters are set to default except filtering, which is set to OFF); in its place a "-F F" setting is used, which disables filtering. Use of default filtering often results in Karlin-Altschul violations due to the short length of the sequence.
The default values used in this exemplary aspect of the invention include:
"Filter for low complexity: ON; Word Size: 3; Matrix: Blosum62; Gap Costs: Existence: 11, Extension: 1"
Other default settings are: filter for low complexity OFF, word size of 3 for protein, BLOSUM62 matrix, gap existence penalty of -11 and gap extension penalty of -1.
Examples of NCBI BLAST 2.2.2 settings are shown in Example 1 below. Note that the "-W" option defaults to 0. This means that if not set, the word size defaults to 3 for proteins and 11 for nucleotides.
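For orientation, the settings described above can be collected into a command line. The sketch below only builds an argument list and does not run anything. It assumes the legacy NCBI `blastall` command-line program and its option letters (`-p`, `-i`, `-d`, `-F`, `-M`, `-G`, `-E`, `-W`, `-e`); the function name, the query file name and the database name are hypothetical placeholders, and the values are passed explicitly only to make the described defaults visible.

```python
def blastp_arguments(query_path, database, filter_query=False):
    """Build an argument list mirroring the settings described in the text."""
    return [
        "blastall",
        "-p", "blastp",                       # protein-protein comparison
        "-i", query_path,                     # query file (hypothetical name)
        "-d", database,                       # database (hypothetical name)
        "-F", "T" if filter_query else "F",   # "-F F" disables filtering
        "-M", "BLOSUM62",                     # scoring matrix
        "-G", "11",                           # gap existence cost
        "-E", "1",                            # gap extension cost
        "-W", "0",                            # 0 selects the default word size
        "-e", "10",                           # expectation value
    ]


args = blastp_arguments("query.fasta", "my_protein_db")
print(" ".join(args))
```

The list could be handed to `subprocess.run` on a system where the program is installed; that step is omitted here because it depends on the local installation.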
Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides that direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
Computer systems and computer program products
To determine and identify sequence identities, structural homologies, motifs and the like in silico, the sequence of the invention can be stored, recorded and processed on any computer readable and accessible medium. Accordingly, the invention provides computers, computer systems, computer readable media, computer program products and the like on which the nucleic acid and polypeptide sequences of the invention are recorded or stored. As used herein, the words "recorded" and "stored" refer to the process of storing information on a computer medium. One skilled in the art can readily adopt any of the known methods of recording information on a computer-readable medium to generate articles containing one or more nucleic acid and/or polypeptide sequences of the invention.
Another aspect of the invention is a computer-readable medium on which at least one nucleic acid and/or polypeptide sequence of the invention has been recorded. Computer-readable media include magnetically readable media, optically readable media, electronically readable media, and magnetic/optical media. For example, a computer-readable medium may be a hard disk, a floppy disk, a magnetic tape, a CD-ROM, a Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of media known to those skilled in the art.
Aspects of the invention include systems (e.g., Internet-based systems), particularly computer systems, which store and manipulate the sequences and sequence information described herein. One example of a computer system 100 is illustrated in block diagram form in FIG. 8. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze a nucleotide or polypeptide sequence of the invention. The computer system 100 can include a processor for processing, accessing and manipulating the sequence data. The processor 105 can be any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola, Compaq, AMD or International Business Machines. The computer system 100 is a general purpose system that comprises the processor 105, one or more internal data storage components 110 for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.
In one aspect, the computer system 100 includes a processor 105 connected to a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more internal data storage devices 110, such as a hard drive and/or other computer-readable media having data recorded thereon. The computer system 100 can further include one or more data retrieving devices 118 for reading the data stored on the internal data storage devices 110. The data retrieving device 118 may be, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, or a modem capable of connection to a remote data storage system (e.g., via the Internet), etc. In some embodiments, the internal data storage device 110 is a removable computer-readable medium such as a floppy disk, a compact disk, a magnetic tape, etc., containing control logic and/or data recorded thereon. The computer system 100 may advantageously include or be programmed with appropriate software for reading the control logic and/or the data from the data storage component once it is inserted in the data retrieving device. The computer system 100 includes a display 120 which is used to display output to a computer user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area network to provide centralized access to the computer system 100. Software for accessing and processing the nucleotide or amino acid sequences of the invention can reside in main memory 115 during execution. In some aspects, the computer system 100 may further comprise a sequence comparison algorithm for comparing a nucleic acid sequence of the invention. The algorithm and sequence(s) can be stored on a computer-readable medium. A "sequence comparison algorithm" refers to one or more programs which are implemented (locally or remotely) on the computer system 100 to compare a nucleotide sequence with other nucleotide sequences and/or compounds stored within a data storage means.
For example, a sequence comparison algorithm can compare nucleotide sequences of the invention stored in a computer-readable medium with reference sequences stored in a computer-readable medium to identify homologies or structural motifs.
The parameters used with the above algorithms may be adapted depending on the sequence length and degree of homology studied. In some aspects, the parameters may be the default parameters used by the algorithms in the absence of instructions from the user. FIG. 9 is a flow diagram illustrating one aspect of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system 100, or a public database such as GENBANK that is available through the Internet. The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be compared is stored to a memory in the computer system 100. As discussed above, the memory can be any type of memory, including RAM or an internal storage device. The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed at a state 210 to determine if the first sequence is the same as the second sequence. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Well-known methods are known to those of skill in the art for comparing two nucleotide or protein sequences, even if they are not identical. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system. Once a comparison of the two sequences has been performed at the state 210, a determination is made at a decision state 212 whether the two sequences are the same.
Of course, the term "same" is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as "same" in the process 200. If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist in the database. If no more sequences exist in the database, then the process 200 terminates at an end state 220. However, if more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database. It should be noted that if a determination had been made at the decision state 212 that the sequences were not homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other sequences were available in the database for comparison. Accordingly, one aspect of the invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid sequence of the invention, and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the sequences being compared to these nucleic acid codes and polypeptide codes. FIG. 10 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining whether two sequences are homologous.
The process 250 begins at a start state 252 and then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first sequence is read, and then to a state 262 wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it can be a single letter amino acid code so that the first and second sequences can be easily compared. A determination is then made at a decision state 264 whether the two characters are the same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process 250 continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters in either sequence to read. If there are no more characters to read, then the process 250 moves to a state 276 wherein the level of homology between the first and second sequences is displayed to the user. The level of homology is determined by calculating the proportion of characters between the sequences that were the same out of the total number of characters in the first sequence. Thus, if every character in a first 100-nucleotide sequence aligned with every character in a second sequence, the homology level would be 100%.
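The two flow diagrams described above can be summarized in a short sketch: a character-by-character homology level in the manner of process 250, and a database loop in the manner of process 200 that reports the names of stored sequences meeting a user-entered threshold. The function names, the dictionary used as a stand-in for the sequence database, and the 90% threshold are assumptions for illustration; no gaps are introduced, so this is the simplest form of the comparison.

```python
def homology_level(first, second):
    """Percent of characters in `first` that match `second` at the same position."""
    if not first:
        raise ValueError("first sequence must not be empty")
    same = 0
    # Read both sequences in lockstep, one character at a time.
    for a, b in zip(first, second):
        if a == b:
            same += 1
    # Matching characters as a proportion of the length of the first sequence.
    return 100.0 * same / len(first)


def search_database(new_sequence, database, min_homology=90.0):
    """Return names of database sequences treated as 'same' as the new sequence."""
    hits = []
    for name, stored in database.items():      # move through every stored sequence
        if homology_level(new_sequence, stored) >= min_homology:
            hits.append(name)                   # name would be displayed to the user
    return hits


database = {                                    # hypothetical stand-in database
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTAA",
    "seqC": "TTTTTTTTTT",
}
print(search_database("ACGTACGTAC", database))
```

Replacing `homology_level` with a gapped alignment score would give the non-exact comparison that the text notes is also contemplated.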
Alternatively, the computer program can compare a reference sequence to a sequence of the invention to determine whether the sequences differ at one or more positions. The program can record the length and identity of inserted, deleted or substituted nucleotides or amino acid residues with respect to the sequence of either the reference or the invention. The computer program may be a program which determines whether a reference sequence contains a single nucleotide polymorphism (SNP) with respect to a sequence of the invention, or whether a sequence of the invention comprises a SNP of a known sequence. Thus, in some aspects, the computer program is a program which identifies SNPs. The method may be implemented by the computer systems described above and the method illustrated in FIG. 10. The method can be performed by reading a sequence of the invention and the reference sequences through the use of the computer program and identifying differences with the computer program.
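A minimal sketch of such a difference-recording program is given below. It assumes the two sequences have already been aligned to equal length, with "-" marking gap positions; the function name and the tuple layout are illustrative choices, not part of the described system.

```python
def classify_differences(aligned_reference, aligned_sequence):
    """List (position, kind, reference_char, sequence_char) for each difference.

    Positions are 1-based alignment columns. A '-' in the reference is reported
    as an insertion, a '-' in the compared sequence as a deletion, and any other
    mismatch as a substitution (a candidate SNP for nucleotide sequences).
    """
    if len(aligned_reference) != len(aligned_sequence):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    columns = zip(aligned_reference, aligned_sequence)
    for position, (ref, seq) in enumerate(columns, start=1):
        if ref == seq:
            continue
        if ref == "-":
            kind = "insertion"
        elif seq == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((position, kind, ref, seq))
    return differences


print(classify_differences("ACGT-A", "AC-TGA"))
print(classify_differences("ACGT", "ACTT"))
```

Runs of adjacent insertion or deletion columns could be merged to report the length of each inserted or deleted segment.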
In other aspects, the computer-based system comprises an identifier for identifying features within a nucleic acid or polypeptide of the invention. An "identifier" refers to one or more programs which identify certain features within a nucleic acid sequence. For example, an identifier may comprise a program which identifies an open reading frame (ORF) in a nucleic acid sequence. FIG. 11 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature. For example, a feature name could be "Initiation Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is produced by the University of Wisconsin Genetics Computer Group. Alternatively, the features may be structural polypeptide motifs such as alpha helices or beta sheets, or functional polypeptide motifs such as enzymatic active sites, helix-turn-helix motifs or other motifs known to those skilled in the art. Once the database of features is opened at the state 306, the process 300 moves to a state 308 wherein the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence is then made at a state 310. A determination is then made at a decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute was found, then the process 300 moves to a state 318 wherein the name of the found feature is displayed to the user. The process 300 then moves to a decision state 320 wherein a determination is made whether more features exist in the database.
If no more features exist, then the process 300 terminates at an end state 324. However, if more features do exist in the database, then the process 300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the attribute of the next feature is compared against the first sequence. If the feature attribute is not found in the first sequence at the decision state 316, the process 300 moves directly to the decision state 320 in order to determine if any more features exist in the database. Thus, in one aspect, the invention provides a computer program that identifies open reading frames (ORFs).
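The identifier loop of process 300 can be sketched as a scan of a sequence against a small feature table. The two entries below are the examples given in the text; the dictionary layout, the function name, and the decision to report 0-based positions of every occurrence are assumptions made for illustration.

```python
# Feature name -> attribute, using the two examples from the description.
FEATURE_DATABASE = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def find_features(sequence, feature_database):
    """Return {feature name: [0-based start positions]} for features that occur."""
    found = {}
    for name, attribute in feature_database.items():   # read each feature in turn
        positions = []
        start = sequence.find(attribute)
        while start != -1:                             # collect overlapping hits too
            positions.append(start)
            start = sequence.find(attribute, start + 1)
        if positions:                                  # attribute found in sequence
            found[name] = positions
    return found


print(find_features("CCTAATAAGGATGAAATGA", FEATURE_DATABASE))
```

Structural or functional polypeptide motifs would need pattern or profile matching rather than the exact substring search used here.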
A polypeptide or nucleic acid sequence of the invention can be stored and manipulated in a variety of data processor programs in a variety of formats. For example, a sequence can be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence comparison algorithms, identifiers, or sources of reference nucleotide sequences or polypeptide sequences to be compared to a nucleic acid sequence of the invention. The programs and databases used to practice the invention include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215:403, 1990), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444, 1988), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases will be apparent to those skilled in the art in light of this disclosure.
Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides that direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
Hybridization of nucleic acids
The invention provides isolated or recombinant nucleic acids that hybridize under stringent conditions to an exemplary sequence of the invention, e.g., SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, or any of SEQ ID NO:10 through SEQ ID NO:77, under the stringency conditions described herein. In alternative embodiments, nucleic acids of the invention, as defined by their ability to hybridize under stringent conditions, can be between about five residues and the full length of a sequence of the invention; e.g., they can be at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 55, 60, 65, 70, 75, 80, 90, 100, 150, 200, 250, 300, 350 or 400 residues in length. Nucleic acids shorter than full length are also included. These nucleic acids are useful as, e.g., hybridization probes, labeling probes, PCR oligonucleotide probes, iRNA, antisense or sequences encoding antibody-binding peptides (epitopes), motifs, active sites and the like.
In nucleic acid hybridization reactions, the conditions used to achieve a particular level of stringency will vary depending on the nature of the nucleic acids being hybridized. For example, the length, degree of complementarity, nucleotide sequence composition (e.g., GC vs. AT content), and nucleic acid type (e.g., RNA vs. DNA) of the hybridizing regions of the nucleic acids can be considered in selecting hybridization conditions. An additional consideration is whether one of the nucleic acids is immobilized, for example, on a filter.
Hybridization can be performed under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45°C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, 10× Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2 × 10⁷ cpm (specific activity 4-9 × 10⁸ cpm/µg) of ³²P-labeled oligonucleotide probe is then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at Tₘ-10°C for the oligonucleotide probe. The membrane is then exposed to autoradiographic film for detection of hybridization signals.
By varying the stringency of the hybridization conditions used to identify nucleic acids, such as cDNAs or genomic DNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting point of the probes. The melting point, Tₘ, is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly complementary probe. Very stringent conditions are selected to be equal to or about 5°C lower than the Tₘ for a particular probe. The melting point of the probe may be calculated using the following formulas:
For probes from 14 to 70 nucleotides, the melting point (Tₘ) is calculated as follows: Tₘ=81.5+16.6(log [Na+])+0.41(G+C fraction)−(600/N) where N is the length of the probe.
If the hybridization is performed in a solution containing formamide, the melting point can be calculated using the equation: Tₘ=81.5+16.6(log [Na+])+0.41(G+C fraction)−(0.63 × % formamide)−(600/N) where N is the length of the probe.
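The two formulas above can be evaluated with a small helper. This is a sketch under stated assumptions: the logarithm is taken as base 10, [Na+] is in mol/L, and the G+C term is entered as a percentage (0 to 100), as in commonly cited versions of this empirical formula, although the text above writes "G+C fraction". The function and parameter names are illustrative.

```python
import math


def melting_point(na_molar, gc_percent, probe_length, formamide_percent=0.0):
    """Estimate the probe melting point in °C from the formulas in the text.

    Assumptions: base-10 logarithm of [Na+] in mol/L, and G+C content given as
    a percentage (0-100). With formamide_percent = 0 this reduces to the
    formamide-free formula.
    """
    if na_molar <= 0 or probe_length <= 0:
        raise ValueError("na_molar and probe_length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / probe_length)


# A 20-nucleotide probe with 50% G+C in 1 M Na+, without and with 50% formamide.
print(round(melting_point(1.0, 50.0, 20), 1))
print(round(melting_point(1.0, 50.0, 20, formamide_percent=50.0), 1))
```

The hybridization temperature can then be chosen relative to this estimate, for example about 5°C below it for very stringent conditions, as stated above.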
Prehybridization can be performed in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg denatured fragmented salmon sperm DNA, or in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are given in Sambrook et al., supra.
Hybridization is performed by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double-stranded DNA, it is denatured before addition to the hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary or homologous thereto. For probes over 200 nucleotides in length, the hybridization can be carried out at 15-25°C below the Tₘ. For shorter probes, such as oligonucleotide probes, the hybridization can be conducted at 5-10°C below the Tₘ. Typically, for hybridizations in 6×SSC, the hybridization is conducted at approximately 68°C. Typically, for hybridizations in solutions containing 50% formamide, the hybridization is conducted at approximately 42°C.
All of the foregoing hybridizations would be considered to be performed under conditions of high stringency.
After hybridization, the filter is washed to remove any nonspecifically bound detectable probe. The stringency used to wash the filters can also be varied depending on the nature of the nucleic acids being hybridized, the length of the nucleic acids being hybridized, the degree of complementarity, the nucleotide sequence composition (e.g., GC vs. AT content), and the nucleic acid type (e.g., RNA vs. DNA). Examples of progressively higher stringency wash conditions are as follows: 2×SSC, 0.1% SDS at room temperature for 15 minutes (low stringency); 0.1×SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour (moderate stringency); 0.1×SSC, 0.5% SDS for 15 to 30 minutes at between the hybridization temperature and 68°C (high stringency); and 0.15 M NaCl for 15 minutes at 72°C (very high stringency). A final low stringency wash can be conducted in 0.1×SSC at room temperature. The examples above are merely illustrative of one set of conditions that can be used to wash filters. One of skill in the art would know that there are numerous recipes for washes of different stringency. Some other examples are given below.
Nucleic acids that have hybridized to the probe are identified by autoradiography or other conventional techniques.
The above procedure can be modified to identify nucleic acids having decreasing levels of homology to the probe sequence. For example, to obtain nucleic acids of decreasing homology to the detectable probe, less stringent conditions can be used. For example, the hybridization temperature can be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer having a Na+ concentration of approximately 1 M. Following hybridization, the filter can be washed with 2×SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate" conditions above 50°C and "low" conditions below 50°C. A specific example of "moderate" hybridization conditions is when the above hybridization is conducted at 55°C. A specific example of "low stringency" hybridization conditions is when the above hybridization is conducted at 45°C.
Alternatively, the hybridization can be carried out in buffers, such as 6×SSC, containing formamide at a temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer can be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter can be washed with 6×SSC, 0.5% SDS at 50°C. These conditions are considered to be "moderate" conditions above 25% formamide and "low" conditions below 25% formamide. A specific example of "moderate" hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of "low stringency" hybridization conditions is when the above hybridization is conducted at 10% formamide.
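The stepwise relaxation of stringency described in the two preceding paragraphs can be tabulated with a short sketch. The thresholds of 50°C and 25% formamide and the 5-unit steps come from the text; the function names are illustrative, and because the text does not say how a value exactly at a threshold is classified, the sketch reports that case separately rather than guessing.

```python
TEMPERATURE_THRESHOLD_C = 50      # "moderate" above, "low" below (about 1 M Na+)
FORMAMIDE_THRESHOLD_PERCENT = 25  # "moderate" above, "low" below (42°C, 6xSSC)


def classify_stringency(value, threshold):
    """Label a hybridization condition relative to the stated threshold."""
    if value > threshold:
        return "moderate"
    if value < threshold:
        return "low"
    return "boundary (not specified in the text)"


def stepped_series(start, stop, step):
    """Values from start downward in fixed steps, not going below stop."""
    values = []
    current = start
    while current >= stop:
        values.append(current)
        current -= step
    return values


for temperature in stepped_series(68, 42, 5):
    print(f"{temperature} °C:", classify_stringency(temperature, TEMPERATURE_THRESHOLD_C))
for formamide in stepped_series(50, 0, 5):
    print(f"{formamide}% formamide:",
          classify_stringency(formamide, FORMAMIDE_THRESHOLD_PERCENT))
```

Note that 5°C steps starting at 68°C give 43°C as the last value at or above 42°C, so the series does not land exactly on the lower limit.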
For example, the above methods can be used to isolate nucleic acids having a sequence with at least about 97%, at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, or at least 50% homology to a nucleic acid sequence of the invention, or to fragments comprising at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases thereof, and the sequences complementary thereto. Homology can be measured using an alignment algorithm. For example, the homologous polynucleotides may have a coding sequence that is a naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variants may have a substitution, deletion or addition of one or more nucleotides compared to the nucleic acids of the invention or their complements.
However, the choice of hybridization format is not critical; it is the stringency of the wash conditions that determines whether a nucleic acid is within the scope of the invention. Wash conditions used to identify nucleic acids within the scope of the invention include, for example, a salt concentration of about 0.02 molar at pH 7 and a temperature of at least about 50°C, or about 55°C to about 60°C; or a salt concentration of about 0.15 M NaCl at 72°C for about 15 minutes; or a salt concentration of about 0.2×SSC at a temperature of at least about 50°C, or about 55°C to about 60°C, for about 15 to about 20 minutes; or the hybridization complex is washed twice with a solution having a salt concentration of about 2×SSC containing 0.1% SDS at room temperature for 15 minutes and then washed twice with 0.1×SSC containing 0.1% SDS at 68°C for 15 minutes; or equivalent conditions. See Sambrook, Tijssen, and Ausubel for a description of SSC buffers and equivalent conditions.
Oligonucleotide probes and methods of their use
The invention also provides nucleic acid probes for identifying nucleic acids encoding a polypeptide having epoxide hydrolase activity. In one embodiment, the probe comprises at least 10 consecutive bases of a sequence as set forth in an exemplary sequence of the invention. Alternatively, a probe of the invention can be at least about 5, 6, 7, 8 or 9 to about 40, about 10 to 50, about 20 to 60, or about 30 to 70 consecutive bases of a sequence of the invention. The probes identify a nucleic acid by binding or hybridization. The probes can be used in arrays of the invention, see discussion below, including, for example, capillary arrays. The probes of the invention can also be used to isolate other nucleic acids or polypeptides.
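By way of illustration only, the notion of a probe made up of a given number of consecutive bases of a sequence can be sketched as a sliding window over that sequence. The function name `candidate_probes` and the example sequence are hypothetical and are not sequences of the invention.

```python
def candidate_probes(sequence, length=10):
    """Return every run of `length` consecutive bases found in `sequence`."""
    sequence = sequence.upper()
    if length < 1 or length > len(sequence):
        return []
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Made-up 24-base sequence used only to exercise the function.
example = "ATGGCTAGCTAGGATCCGATCGTA"
probes = candidate_probes(example, length=10)
```

A sequence of L bases yields L − length + 1 overlapping candidates; which of them hybridize specifically still has to be determined experimentally, as described below.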
The probes of the invention can be used to determine whether a biological sample, such as a soil sample, contains an organism having a nucleic acid sequence of the invention or an organism from which the nucleic acid was obtained. In such procedures, a biological sample potentially harboring the organism from which the nucleic acid was isolated is obtained, and nucleic acids are obtained from the sample. The nucleic acids are contacted with the probe under conditions that allow the probe to hybridize specifically with any complementary sequences present in the sample. Where necessary, conditions that allow specific hybridization of the probe with complementary sequences can be determined by contacting the probe with complementary sequences from samples known to contain the complementary sequence, as well as with control sequences that do not contain the complementary sequence. Hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization buffer, or the hybridization temperature, can be varied to identify conditions that allow the probe to hybridize specifically with complementary nucleic acids (see discussion of specific hybridization conditions).
If the sample contains the organism from which the nucleic acid was isolated, specific hybridization of the probe is then detected. Hybridization can be detected by labeling the probe with a detectable agent, such as a radioisotope, a fluorescent dye, or an enzyme that can catalyze the formation of a detectable product. Many methods of using labeled probes to detect the presence of complementary nucleic acids in a sample are known to those skilled in the art. These include Southern blots, Northern blots, colony hybridization procedures and dot blots. Protocols for each of these procedures are provided in Ausubel and Sambrook.
Alternatively, more than one probe (at least one of which is capable of specifically hybridizing to any complementary sequence present in the nucleic acid sample) can be used in the amplification reaction to determine whether the sample contains an organism comprising a nucleic acid sequence of the invention (e.g., an organism from which the nucleic acid was isolated). In one embodiment, the probes comprise oligonucleotides. In one embodiment, the amplification reaction may include a PCR reaction. PCR protocols are described in Ausubel and Sambrook (see discussion of amplification reactions). In such procedures, the nucleic acids in the sample are brought into contact with the probes, the amplification reaction is carried out, and each resulting amplification product is detected. The amplification product can be detected by performing gel electrophoresis on the reaction products and staining the gel with an intercalator such as ethidium bromide. Alternatively, one or more probes can be labeled with a radioactive isotope, and the presence of a radioactive amplification product can be detected by autoradiography after gel electrophoresis.
Probes derived from sequences near the 3' or 5' ends of the nucleic acid sequences of the invention can also be used in chromosome walking procedures to identify clones containing additional sequences, e.g., genomic sequences. Such methods enable the isolation of genes encoding additional proteins of interest from the host organism. In one embodiment, the nucleic acid sequences of the invention are used as probes to identify and isolate related nucleic acids.
In some aspects, the cognate nucleic acids so identified may be cDNA or genomic DNA from organisms other than those from which the nucleic acid of the invention was first isolated. In such procedures, a nucleic acid sample is contacted with a probe under conditions that allow the probe to hybridize specifically with cognate sequences. Hybridization of the probe with nucleic acids from a related organism is then detected by any of the methods described above.
In nucleic acid hybridization reactions, the conditions used to achieve a particular level of stringency will vary depending on the nature of the nucleic acids being hybridized. For example, the length, degree of complementarity, nucleotide sequence composition (e.g., GC vs. AT content), and nucleic acid type (e.g., RNA vs. DNA) of the hybridizing regions of the nucleic acids can be considered when selecting hybridization conditions. An additional consideration is whether one of the nucleic acids is immobilized, for example on a filter. Hybridization can be performed under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45°C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, 10× Denhardt's reagent and 0.5 mg/ml polyriboadenylic acid. About 2 × 10^7 cpm (specific activity 4-9 × 10^8 cpm/µg) of 32P end-labeled oligonucleotide probe is then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature (RT) in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at Tm − 10°C for the oligonucleotide probe. The membrane is then exposed to autoradiographic film to detect hybridization signals.
By varying the stringency of the hybridization conditions used to identify nucleic acids, such as cDNA or genomic DNA, that hybridize to a detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. The stringency can be varied by hybridizing at different temperatures below the melting temperature of the probe. The melting temperature, Tm, is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly complementary probe. Very stringent conditions are selected to be equal to, or about 5°C lower than, the Tm for a particular probe. The melting temperature of the probe can be calculated using the following exemplary formulas. For probes from 14 to 70 nucleotides in length, the melting temperature (Tm) is calculated from the formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(G+C fraction) − (600/N), where N is the length of the probe. If the hybridization is performed in a solution containing formamide, the melting temperature can be calculated according to the formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(G+C fraction) − (0.63 × % formamide) − (600/N), where N is the length of the probe. Prehybridization can be performed in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg denatured fragmented salmon sperm DNA, or in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 μg denatured fragmented salmon sperm DNA, 50% formamide. Formulations for SSC, Denhardt's reagent and other solutions are given, for example, in Sambrook.
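By way of illustration only, the two melting temperature formulas above can be evaluated as follows. The function names are hypothetical. Two points are assumptions of this sketch and are not stated in the text: the logarithm is taken as base 10, and the G+C term is supplied as a percentage (0 to 100), which is the scale on which this empirical formula is usually applied with the 0.41 coefficient, although the text writes it as a fraction.

```python
import math


def gc_percent(probe):
    """Percentage of G and C bases in the probe (assumed scale: 0 to 100)."""
    probe = probe.upper()
    return 100.0 * sum(base in "GC" for base in probe) / len(probe)


def melting_temperature(probe, na_molar, formamide_percent=0.0):
    """Tm in degrees C for a probe of 14 to 70 nucleotides.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(G+C) - 0.63*(% formamide) - 600/N
    With formamide_percent = 0 this reduces to the formamide-free formula.
    """
    n = len(probe)
    if not 14 <= n <= 70:
        raise ValueError("formula is stated for probes of 14 to 70 nucleotides")
    tm = 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(probe) - 600.0 / n
    return tm - 0.63 * formamide_percent


# Made-up 20-base probe with 50% G+C, in 1 M Na+.
probe = "ACGT" * 5
tm_aqueous = melting_temperature(probe, na_molar=1.0)
tm_formamide = melting_temperature(probe, na_molar=1.0, formamide_percent=50.0)
```

For this probe the terms work out to 81.5 + 0 + 20.5 − 30 = 72.0°C without formamide, and 72.0 − 31.5 = 40.5°C in 50% formamide.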
Hybridization is performed by adding the detectable probe to the prehybridization solutions listed above. When the probe comprises double-stranded DNA, it is denatured before being added to the hybridization solution. The filter is contacted with the hybridization solution for a period long enough to allow the probe to hybridize to cDNA or genomic DNA containing sequences that are complementary or homologous to it. For probes longer than 200 nucleotides, hybridization can be performed at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, hybridization can be performed at 5-10°C below the Tm. In one embodiment, hybridizations in 6×SSC are performed at about 68°C. In one embodiment, hybridizations in solutions containing 50% formamide are performed at about 42°C. All of the foregoing hybridizations can be considered to be performed under conditions of high stringency.
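By way of illustration only, the offsets below the Tm stated above can be turned into a temperature window. The function name `hybridization_window` is hypothetical, and treating a probe of exactly 200 nucleotides or fewer as a "shorter" probe is an assumption of this sketch.

```python
def hybridization_window(tm_c, probe_length):
    """Return (lowest, highest) hybridization temperature in degrees C.

    Probes longer than 200 nucleotides: 15 to 25 C below Tm.
    Shorter probes (assumed here: 200 nucleotides or fewer): 5 to 10 C below Tm.
    """
    if probe_length > 200:
        low_offset, high_offset = 25.0, 15.0
    else:
        low_offset, high_offset = 10.0, 5.0
    return (tm_c - low_offset, tm_c - high_offset)
```

For example, an oligonucleotide probe with a Tm of 72°C would be hybridized at between 62°C and 67°C under this rule.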
After hybridization, the filter is washed to remove nonspecifically bound detectable probe. The stringency used to wash the filters can also vary depending on the nature of the hybridized nucleic acids, the length of the hybridized nucleic acids, the degree of complementarity, the composition of the nucleotide sequences (e.g., GC vs. AT content), and the type of nucleic acid (e.g., RNA vs. DNA). Examples of washes of progressively higher stringency are as follows: 2×SSC, 0.1% SDS at room temperature for 15 minutes (low stringency); 0.1×SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour (moderate stringency); 0.1×SSC, 0.5% SDS for 15 to 30 minutes at between the hybridization temperature and 68°C (high stringency); and 0.15 M NaCl for 15 minutes at 72°C (very high stringency). A final low stringency wash can be performed in 0.1×SSC at room temperature. The above examples are merely illustrative of one set of conditions that can be used to wash filters. A person skilled in the art knows that there are numerous recipes for washes of different stringency.
Nucleic acids that have hybridized to the probe can be identified by autoradiography or other conventional techniques. The above procedure can be modified to identify nucleic acids with decreasing levels of homology to the probe sequence. For example, less stringent conditions can be used to obtain nucleic acids with reduced homology to the detectable probe. For example, the hybridization temperature can be decreased in 5°C steps from 68°C to 42°C in a hybridization buffer having a Na+ concentration of about 1 M. After hybridization, the filter can be washed with 2×SSC, 0.5% SDS at the hybridization temperature. These conditions are considered "moderate" conditions above 50°C and "low" conditions below 50°C. An example of "moderate" hybridization conditions is when the above hybridization is performed at 55°C. An example of "low stringency" hybridization conditions is when the above hybridization is performed at 45°C.
Alternatively, hybridization can be performed in buffers such as 6×SSC containing formamide at 42°C. In this case, the concentration of formamide in the hybridization buffer can be decreased in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. After hybridization, the filter can be washed with 6×SSC, 0.5% SDS at 50°C. These conditions are considered "moderate" conditions above 25% formamide and "low" conditions below 25% formamide. A specific example of "moderate" hybridization conditions is when the above hybridization is performed with 30% formamide. A specific example of "low stringency" hybridization conditions is when the above hybridization is performed with 10% formamide.
These probes and methods of the invention can be used to isolate nucleic acids having a sequence with at least about 99%, 98%, 97%, at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55%, or at least 50% homology to a nucleic acid sequence of the invention comprising at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 250, 300, 350, 400, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more consecutive bases thereof, and the sequences complementary thereto. Homology can be measured using an alignment algorithm, as discussed herein. For example, the homologous polynucleotides may have a coding sequence that is a naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variants may have a substitution, deletion or addition of one or more nucleotides compared to the nucleic acid of the invention.
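By way of illustration only, the idea of a percentage of matching positions over a run of consecutive bases can be sketched with a simple ungapped comparison. This is a deliberate simplification and is not the FASTA or BLAST alignment referred to in this description; the function names and test sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(query, subject, window):
    """Highest ungapped identity between any `window`-base run of each sequence."""
    best = 0.0
    for i in range(len(query) - window + 1):
        for j in range(len(subject) - window + 1):
            score = percent_identity(query[i:i + window], subject[j:j + window])
            best = max(best, score)
    return best
```

A real alignment algorithm also handles gaps and scoring matrices, so values from this sketch are only a rough stand-in for the homology figures recited above.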
Additionally, the probes and methods of the invention can be used to isolate nucleic acids encoding polypeptides having at least about 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 65%, at least 60%, at least 55% or at least 50% sequence identity (homology) to a polypeptide of the invention comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids thereof, as determined using a sequence alignment algorithm (e.g., the FASTA version 3.0t78 algorithm with the default parameters, or the BLAST 2.2.2 program with the exemplary settings provided herein).
Inhibition of epoxide hydrolase expression
The invention further provides nucleic acids complementary to (e.g., antisense sequences to) the nucleic acid sequences of the invention. Antisense sequences are capable of inhibiting the transport, splicing or transcription of genes encoding epoxide hydrolase. Inhibition can be achieved by targeting genomic DNA or messenger RNA. Transcription or function of the target nucleic acid can be inhibited, for example, by hybridization and/or cleavage. One particularly useful set of inhibitors provided by the present invention includes oligonucleotides capable of binding an epoxide hydrolase gene or message, in either case preventing or inhibiting the production or function of epoxide hydrolase. Binding can be achieved through sequence-specific hybridization. Another useful class of inhibitors includes oligonucleotides that cause inactivation or cleavage of the epoxide hydrolase message. The oligonucleotide can have enzymatic activity that causes such cleavage, as ribozymes do. The oligonucleotide can be chemically modified or conjugated to an enzyme or composition capable of cleaving the complementary nucleic acid. A pool of many different such oligonucleotides can be screened for those with the desired activity.
Antisense oligonucleotides
The invention provides antisense oligonucleotides capable of binding epoxide hydrolase messages that can inhibit proteolytic activity by targeting mRNA. Strategies for designing antisense oligonucleotides are well described in the scientific and patent literature, and one skilled in the art can design such epoxide hydrolase oligonucleotides using the novel reagents of the invention. For example, gene walking/RNA mapping protocols to screen for effective antisense oligonucleotides are well known in the art, see e.g. Ho (2000) Methods Enzymol. 314:168-183, which describes an RNA mapping assay based on standard molecular techniques that provides a simple and reliable method for selecting potent antisense sequences. See also Smith (2000) Eur. J. Pharm. Sci. 11:191-198.
Naturally occurring nucleic acids are used as antisense oligonucleotides. Antisense oligonucleotides can be of any length; for example, in alternative aspects, the antisense oligonucleotides are from about 5 to 100, from about 10 to 80, from about 15 to 60, or from about 18 to 40. The optimal length can be determined by routine screening. Antisense oligonucleotides can be present at any concentration. The optimal concentration can be determined by routine screening. A wide variety of synthetic, non-naturally occurring nucleotide and nucleic acid analogues are known that can address this potential problem. For example, peptide nucleic acids (PNAs) containing nonionic backbones, such as N-(2-aminoethyl)glycine units, can be used. Antisense oligonucleotides having phosphorothioate linkages can also be used, as described in WO 97/03211; WO 96/39154; Mata (1997) Toxicol Appl Pharmacol 144:189-197; Antisense Therapeutics, ed. Agrawal (Humana Press, Totowa, N.J., 1996). Antisense oligonucleotides having synthetic DNA backbone analogues provided by the invention can also include phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, methylene(methylimino), 3'-N-carbamate, and morpholino carbamate nucleic acids, as described above.
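By way of illustration only, candidate antisense oligonucleotides of a chosen length can be enumerated in silico by walking along a message and taking the reverse complement of each window. The function names, the step size, and the example message are hypothetical; the message shown is made up and is not a sequence of the invention.

```python
# Complement table giving a DNA antisense base for each RNA (or DNA) base.
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_dna(mrna_segment):
    """Reverse complement of an mRNA segment, written 5' to 3' as DNA."""
    return "".join(_COMPLEMENT[base] for base in reversed(mrna_segment.upper()))


def tile_antisense(mrna, length=20, step=5):
    """List (start, oligo) pairs for windows of `length` bases every `step` bases."""
    return [
        (start, antisense_dna(mrna[start:start + length]))
        for start in range(0, len(mrna) - length + 1, step)
    ]


# Made-up 39-base message used only to exercise the functions.
message = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
candidates = tile_antisense(message, length=20, step=5)
```

Each candidate would still have to be screened for activity, for example by the RNA mapping assays cited above; the enumeration only supplies sequences to test.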
Combinatorial chemistry methodology can be used to generate vast numbers of oligonucleotides that can be rapidly screened for specific oligonucleotides having appropriate binding affinities and specificities for any target, such as the sense and antisense epoxide hydrolase sequences of the invention (see, e.g., (1995) J. Biol. Chem. 270:13581-13584).
Inhibitory ribozymes
The invention provides ribozymes capable of binding epoxide hydrolase messages that can inhibit proteolytic activity by targeting mRNA. Strategies for designing ribozymes and selecting epoxide hydrolase-specific antisense sequences for targeting are well described in the scientific and patent literature, and one skilled in the art can design such ribozymes using the novel reagents of the invention. Ribozymes act by binding to a target RNA through the target RNA-binding portion of the ribozyme, which is held in close proximity to an enzymatic portion of the RNA that cleaves the target RNA. Thus, the ribozyme recognizes and binds the target RNA through complementary base pairing and, once bound to the correct site, acts enzymatically to cleave and inactivate the target RNA. Cleaving the target RNA in this way destroys its ability to direct synthesis of the encoded protein if the cleavage occurs in the coding sequence. After a ribozyme has bound and cleaved its target RNA, it is usually released from that RNA, so that it can bind and cleave new targets repeatedly.
In certain circumstances, the enzymatic nature of a ribozyme can be advantageous over other technologies, such as antisense technology (where a nucleic acid molecule simply binds to a target nucleic acid to block its transcription, translation, or association with another molecule), because the effective concentration of ribozyme necessary for a therapeutic treatment can be lower than that of an antisense oligonucleotide. This potential advantage reflects the ribozyme's ability to act enzymatically. Thus, a single ribozyme molecule can cleave many molecules of target RNA. In addition, a ribozyme is usually a highly specific inhibitor, with the specificity of inhibition depending not only on the base-pairing mechanism of binding, but also on the mechanism by which the molecule inhibits the expression of the RNA to which it binds. That is, inhibition is caused by cleavage of the target RNA, and so specificity is defined as the ratio of the rate of cleavage of the target RNA to the rate of cleavage of non-target RNA. This cleavage mechanism depends on factors additional to those involved in base pairing. Therefore, the specificity of ribozyme action can be greater than that of an antisense oligonucleotide binding the same RNA site.
The enzymatic ribozyme RNA molecule can be formed in a hammerhead motif, but it can also be formed in the motif of a hairpin, hepatitis delta virus, a group I intron, or an RNaseP-like RNA (in association with an RNA guide sequence). Examples of such hammerhead motifs are described in Rossi (1992) Aids Research and Human Retroviruses 8:183; hairpin motifs in Hampel (1989) Biochemistry 28:4929 and Hampel (1990) Nuc. Acids Res. 18:299; the hepatitis delta virus motif in Perrotta (1992) Biochemistry 31:16; the RNaseP motif in Guerrier-Takada (1983) Cell 35:849; and the group I intron in Cech, U.S. Pat. No. 4,987,071. The recitation of these specific motifs is not meant to be limiting; those skilled in the art will appreciate that an enzymatic RNA molecule of the invention has a specific substrate binding site complementary to one or more target RNA regions of a gene, and has a nucleotide sequence within or surrounding that substrate binding site that confers RNA cleaving activity on the molecule.
Modifications of nucleic acids
The invention provides methods for generating variants of the nucleic acids of the invention, e.g., those encoding epoxide hydrolase enzymes. These methods can be repeated or used in various combinations to generate an epoxide hydrolase enzyme having an altered or different activity, or an altered or different stability, compared to the epoxide hydrolase encoded by the parent nucleic acid. These methods can also be repeated or used in various combinations, e.g., to generate changes in gene/message expression, message translation or message stability. In another aspect, the genetic composition of a cell is altered, e.g., by ex vivo modification of a homologous gene followed by its reintroduction into the cell.
The nucleic acid of the invention can be altered by any means, for example by random or stochastic methods, or by non-stochastic or "directed evolution" methods, see, e.g., U.S. Pat. No. 5,830,696. For example, mutagens can be used to randomly mutate genes. Mutagens include, e.g., ultraviolet light or gamma radiation, or a chemical mutagen, e.g., mitomycin, nitrous acid, or photoactivated psoralens, alone or in combination, to induce DNA breaks amenable to repair by recombination. Other chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other mutagens are analogues of nucleotide precursors, e.g., nitrosoguanidine, 5-bromouracil, 2-aminopurine or acridine. These agents can be added to a PCR reaction in place of the nucleotide precursor, thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used.
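By way of illustration only, the effect of random mutagenesis on a sequence can be mimicked in silico by substituting each base with a fixed probability. This toy model is an assumption made for illustration and does not reproduce the chemistry or mutational bias of any mutagen named above; the function name and parameters are hypothetical.

```python
import random


def random_point_mutations(sequence, rate, seed=None):
    """Substitute each base with probability `rate`, always to a different base."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    bases = "ACGT"
    mutated = []
    for base in sequence.upper():
        if rng.random() < rate:
            # Choose uniformly among the three other bases (a simplification).
            mutated.append(rng.choice([b for b in bases if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


# Made-up parent sequence used only to exercise the function.
parent = "ATGGCTAGCTAGGATCCGATCGTA"
variant = random_point_mutations(parent, rate=0.05, seed=0)
```

Only substitutions are modelled here; the insertions, deletions and recombination events produced by the other methods described in this section are outside the scope of the sketch.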
Any molecular biology technique can be used, e.g. PCR random mutagenesis, see e.g. Rice (1992) Proc. Natl. Acad. Science USA 89:5467-5471; or combinatorial multi-cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques 18:194-196. Alternatively, nucleic acids, e.g. genes, may reassemble after random or "stochastic" fragmentation, see, e.g., US Pat. no. 6291242; 6,287,862; 6,287,861; 5,955,358; 5,830,721; 5,824,514; 5,811,238; 5 605 793. In alternative aspects, modifications, additions or deletions are introduced by error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, splicing PCR, sex PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive team mutagenesis, exponential team mutagenesis. mutagenesis, site-specific, gene reassembly, gene site saturation mutagenesis (GSSM™), synthetic ligation reassembly (SLR), recombination, recursive sequence recombination, phosphorothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, duplex gap mutagenesis, mutagenesis mismatch repair repair-deficient mutagenesis, host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction selection mutagenesis, restriction purification mutagenesis, artificial gene synthesis, team mutagenesis, creation of chimeric multimers of nucleic acids and/or a combination of these and other methods.
The following publications describe various recursive recombination procedures and/or methods that may be incorporated into the methods of the invention: Stemmer (1999) "Molecular breeding of viruses for targeting and other clinical properties" Tumor Targeting 4:1-4; Ness (1999) Nature Biotechnology 17:893-896; Chang (1999) "Cytokine Evolution Using DNA Family Shuffling" Nature Biotechnology 17:793-797; Minshull (1999) "Protein Evolution by Molecular Breeding" Current Opinion in Chemical Biology 3:284-290; Christians (1999) "Directed Evolution of Thymidine Kinase for AZT Phosphorylation Using DNA Family Shuffling" Nature Biotechnology 17:259-264; Crameri (1998) "DNA shuffling of gene families from different species accelerates directed evolution" Nature 391:288-291; Crameri (1997) "Molecular evolution of an arsenate detoxification pathway by DNA shuffling" Nature Biotechnology 15:436-438; Zhang (1997) "Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening" Proc. Natl. Acad. Sci. USA 94:4504-4509; Patten et al. (1997) "Applications of DNA Shuffling to Pharmaceuticals and Vaccines" Current Opinion in Biotechnology 8:724-733; Crameri et al. (1996) "Construction and evolution of antibody-phage libraries by DNA shuffling" Nature Medicine 2:100-103; Gates et al. (1996) "Affinity selective isolation of ligands from peptide libraries through display on a lac repressor 'headpiece dimer'" Journal of Molecular Biology 255:373-386; Stemmer (1996) "Sexual PCR and Assembly PCR" in: The Encyclopedia of Molecular Biology. VCH Publishers, New York. pp. 447-457; Crameri and Stemmer (1995) "Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wild-type cassettes" BioTechniques 18:194-195; Stemmer et al. (1995) "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides" Gene 164:49-53; Stemmer (1995) "The Evolution of Molecular Computation" Science 270:1510; Stemmer (1995) "Searching Sequence Space" Bio/Technology 13:549-553; Stemmer (1994) "Rapid evolution of a protein in vitro by DNA shuffling" Nature 370:389-391; and Stemmer (1994) "DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution" Proc. Natl. Acad. Sci. USA 91:10747-10751.
Mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) "Approaches to DNA mutagenesis: an overview" Anal Biochem. 254(2): 157-178; Methods Mol. Biol. 57:369-374; Smith (1985) "In Vitro Mutagenesis" Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) "Strategies and Applications of In Vitro Mutagenesis" Science 229:1193-1201; Carter (1986) "Site-directed mutagenesis" Biochem. J. 237:1-7; and Kunkel (1987) "The efficiency of oligonucleotide directed mutagenesis" in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil-containing templates (Kunkel (1985) "Rapid and efficient site-specific mutagenesis without phenotypic selection" Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) "Rapid and efficient site-specific mutagenesis without phenotypic selection" Methods in Enzymol. 154, 367-382; and Bass et al. (1988) "Mutant Trp repressors with new DNA-binding specificities" Science 242:240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100: 468-500 (1983); Methods in Enzymol. 154: 329-350 (1987); Zoller & Smith (1982) "Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment" Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) "Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors" Methods in Enzymol. 100:468-500; and Zoller & Smith (1987) "Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template" Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) "The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA" Nucl. Acids Res. 13: 8749-8764; Taylor et al. (1985) "The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA" Nucl. Acids Res. 13: 8765-8787; Nakamaye (1986) "Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis" Nucl. Acids Res. 14: 9679-9698; Sayers et al. (1988) "5'-3' Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis" Nucl. Acids Res. 16:791-802; and Sayers et al. (1988) "Strand-specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide" Nucl. Acids Res. 16: 803-814); and mutagenesis using gapped duplex DNA (Kramer et al. (1984) "The gapped duplex DNA approach to oligonucleotide-directed mutation construction" Nucl. Acids Res. 12: 9441-9456; Kramer & Fritz (1987) "Oligonucleotide-directed construction of mutations via gapped duplex DNA" Methods in Enzymol. 154:350-367; Kramer et al. (1988) "Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations" Nucl. Acids Res. 16: 7207; and Fritz et al. (1988) "Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro" Nucl. Acids Res. 16: 6987-6999).
Additional protocols used in the methods of the invention include point mismatch repair (Kramer (1984) "Point Mismatch Repair" Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) "Improved Oligonucleotide Site-Directed Mutagenesis using the M13 vector" Nucl. Acids Res. 13: 4431-4443; and Carter (1987) "Improved oligonucleotide-directed mutagenesis using the M13 vector" Methods in Enzymol. 154: 382-403), deletion mutagenesis (Eghtedarzadeh (1986) "Use of oligonucleotides to create large deletions" Nucl. Acids Res. 14: 5115), restriction-selection and restriction-purification (Wells et al. (1986) "Importance of hydrogen bonds in stabilizing the subtilisin transition state" Phil. Trans. R. Soc. Lond. A 317: 415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) "Total synthesis and cloning of the gene encoding the ribonuclease S protein" Science 223: 1299-1301; Sakamar and Khorana (1988) "Total synthesis and expression of the gene for the α-subunit of the bovine rod outer segment guanine nucleotide binding protein (transducin)" Nucl. Acids Res. 14: 6361-6372; Wells et al. (1985) "Cassette Mutagenesis: An Efficient Method for Generating Multiple Site-Specific Mutations" Gene 34:315-323; and Grundstrom et al. (1985) "Oligonucleotide-directed mutagenesis by microscale 'shot-gun' gene synthesis" Nucl. Acids Res. 13: 3305-3316), and double-strand break repair (Mandecki (1986) "Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis" Proc. Natl. Acad. Sci. USA 83:7177-7181; Arnold (1993) "Protein engineering for unusual environments" Current Opinion in Biotechnology 4:450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for troubleshooting various mutagenesis methods. See also U.S. Patent No. 5,605,793 to Stemmer (February 25, 1997), "Methods for in vitro recombination"; U.S. Patent No. 
5,811,238 Stemmer et al. (September 22, 1998) "Methods for generating polynucleotides with desired properties by iterative selection and recombination"; U.S. Patent No. 5,830,721 Stemmer et al. (November 3, 1998) , "DNA Mutagenesis by Random Fragmentation and Reassembly"; U.S. patent no. 5,834,252 Stemmer et al. (10 Nov 1998) "Complementary End Polymerase Reaction"; U.S. patent no. 5,837,458 Minshull et al. (November 17, 1998), "Methods and Compositions for Cellular and Metabolic Engineering"; WO 95/22625, Stemmer and Crameri, "Mutagenesis by Random Fragmentation and Reassembly"; WO 96/33207 to Stemmer and Lipschutz "Complementary polymerase chain reaction"; WO 97/20078, Stemmer and Crameri "Methods for generating polynucleotides with desired properties by iterative selection and recombination"; WO 97/35966 to Minshull and Stemmer, "Methods and compositions for cellular and metabolic engineering"; WO 99/41402 Punnonen et al. "Targeting genetic vaccine vectors"; WO 99/41383 Punnonen et al. "Immunization of antigen library"; WO 99/41369 Punnonen et al. "Engineering of genetic vaccine vectors"; WO 99/41368 Punnonen et al. "Optimization of immunomodulatory properties of genetic vaccines"; EP 752008, Stemmer and Crameri, "DNA mutagenesis by random fragmentation and reassembly"; EP 0932670 by Stemmer "Evolution of cellular DNA uptake by recombination of a recursive sequence"; WO 99/23107 Stemmer et al., "Modification of viral tropism and host range by shuffling the viral genome"; WO 99/21979 by Apt et al., "Human Papilloma Virus Vectors"; WO 98/31837 del Cardayre et al. 
"Evolution of Whole Cells and Organisms by Recursive Sequence Recombination"; WO 98/27230 to Patten and Stemmer, "Methods and compositions for engineering polypeptides"; WO 98/27230, Stemmer et al., "Methods for optimizing gene therapy by recursive shuffling and sequence selection"; WO 00/00632, "Methods for generating highly diverse libraries"; WO 00/09679, "Methods for obtaining in vitro recombined polynucleotide sequence banks and the resulting sequences"; WO 98/42832, Arnold et al., "Recombination of polynucleotide sequences using random or defined primers"; WO 99/29902, Arnold et al., "Method for generating polynucleotide and polypeptide sequences"; WO 98/41653 by Vind, "In vitro method for DNA library construction"; WO 98/41622 by Borchert et al., "Method for library construction using DNA shuffling"; and WO 98/42727 by Pati and Zarling, "Sequence alterations using homologous recombination".
Some U.S. applications provide additional details on various methods of generating diversity, including "SHUFFLING OF CODON ALTERED GENES" by Patten et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/407,800); "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION" by del Cardayre et al., filed Jul. 15, 1998 (U.S. Ser. No. 09/166,188) and Jul. 15, 1999 (U.S. Ser. No. 09/354,922); "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,392) and "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed Jan. 18, 2000 (PCT/US00/01203); "USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,393); "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed Jan. 18, 2000 (PCT/US00/01202) and, e.g., "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and "SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION" by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549).
Non-stochastic or "directed evolution" methods include, e.g., gene site saturation mutagenesis (GSSM), synthetic ligation reassembly (SLR), or a combination thereof, and are used to modify the nucleic acids of the invention to generate epoxide hydrolases with new or altered properties (e.g., activity under highly acidic or alkaline conditions, at high temperatures, and the like). Polypeptides encoded by the modified nucleic acids can be screened for an activity before testing for proteolytic or other activity. Any testing modality or protocol can be used, e.g., using a capillary array platform. See, e.g., U.S. Pat. Nos. 6,361,974; 6,280,926; 5,939,250.
Saturation mutagenesis or GSSM
In one aspect of the invention, non-stochastic gene modification, a "directed evolution process", is used to generate epoxide hydrolases with new or altered properties. Variations of this method have been termed "gene site saturation mutagenesis", "site-saturation mutagenesis", "saturation mutagenesis" or simply "GSSM". It can be used in combination with other mutagenesis processes. See, e.g., U.S. Pat. Nos. 6,171,820; 6,238,884. In one aspect, GSSM comprises providing a template polynucleotide and a plurality of oligonucleotides, wherein each oligonucleotide comprises a sequence homologous to the template polynucleotide, thereby targeting a specific sequence of the template polynucleotide, and a sequence that is a variant of the homologous gene; and generating progeny polynucleotides comprising non-stochastic sequence variations by replicating the template polynucleotide with the oligonucleotides, thereby generating polynucleotides comprising homologous gene sequence variations.
In one embodiment, codon primers containing a degenerate N,N,G/T sequence are used to introduce point mutations into a polynucleotide, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding site targeted to be modified. These oligonucleotides can comprise a contiguous first homologous sequence, a degenerate N,N,G/T sequence and, optionally, a second homologous sequence. The downstream progeny translation products resulting from the use of such oligonucleotides include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids. In one embodiment, one such degenerate oligonucleotide (comprising, e.g., one degenerate N,N,G/T cassette) is used to subject each original codon in a parental polynucleotide template to a full range of codon substitutions. In another embodiment, at least two degenerate cassettes are used, either in the same oligonucleotide or not, to subject at least two original codons in a parental polynucleotide template to a full range of codon substitutions. For example, more than one N,N,G/T sequence can be contained in one oligonucleotide to introduce amino acid mutations at more than one site. These multiple N,N,G/T sequences can be directly contiguous or separated by one or more additional nucleotide sequences. In another aspect, oligonucleotides serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, deletions and/or substitutions.
In one embodiment, simultaneous mutagenesis of two or more contiguous amino acid positions is performed using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e., a degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less degeneracy than the N,N,G/T sequence are used. For example, in some cases it may be desirable to use (e.g., in an oligonucleotide) a degenerate triplet sequence comprising only one N, where said N can be in the first, second or third position of the triplet. Any other bases, including any combinations and permutations thereof, can be used in the remaining two positions of the triplet. Alternatively, in some cases it may be desirable to use (e.g., in an oligonucleotide) a degenerate N,N,N triplet sequence.
In one aspect, the use of degenerate triplets (e.g., N,N,G/T triplets) allows the systematic and easy generation of the full range of possible natural amino acids (20 amino acids in total) at each and every amino acid position in a polypeptide (in alternative aspects, the methods also include generating fewer than all possible substitutions per amino acid residue, or codon, position). For example, for a polypeptide of 100 amino acids, 2000 distinct species (i.e., 20 possible amino acids per position × 100 amino acid positions) can be generated. Through the use of an oligonucleotide or set of oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is subjected to saturation mutagenesis using at least one such oligonucleotide, 32 distinct progeny polynucleotides encoding 20 distinct polypeptides are generated. In contrast, the use of a non-degenerate oligonucleotide in site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel. Non-degenerate oligonucleotides can optionally be used in combination with the degenerate primers disclosed; for example, non-degenerate oligonucleotides can be used to generate specific point mutations in a working polynucleotide. This provides one means of generating specific silent point mutations, point mutations leading to corresponding amino acid changes, and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments.
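The counts given above (32 sequences covering all 20 natural amino acids) can be checked by enumerating the degenerate triplet directly. The following is a minimal sketch in Python, assuming the standard genetic code; the function names `degenerate_codons` and `summarize` are illustrative only and do not appear in the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def degenerate_codons(first="ACGT", second="ACGT", third="GT"):
    """Expand a degenerate triplet into concrete codons (default: N,N,G/T)."""
    return ["".join(c) for c in product(first, second, third)]

def summarize(codons):
    """Return (set of amino acids encoded, list of stop codons)."""
    amino_acids = {CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return amino_acids, stops

nnk = degenerate_codons()
amino_acids, stops = summarize(nnk)
print(len(nnk), "codons encode", len(amino_acids), "amino acids; stops:", stops)

# Single-position substitution count for the 100-residue example in the text.
print(20 * 100, "single-position species for a 100-residue polypeptide")
```

Under the standard code, the N,N,G/T set also contains a single stop codon (TAG), which is consistent with the statement above that point mutations generating stop codons can arise.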
In one aspect, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide molecules (other aspects use fewer than all 20 natural combinations). The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g., cloned into a suitable host, e.g., an E. coli host, using, e.g., an expression vector) and subjected to expression screening. When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide, such as increased proteolytic activity under alkaline or acidic conditions), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.
In one embodiment, upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes may be identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid, and each of the two favorable changes) and 3 positions. Thus, there are 3×3×3 or 27 total possibilities, including 7 that were previously examined: 6 single point mutations (i.e., 2 at each of the three positions) and no change at any position.
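The 27-member count in this example can be reproduced by enumeration. The sketch below is illustrative only; the position numbers and residue letters are hypothetical and were chosen solely to give 2 favorable changes at each of 3 positions.

```python
from itertools import product

def combine_favorable_changes(favorable):
    """favorable: dict mapping position -> list of favorable substitutions.

    Returns every combination as a dict; None means the parental residue
    is retained at that position.
    """
    positions = sorted(favorable)
    choices = [[None] + list(favorable[p]) for p in positions]
    return [dict(zip(positions, combo)) for combo in product(*choices)]

# Hypothetical positions and residues, for illustration only.
favorable = {45: ["A", "S"], 102: ["L", "F"], 210: ["D", "E"]}
variants = combine_favorable_changes(favorable)

# Variants with at most one change were already seen in single-site screening.
already_examined = [
    v for v in variants if sum(x is not None for x in v.values()) <= 1
]
new_combinations = len(variants) - len(already_examined)
print(len(variants), len(already_examined), new_combinations)
```

The remaining 20 variants carry two or three simultaneous substitutions and are the ones that a combination step would newly generate.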
In another embodiment, site-saturation mutagenesis can be used in conjunction with other methods of stochastic or non-stochastic sequence modification, e.g. synthetic ligation reassembly (see below), shuffling, chimerization, recombination and other mutagenic processes and mutagenic agents. The present invention provides for the use of any mutagenesis process, including saturation mutagenesis, in a repetitive manner.
Synthetic Ligation Reassembly (SLR)
The invention provides a non-stochastic gene modification system termed "synthetic ligation reassembly", or simply "SLR", a "directed evolution process", to generate epoxide hydrolases with new or altered properties. SLR is a method of ligating oligonucleotide fragments together non-stochastically. This method differs from stochastic oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., U.S. patent application Ser. No. 09/332,835 entitled “Synthetic Ligation Reassembly in Directed Evolution” and filed Jun. 14, 1999 (“U.S. Ser. No. 09/332,835”). In one aspect, SLR comprises the steps of: (a) providing a template polynucleotide, wherein the template polynucleotide comprises a sequence encoding a homologous gene; (b) providing a plurality of building block polynucleotides, wherein the building block polynucleotides are designed to cross-over reassemble with the template polynucleotide at a predetermined sequence, and each building block polynucleotide comprises a sequence that is a variant of the homologous gene and a sequence homologous to the template polynucleotide flanking the variant sequence; and (c) combining a building block polynucleotide with the template polynucleotide such that the building block polynucleotide cross-over reassembles with the template polynucleotide to generate polynucleotides comprising homologous gene sequence variations.
SLR does not depend on the presence of a high level of homology between the polynucleotides to be rearranged. Thus, this method can be used to non-stochastically generate libraries (or sets) of progeny molecules comprising more than 10¹⁰⁰ different chimeras. SLR can be used to generate libraries comprising more than 10¹⁰⁰⁰ different progeny chimeras. Accordingly, aspects of the present invention include non-stochastic methods of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. The method includes the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable, mutually compatible ligatable ends, and assembling these nucleic acid building blocks such that the designed overall assembly order is achieved.
The mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered "serviceable" for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders. Thus, the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends. If more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s). In one aspect, the annealed building blocks are treated with an enzyme, such as a ligase (e.g., T4 DNA ligase), to achieve covalent bonding of the building blocks.
In one aspect, the design of the oligonucleotide building blocks is obtained by analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for producing a progeny set of finalized chimeric polynucleotides. These parental oligonucleotide templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or shuffled. In one aspect of this method, the sequences of a plurality of parental nucleic acid templates are aligned in order to select one or more demarcation points. The demarcation points can be located at an area of homology and comprise one or more nucleotides. These demarcation points are preferably shared by at least two of the progenitor templates. The demarcation points can thereby be used to delineate the boundaries of the oligonucleotide building blocks to be generated in order to rearrange the parental polynucleotides. The demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the final chimeric progeny molecules. A demarcation point can be an area of homology (comprising at least one homologous nucleotide base) shared by at least two parental polynucleotide sequences. Alternatively, a demarcation point can be an area of homology that is shared by at least half of the parental polynucleotide sequences, or it can be an area of homology that is shared by at least two-thirds of the parental polynucleotide sequences. Even more preferably, a serviceable demarcation point is an area of homology that is shared by at least three-quarters of the parental polynucleotide sequences, or it can be shared by almost all of the parental polynucleotide sequences. In one aspect, a demarcation point is an area of homology that is shared by all of the parental polynucleotide sequences.
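The selection of demarcation points from aligned parental templates can be illustrated with a short sketch. It assumes, as a simplification not stated in the specification, that the parents are already aligned to equal length and that a single alignment column is treated as a candidate demarcation point; the sequences and the function name `demarcation_points` are hypothetical.

```python
from collections import Counter

def demarcation_points(aligned_parents, min_fraction=1.0):
    """Return alignment columns usable as demarcation points.

    A column qualifies when its most common base is shared by at least two
    parents and by at least `min_fraction` of all parents. Gap characters
    ('-') are never counted as shared bases.
    """
    if len({len(seq) for seq in aligned_parents}) != 1:
        raise ValueError("parental sequences must be aligned to equal length")
    n_parents = len(aligned_parents)
    points = []
    for position, column in enumerate(zip(*aligned_parents)):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            continue  # column contains only gaps
        shared = counts.most_common(1)[0][1]
        if shared >= 2 and shared / n_parents >= min_fraction:
            points.append(position)
    return points

# Hypothetical aligned parental sequences, for illustration only.
parents = ["ATGGCT", "ATGACT", "ATCGCA"]
print(demarcation_points(parents, min_fraction=1.0))    # shared by all parents
print(demarcation_points(parents, min_fraction=2 / 3))  # shared by two-thirds
```

Lowering `min_fraction` from all parents to two-thirds admits more candidate points, mirroring the graded alternatives (all, three-quarters, two-thirds, half, at least two) recited above.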
In one aspect, the ligation reassembly process is performed exhaustively in order to generate an exhaustive library of progeny chimeric polynucleotides. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, in another aspect, the assembly order (i.e., the order of assembly of each building block in the 5' to 3' sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic), as described above. Because of the non-stochastic nature of this invention, the possibility of unwanted side products is greatly reduced.
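An exhaustive, ordered assembly of this kind corresponds to a Cartesian product over the building blocks available at each position. The sketch below is a simplified illustration that represents each building block as a short string and ignores the chemistry of the ligatable ends; the block sequences and the function name `exhaustive_library` are hypothetical.

```python
from itertools import product
from math import prod

def exhaustive_library(blocks_by_position):
    """Assemble every ordered combination of building blocks.

    blocks_by_position[i] lists the alternative blocks allowed at position i;
    the 5' to 3' order of positions is fixed by design.
    """
    return ["".join(combo) for combo in product(*blocks_by_position)]

# Hypothetical building blocks for a three-position design.
blocks = [["ATG", "ATC"], ["GCT", "GCA", "GCC"], ["TAA"]]
library = exhaustive_library(blocks)
print(len(library), library[0])

# Library size is the product of the alternatives at each position, so it
# can be computed without enumerating, e.g. 3 parents at 50 positions.
print(prod(len(alternatives) for alternatives in blocks), 3 ** 50)
```

Because the position order is fixed, each chimera appears exactly once, which is one way to read the statement that unwanted side products are reduced.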
In another aspect, the ligation reassembly method is performed systematically. For example, the method is performed in order to generate a systematically compartmentalized library of progeny molecules, with compartments that can be screened systematically, e.g., one by one. In other words, the invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, a design can be achieved in which specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, these methods allow a potentially very large number of progeny molecules to be examined systematically in smaller groups. Because of their ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic, particularly when there is a low level of homology among the progenitor molecules, these methods provide for the generation of a library (or set) comprising a large number of progeny molecules. Because of the non-stochastic nature of the ligation reassembly invention, the progeny molecules generated preferably comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. Saturation mutagenesis and optimized directed evolution methods can also be used to generate different progeny molecular species.
It will be appreciated that the invention provides freedom of choice and control regarding the selection of demarcation points, the size and number of the nucleic acid building blocks, and the size and design of the couplings. It will further be appreciated that the requirement for intermolecular homology is highly relaxed for the operability of the invention. In fact, demarcation points can even be chosen in areas of little or no intermolecular homology. For example, because of codon wobble, i.e., the degeneracy of codons, nucleotide substitutions can be introduced into nucleic acid building blocks without altering the amino acid originally encoded in the corresponding progenitor template. Alternatively, a codon can be altered such that the coding for the original amino acid is altered. The invention provides that such substitutions can be introduced into a nucleic acid building block in order to increase the incidence of intermolecular homologous demarcation points and thus to allow an increased number of couplings to be achieved among the building blocks, which in turn allows a greater number of progeny chimeric molecules to be generated.
In another aspect, the synthetic nature of the step in which the building blocks are generated allows for the design and introduction of nucleotides that can later be optionally removed by an in vitro process (e.g., by mutagenesis) or by an in vivo process (e.g., by exploiting the gene splicing ability of a host organism). It will be appreciated that in many instances the introduction of these nucleotides may also be desirable for many other reasons, in addition to the potential benefit of creating a serviceable demarcation point.
In one embodiment, a nucleic acid building block is used to introduce an intron. Thus, functional introns are introduced into a man-made gene manufactured according to the methods described herein. The artificially introduced intron(s) can be functional in host cells for gene splicing in much the same way that naturally occurring introns serve functionally in gene splicing.
An optimized directed evolution system
The invention provides a non-stochastic gene modification system termed an "optimized directed evolution system" to generate epoxide hydrolases with new or altered properties. Optimized directed evolution is directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of nucleic acids through recombination. Optimized directed evolution allows the generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events.
A crossover event is the point in a chimeric sequence where the sequence shifts from one parental variant to another parental variant. Such a point is usually at the junction where the oligonucleotides from the two parents are joined together to form a single sequence. This method allows the calculation of exact concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched with a selected number of crossover events. This provides greater control over the selection of chimeric variants that have a predetermined number of crossover events.
In addition, this method provides a convenient means of exploring a tremendous amount of the possible protein variant space in comparison with other systems. Previously, if, for example, 10¹³ chimeric molecules were generated during a reaction, it would have been extremely difficult to test such a large number of chimeric variants for a particular activity. Moreover, a significant portion of the progeny population would have a very high number of crossover events, resulting in proteins that were less likely to have increased levels of a particular activity. Using these methods, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although 10¹³ chimeric molecules can still be generated during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety among the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.
One method of creating a chimeric progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide preferably includes a unique region of overlap, so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. Additional information can also be found, e.g., in U.S. Ser. No. 09/332,835 and U.S. Pat. No. 6,361,974. The number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature. As one example, a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to 50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity, it is likely that at some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a ⅓ chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover.
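The ⅓ figure in this example implies a simple distribution of crossover counts when all fragments are equimolar. The following sketch is an illustration under stated assumptions, not the calculation used in the specification: each junction is treated as independent, a crossover occurs at a junction with probability 1 − 1/(number of parents), and 50 fragments in series are taken to give 49 junctions (the text speaks of up to 50 crossover events, so the junction count is left as a parameter). The function name `crossover_pmf` is hypothetical.

```python
from math import comb

def crossover_pmf(n_junctions, n_parents):
    """Binomial probability of k crossovers, for k = 0..n_junctions.

    Assumes equimolar fragments and independent junctions, so each junction
    joins fragments from different parents with probability 1 - 1/n_parents.
    """
    p = 1 - 1 / n_parents
    return [
        comb(n_junctions, k) * p**k * (1 - p) ** (n_junctions - k)
        for k in range(n_junctions + 1)
    ]

pmf = crossover_pmf(n_junctions=49, n_parents=3)
mean = sum(k * prob for k, prob in enumerate(pmf))
print("expected crossovers:", round(mean, 2))
print("P(exactly 3 crossovers):", pmf[3])
print("P(crossover at every junction):", pmf[-1])
```

Under these assumptions most chimeras carry roughly 30 or more crossovers, and both a low crossover count and strict alternation are rare, which matches the qualitative statements in the paragraph above.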
Therefore, a probability density function (PDF) can be derived to predict the population of crossover events likely to occur during each step of the ligation reaction, given a fixed number of parental variants, the number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step of the ligation reaction. The statistics and mathematics behind the determination of the PDF are described below. Using these methods, one can calculate such a probability density function and thus enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, the target number of crossover events can be predetermined and the system then programmed to calculate the initial amounts of each parental oligonucleotide during each step of the ligation reaction to obtain a probability density function that centers on the predetermined number of crossover events. These methods are directed to the use of repeated cycles of reductive reassortment, recombination, and selection that allow directed molecular evolution of a polypeptide-encoding nucleic acid through recombination. This system allows the generation of a large population of evolved chimeric sequences, whereby the generated population is highly enriched in sequences that have a predetermined number of crossover events. A crossover event is the point in a chimeric sequence where the sequence shifts from one parental variant to another parental variant. Such a point is usually at the junction where oligonucleotides from two parents are ligated together to form a single sequence. The method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the selected number of crossover events. This provides greater control over the selection of chimeric variants that have a predetermined number of crossover events.
In addition, these methods provide a convenient way to explore the vast amount of possible protein variant space compared to other systems. Using the methods described here, a population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, while 10^13 chimeric molecules can still be generated during the reaction, each of the molecules selected for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the limits of functional diversity between chimeric molecules are reduced. This allows for a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for influencing a particular trait.
In one embodiment, the method generates a chimeric progeny polynucleotide sequence by generating oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide preferably contains a unique overlapping region so that mixing the oligonucleotides together results in a new variant in which each oligonucleotide fragment is assembled in the correct order. See also U.S. Ser. No. 09/332,835.
The number of oligonucleotides produced for each parental variant is related to the total number of crossovers produced in the final chimeric molecule formed. For example, three parental nucleotide sequence variants can be subjected to a ligation reaction to find a chimeric variant that has, for example, higher activity at high temperature. As an example, a set of 50 oligonucleotide sequences corresponding to each part of each parental variant can be generated. Accordingly, up to 50 crossover events can occur within each chimeric sequence during the ligation reassembly process. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in an alternating order is very low. If each oligonucleotide fragment is present in the same molar amount in the ligation reaction, it is likely that at certain positions, oligonucleotides from the same parent polynucleotide will ligate side by side and therefore no crossover event will result. If the concentration of each oligonucleotide from each parent is held constant during any ligation step in this example, there is a ⅓ chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and no crossover will result.
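The ⅓ figure in the preceding example can be turned into a distribution over crossover counts. The following is a minimal sketch only, not the calculation used by the invention: it assumes that every parent is present in the same molar amount, that each junction is independent of the others, and that there are 50 junctions (taking the "up to 50 crossover events" of the example as the junction count). The function name and parameters are hypothetical.

```python
from math import comb


def crossover_pmf(num_parents: int, num_junctions: int) -> list[float]:
    """Return P(k crossovers) for k = 0..num_junctions.

    Assumption (illustration only): parents are equimolar and junctions are
    independent, so each junction is a crossover with probability
    1 - 1/num_parents and the crossover count is binomially distributed.
    """
    p_cross = 1.0 - 1.0 / num_parents
    p_same = 1.0 - p_cross
    return [
        comb(num_junctions, k) * p_cross**k * p_same ** (num_junctions - k)
        for k in range(num_junctions + 1)
    ]


# Three parents and 50 junctions, as in the example in the text.
pmf = crossover_pmf(num_parents=3, num_junctions=50)
expected_crossovers = sum(k * p for k, p in enumerate(pmf))
p_fully_alternating = pmf[-1]  # a crossover at every junction
p_three_or_fewer = sum(pmf[:4])

print(f"expected crossovers: {expected_crossovers:.2f}")
print(f"P(crossover at every junction): {p_fully_alternating:.3e}")
print(f"P(three or fewer crossovers): {p_three_or_fewer:.3e}")
```

Under these assumptions most progeny carry many crossovers, which is consistent with the statement that an equimolar ligation rarely yields chimeras with only a few crossover events.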
Therefore, a probability density function (PDF) can be derived to predict the population of crossover events likely to occur during each step of the ligation reaction, given a fixed number of parental variants, the number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step of the ligation reaction. The statistics and mathematics behind the determination of the PDF are described below. Such a probability density function can be calculated and used to enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, the target number of crossover events can be predetermined and the system then programmed to calculate the initial amounts of each parental oligonucleotide during each step of the ligation reaction to obtain a probability density function that centers on the predetermined number of crossover events.
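How relative oligonucleotide amounts could shift the crossover distribution can be illustrated with a simple model. The sketch below is an assumption-laden illustration and not the PDF derivation of the invention: it assumes that the parent contributing each fragment is drawn independently with probability equal to its molar fraction, so that a junction is a crossover with probability 1 minus the sum of the squared fractions. It then searches, by bisection, for the fraction of one dominant parent (the remaining parents sharing the rest equally) that gives a target mean number of crossovers. All function names are hypothetical.

```python
def junction_crossover_probability(fractions: list[float]) -> float:
    """P(adjacent fragments come from different parents).

    Assumption: each fragment's parent is drawn independently with
    probability equal to that parent's molar fraction.
    """
    total = sum(fractions)
    probs = [f / total for f in fractions]
    return 1.0 - sum(p * p for p in probs)


def dominant_fraction_for_target(
    target_crossovers: float, num_parents: int, num_junctions: int
) -> float:
    """Molar fraction of one dominant parent giving the target mean crossovers."""
    q_target = target_crossovers / num_junctions
    q_max = 1.0 - 1.0 / num_parents  # reached when all parents are equimolar
    if not 0.0 <= q_target <= q_max:
        raise ValueError("target mean is not reachable in this model")
    lo, hi = 1.0 / num_parents, 1.0
    for _ in range(100):  # bisection; crossover probability falls as dominance rises
        mid = (lo + hi) / 2.0
        rest = (1.0 - mid) / (num_parents - 1)
        q = junction_crossover_probability([mid] + [rest] * (num_parents - 1))
        if q > q_target:
            lo = mid  # still too many crossovers: increase dominance
        else:
            hi = mid
    return (lo + hi) / 2.0


# Three parents, 50 junctions, target mean of three crossovers.
dominant = dominant_fraction_for_target(3.0, num_parents=3, num_junctions=50)
minor = (1.0 - dominant) / 2.0
mean_crossovers = 50 * junction_crossover_probability([dominant, minor, minor])
print(f"dominant parent fraction: {dominant:.4f}, mean crossovers: {mean_crossovers:.4f}")
```

This sketch matches only the mean of the distribution; fitting a complete desired PDF, or varying amounts at each ligation step, would require the fuller treatment that the text refers to.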
Determination of crossover events
Aspects of the invention include a system and software that receive as input a desired crossover probability density function (PDF), the number of parental genes to be reassembled, and the number of fragments in the reassembly. The output of this program is a "fragment PDF" that can be used to determine a recipe for producing the reassembled genes, together with the estimated crossover PDF of those genes. The processing described herein is preferably performed in MATLAB® (The Mathworks, Natick, Massachusetts), a programming language and development environment for technical computing.
Iterative processes
In the practice of the invention, these processes may be repeated iteratively. For example, the nucleic acid (or nucleic acids) responsible for an altered epoxide hydrolase phenotype is identified, re-isolated, modified again, and re-tested for activity. This process can be repeated iteratively until the desired phenotype is obtained. For example, an entire biochemical anabolic or catabolic pathway can be introduced into a cell, including proteolytic activity.
Similarly, if a particular oligonucleotide is found to have no effect on a desired trait (eg, a novel epoxide hydrolase phenotype), it can be removed as a variable by synthesizing larger parent oligonucleotides that include the sequence to be removed. Since the incorporation of a sequence into a larger sequence prevents any crossover events, there will no longer be any changes to that sequence in the progeny polynucleotides. This iterative practice of determining which oligonucleotides are most closely related to the desired trait, and which are not, allows more efficient testing of all possible protein variants that may convey a particular trait or activity.
In vivo shuffling
In vivo shuffling of molecules is used in the methods of the invention to yield variants of the polypeptides of the invention, e.g., antibodies, epoxide hydrolases, and the like. In vivo shuffling can be performed by exploiting the natural property of cells to recombine multimers. Although in vivo recombination has provided the major natural route to molecular diversity, genetic recombination remains a relatively complex process involving 1) recognition of homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the formation of a recombinant chiasma; and finally 3) resolution of the chiasma into discrete recombined molecules. Chiasma formation requires the recognition of homologous sequences.
In one aspect, the invention provides a method for making a hybrid polynucleotide from at least a first polynucleotide and a second polynucleotide. The invention can be used to produce a hybrid polynucleotide by introducing at least a first polynucleotide and a second polynucleotide having at least one region of partial sequence homology into a suitable host cell. Regions of partial sequence homology promote processes that lead to sequence reorganization to form a hybrid polynucleotide. The term "hybrid polynucleotide" as used herein refers to any nucleotide sequence resulting from the process of the present invention and comprising sequence from at least two original polynucleotide sequences. Such hybrid polynucleotides may result from intermolecular recombination events that promote sequence integration between DNA molecules. Additionally, such hybrid polynucleotides may result from intramolecular reductive reassortment processes that use repetitive sequences to alter the nucleotide sequence of the DNA molecule.
Generation of sequence variants
The invention also provides methods for generating sequence variants of nucleic acid sequences and epoxide hydrolases of the invention or isolating epoxide hydrolases using nucleic acids and polypeptides of the invention. In one aspect, the invention provides variants of the epoxide hydrolase gene of the invention, which can be altered in any manner, including, for example, those described above.
Isolated variants may occur naturally. A variant can also be created in vitro. Variants can be created using genetic engineering techniques such as site-directed mutagenesis, random chemical mutagenesis, exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be prepared using chemical synthesis or modification procedures. Other methods of making variants are also known to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to produce nucleic acids encoding polypeptides having characteristics that enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences from the sequence obtained from the natural isolate are generated and characterized. These nucleotide differences may result in amino acid changes relative to the polypeptides encoded by the nucleic acids from the natural isolates.
For example, variants can be generated using error-prone PCR. In error-prone PCR, the PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, so that a high rate of point mutation is obtained along the entire length of the PCR product. Error-prone PCR is described, e.g., in Leung, D.W. et al., Technique, 1:11-15, 1989, and Caldwell, R.C. & Joyce G.F., PCR Methods Applic., 2:28-33, 1992. Briefly, in such procedures, the nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase, and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product. For example, the reaction can be performed using 20 fmol of the nucleic acid to be mutagenized, 30 pmol of each PCR primer, a reaction buffer containing 50 mM KCl, 10 mM Tris HCl (pH 8.3) and 0.01% gelatin, 7 mM MgCl2, 0.5 mM MnCl2, 5 units of Taq polymerase, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, and 1 mM dTTP. PCR can be performed for 30 cycles of 94°C for 1 minute, 45°C for 1 minute, and 72°C for 1 minute. However, it should be noted that these parameters can be varied as appropriate. The mutagenized nucleic acids are cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.
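The effect of low-fidelity copying on a population of clones can be pictured with a small simulation. This is a sketch for illustration only: the per-base error rate, the uniform choice among the three alternative bases, the single copying pass, and the template sequence are all hypothetical assumptions and do not model the actual mutation spectrum or cycle number of the reaction conditions given above.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA string, substituting each base with probability error_rate.

    Assumption: substitutions are uniform over the three other bases.
    """
    product = []
    for base in template:
        if rng.random() < error_rate:
            product.append(rng.choice([b for b in BASES if b != base]))
        else:
            product.append(base)
    return "".join(product)


def count_mutations(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(reference, variant))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
template = "ATGGCTAAACGTCTGGAA" * 10  # hypothetical 180-nt template
library = [error_prone_copy(template, error_rate=0.01, rng=rng) for _ in range(5)]
for index, clone in enumerate(library):
    print(f"clone {index}: {count_mutations(template, clone)} point mutations")
```

Each simulated clone corresponds to a mutagenized nucleic acid that would subsequently be cloned and evaluated for the activity of its encoded polypeptide.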
Variants can also be created using site-directed oligonucleotide mutagenesis to create site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described, for example, in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in such procedures, a plurality of double-stranded oligonucleotides carrying one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA to be mutagenized. Clones containing the mutated DNA are recovered and the activities of the polypeptides they encode are assessed.
Another method of generating variants is assembly PCR. Assembly PCR involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described, e.g., in U.S. Pat. No. 5,965,408.
Another method of generating variants is sexual PCR mutagenesis. In sexual PCR mutagenesis, forced homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a result of random fragmentation of the DNA molecules based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described, for example, in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751. Briefly, in such procedures, a plurality of nucleic acids to be recombined are digested with DNase to produce fragments having an average size of 50-200 nucleotides. Fragments of the desired average size are purified and resuspended in a PCR mixture. PCR is performed under conditions that facilitate recombination between the nucleic acid fragments. For example, PCR can be performed by resuspending the purified fragments at a concentration of 10-30 ng/µl in a solution of 0.2 mM of each dNTP, 2.2 mM MgCl2, 50 mM KCl, 10 mM Tris HCl, pH 9.0, and 0.1% Triton X-100. 2.5 units of Taq polymerase per 100 µl of reaction mixture are added, and PCR is performed using the following regime: 94°C for 60 seconds, 94°C for 30 seconds, 50-55°C for 30 seconds, 72°C for 30 seconds (30-45 times), and 72°C for 5 minutes. However, it should be noted that these parameters can be varied as appropriate. In some aspects, oligonucleotides can be included in the PCR reactions. In other aspects, the Klenow fragment of DNA polymerase I can be used in a first set of PCR reactions and Taq polymerase can be used in a subsequent set of PCR reactions. Recombinant sequences are isolated and the activities of the polypeptides they encode are assessed.
Variants can also be generated by in vivo mutagenesis. In some aspects, random mutations in the sequence of interest are generated by amplifying the sequence of interest in a bacterial strain, such as an E. coli strain, that carries mutations in one or more DNA repair pathways. Such "mutator" strains have a higher random mutation rate than the wild-type parent. DNA replication in one of these strains will eventually generate random mutations in the DNA. Mutator strains suitable for use in in vivo mutagenesis are described, for example, in PCT Publication No. WO 91/16427.
Variants can also be generated by cassette mutagenesis. In cassette mutagenesis, a small region of a double-stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native sequence. The oligonucleotide often contains a completely and/or partially randomized native sequence.
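At the sequence level, cassette replacement can be sketched as follows. The example is a hypothetical illustration: the template, the cassette position, and the use of the letter N to mark a fully randomized position are assumptions made for the sketch, and only one strand is represented.

```python
from itertools import product
from typing import Iterator


def cassette_variants(template: str, start: int, cassette: str) -> Iterator[str]:
    """Yield every sequence obtained by replacing a region with a cassette.

    Each 'N' in the cassette stands for a randomized position (A, C, G, or T);
    any other letter is a fixed base. Variants identical to the template are
    skipped, since a cassette differs from the native sequence.
    """
    end = start + len(cassette)
    if start < 0 or end > len(template):
        raise ValueError("cassette does not fit inside the template")
    choices = ["ACGT" if base == "N" else base for base in cassette]
    for combo in product(*choices):
        variant = template[:start] + "".join(combo) + template[end:]
        if variant != template:
            yield variant


template = "ATGGCTAAACGTCTGGAA"  # hypothetical coding sequence
# Replace the third codon (AAA) with a cassette randomized at two positions.
variants = list(cassette_variants(template, start=6, cassette="NNA"))
print(len(variants), "cassette variants; first three:", variants[:3])
```

A partially randomized cassette, as here, keeps some native positions fixed while varying the others; a fully randomized cassette would use N at every position.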
Recursive ensemble mutagenesis can also be used to generate variants. Recursive ensemble mutagenesis is a protein engineering (protein mutagenesis) algorithm developed to generate diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described, for example, in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815.
In some aspects, variants are generated using exponential ensemble mutagenesis. Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, in which small groups of residues are randomized in parallel to identify, at each altered position, amino acids that lead to functional proteins. Exponential ensemble mutagenesis is described, for example, in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed mutagenesis is described, for example, in Arnold (1993) Current Opinion in Biotechnology 4:450-455.
In some aspects, variants are created using shuffling procedures in which portions of a plurality of nucleic acids encoding distinct polypeptides are fused together to create chimeric nucleic acid sequences encoding chimeric polypeptides, as described, e.g., in U.S. Pat. Nos. 5,965,408 and 5,939,250.
The invention also provides variants of the polypeptides of the invention comprising sequences in which one or more amino acid residues (e.g., of an exemplary polypeptide such as SEQ ID NO:2) are substituted with a conserved or non-conserved amino acid residue (e.g., a conserved amino acid residue), and such substituted amino acid residue may or may not be one encoded by the genetic code. Conservative substitutions are those that replace a given amino acid in a polypeptide with another amino acid of similar characteristics. Accordingly, polypeptides of the invention include those with conservative substitutions of the sequences of the invention, e.g., SEQ ID NO:2, including but not limited to the following substitutions: replacement of an aliphatic amino acid such as alanine, valine, leucine, or isoleucine with another aliphatic amino acid; replacement of serine with threonine or vice versa; replacement of an acidic residue such as aspartic acid or glutamic acid with another acidic residue; replacement of an amide-bearing residue such as asparagine or glutamine with another amide-bearing residue; replacement of a basic residue such as lysine or arginine with another basic residue; and replacement of an aromatic residue such as phenylalanine or tyrosine with another aromatic residue. Other variants are those in which one or more amino acid residues of the polypeptide of the invention carry a substituent group.
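The substitution classes listed above can be captured in a small lookup table. The following sketch is illustrative only: the grouping is taken directly from the preceding paragraph, the group labels (including "hydroxyl" for the serine/threonine pair) and function names are hypothetical, and the three-residue peptide is an arbitrary example rather than a sequence of the invention.

```python
# One-letter codes for the substitution classes named in the text.
CONSERVATIVE_GROUPS = {
    "aliphatic": set("AVLI"),  # alanine, valine, leucine, isoleucine
    "hydroxyl": set("ST"),     # serine <-> threonine (label is an assumption)
    "acidic": set("DE"),       # aspartic acid, glutamic acid
    "amide": set("NQ"),        # asparagine, glutamine
    "basic": set("KR"),        # lysine, arginine
    "aromatic": set("FY"),     # phenylalanine, tyrosine
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one class."""
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


def conservative_variants(peptide: str, position: int) -> list[str]:
    """All peptides that differ by one conservative substitution at `position`."""
    original = peptide[position]
    return [
        peptide[:position] + residue + peptide[position + 1:]
        for group in CONSERVATIVE_GROUPS.values()
        if original in group
        for residue in sorted(group)
        if residue != original
    ]


print(is_conservative("A", "V"))          # same aliphatic class
print(is_conservative("D", "K"))          # acidic to basic: not conservative
print(conservative_variants("MKV", 2))    # valine replaced by A, I, or L
```

Residues that fall outside the listed classes simply have no conservative replacements in this table; other classification schemes could be substituted without changing the structure of the sketch.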
Other variants within the scope of the invention are those in which the polypeptide is combined with another compound, such as a compound that prolongs the half-life of the polypeptide, such as polyethylene glycol.
Additional variants within the scope of the invention are those in which additional amino acids are linked to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence, or a sequence that facilitates purification, enrichment, or stabilization of the polypeptide.
In some aspects, variants, fragments, derivatives, and analogs of the polypeptides of the invention retain the same biological function or activity as the exemplary polypeptides, e.g., proteolytic activity, as described herein. In other aspects, the variant, fragment, derivative, or analog comprises a proprotein such that the variant, fragment, derivative, or analog can be activated by cleaving a portion of the proprotein to produce an active polypeptide.
Codon optimization to achieve high levels of protein expression in host cells
The invention provides methods for modifying nucleic acids encoding an epoxide hydrolase to modify codon usage. In one aspect, the invention provides methods for modifying codons in a nucleic acid encoding an epoxide hydrolase to increase or decrease its expression in a host cell. The invention also provides nucleic acids encoding an epoxide hydrolase modified to increase its expression in a host cell, epoxide hydrolases so modified, and methods of making the modified epoxide hydrolases. The method includes identifying a "non-preferred" or "less preferred" codon in a nucleic acid encoding an epoxide hydrolase and replacing one or more of these non-preferred or less preferred codons with a "preferred codon" encoding the same amino acid as the replaced codon, such that at least one non-preferred or less preferred codon in the nucleic acid is replaced by a preferred codon encoding the same amino acid. A preferred codon is a codon that is over-represented in the coding sequences of genes in the host cell, and a non-preferred or less preferred codon is a codon that is under-represented in the coding sequences of genes in the host cell.
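The codon replacement step can be sketched in a few lines. This is a minimal illustration, not a codon optimization tool: the table of preferred codons below is a hypothetical placeholder rather than measured usage data for any host, and the input sequence is an arbitrary example. The standard genetic code is built from its conventional 64-letter ordering (bases in the order T, C, A, G).

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL preferred codons for an unnamed host (illustration only).
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def replace_nonpreferred_codons(seq: str, preferred: dict[str, str]) -> str:
    """Swap each codon for the preferred codon encoding the same amino acid."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    for amino_acid, codon in preferred.items():
        if CODON_TABLE[codon] != amino_acid:
            raise ValueError(f"{codon} does not encode {amino_acid}")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        # Codons whose amino acid has no listed preference are left unchanged.
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGTTAAGAAAGTAA"  # encodes M-L-R-K-stop
optimized = replace_nonpreferred_codons(original, PREFERRED)
assert translate(optimized) == translate(original)  # same polypeptide
print(original, "->", optimized)
```

The check that the translation is unchanged reflects the requirement that each preferred codon encode the same amino acid as the codon it replaces.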
Host cells for the expression of nucleic acids, expression cassettes and vectors of the invention include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells. Accordingly, the invention provides methods for optimizing codon usage in all of these cells, codon-altered nucleic acids, and polypeptides produced by codon-altered nucleic acids. Examples of host cells include Gram-negative bacteria such as Escherichia coli and Pseudomonas fluorescens; gram-positive bacteria such as Streptomyces diversa, Lactobacillus gasseri, Lactococcus lactis, Lactococcus cremoris, Bacillus subtilis. Examples of host cells also include eukaryotic organisms, e.g. various yeasts such as Saccharomyces sp., including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris and Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger and mammalian cells and cell lines, and insect cells and cell lines. Accordingly, the invention also includes nucleic acids and polypeptides optimized for expression in these organisms and species.
For example, the codons of a nucleic acid encoding an epoxide hydrolase isolated from a bacterial cell are modified so that the nucleic acid is optimally expressed in a bacterial cell other than the bacterium from which the epoxide hydrolase was derived, a yeast, a fungus, a plant cell, an insect cell, or a mammalian cell. Methods for codon optimization are well known in the art; see, e.g., U.S. Pat. No. 5,795,737; Baca (2000) Int. J. Parasitol. 30:113-118; Hale (1998) Protein Expr. Purif. 12:185-188; Narum (2001) Infect. Immun. 69:7250-7253. See also Narum (2001) Infect. Immun. 69:7250-7253, which describes codon optimization in mouse systems; Outchkourov (2002) Protein Expr. Purif. 24:18-24, which describes codon optimization in yeast; Feng (2000) Biochemistry 39:15399-15409, which describes codon optimization in E. coli; Humphreys (2000) Protein Expr. Purif. 20:252-264, which describes the optimization of codon usage affecting secretion in E. coli.
Transgenic non-human animals
The invention provides transgenic non-human animals comprising a nucleic acid, polypeptide (e.g., epoxide hydrolase), expression cassette or vector, or transfected or transformed cell of the invention. Transgenic non-human animals can be, for example, goats, rabbits, sheep, pigs, cows, rats, and mice containing the nucleic acids of the invention. These animals can be used, e.g., as in vivo models for testing epoxide hydrolase activity or as models for screening modulators of epoxide hydrolase activity in vivo. Sequences encoding polypeptides to be expressed in transgenic non-human animals can be designed to be expressed constitutively or under the control of tissue-specific, developmental-specific, or inducible transcriptional regulatory factors. Transgenic non-human animals can be designed and generated by any method known in the art; see, e.g., U.S. Pat. Nos. 6,211,428; 6,187,992; 6,156,952; 6,118,044; 6,111,166; 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; 5,891,698; 5,639,940; 5,573,933; 5,387,742; 5,087,571, which describe the production and use of transformed cells and eggs and transgenic mice, rats, rabbits, sheep, pigs, and cows. See also, e.g., Pollock (1999) J. Immunol. Methods 231:147-157, which describes the production of recombinant proteins in the milk of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. 17:456-461, which demonstrates the production of transgenic goats. U.S. Pat. No. 6,211,428 describes the production and use of transgenic non-human mammals that express in their brains a nucleic acid construct containing a DNA sequence. U.S. Pat. No. 5,387,742 describes injecting cloned recombinant or synthetic DNA sequences into fertilized mouse eggs, implanting the injected eggs into pseudo-pregnant females, and growing to term transgenic mice whose cells express proteins associated with the pathology of Alzheimer's disease. U.S. Pat. No. 6,187,992 describes the production and use of a transgenic mouse whose genome contains a disruption of the gene encoding the amyloid precursor protein (APP).
"Knockout animals" may also be used to practice the methods of the invention. For example, in one aspect, transgenic or modified animals of the invention include a "knockout animal", e.g.
Polypeptides and peptides
The invention provides isolated or recombinant polypeptides having sequence identity to an exemplary sequence of the invention, e.g., SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80. As discussed above, the identity may extend over the entire length of the polypeptide, or the identity may extend over a region of at least about 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700 or more residues. Polypeptides of the invention may also be shorter than the full length of the exemplary polypeptides (e.g., SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80).
In alternative aspects, the invention provides polypeptides (peptides, fragments) ranging in size from about 5 residues to the full length of a polypeptide, e.g., an enzyme such as an epoxide hydrolase; exemplary sizes are about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700 or more residues, e.g., contiguous residues of an exemplary epoxide hydrolase of the invention. Peptides of the invention may be useful, for example, as labeling probes, antigens, tolerogens, motifs, or epoxide hydrolase active sites.
Polypeptides and peptides of the invention can be isolated from natural sources, or can be synthetic or recombinantly generated polypeptides. Peptides and proteins can be recombinantly expressed in vitro or in vivo. The peptides and polypeptides of the invention can be prepared and isolated by any method known in the art. The polypeptides and peptides of the invention may also be synthesized, in whole or in part, using chemical methods well known in the art. See, e.g., Caruthers (1980) Nucleic Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; Banga, A.K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery Systems (1995) Technomic Publishing Co., Lancaster, Pa. For example, peptide synthesis can be performed using various solid phase techniques (see, e.g., Roberge (1995) Science 269:202; Merrifield (1997) Methods Enzymol. 289:3-13), and automated synthesis can be performed, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) according to the manufacturer's instructions.
Peptides and polypeptides of the invention may also be glycosylated. Glycosylation can be added post-translationally either chemically or via cellular biosynthetic mechanisms, the latter involving the use of known glycosylation motifs that may be native to the sequence or may be added as a peptide or added to the nucleic acid coding sequence. Glycosylation can be O-linked or N-linked.
The peptides and polypeptides of the invention as defined above include all "mimetic" and "peptidomimetic" forms. The terms "mimetic" and "peptidomimetic" refer to a synthetic chemical compound having substantially the same structural and/or functional characteristics as the polypeptides of the invention. The mimetic may be composed entirely of synthetic, non-natural amino acid analogs or may be a chimeric molecule of partly natural peptide amino acids and partly non-natural amino acid analogs. The mimetic may also contain any number of conservative substitutions of natural amino acids, as long as such substitutions do not significantly alter the structure and/or activity of the mimetic. As with polypeptides of the invention that are conservative variants, routine experimentation will determine whether a mimetic falls within the scope of the invention, i.e., whether its structure and/or function have not been significantly altered. Accordingly, in one aspect, a mimetic composition is within the scope of the invention if it has epoxide hydrolase activity.
The polypeptide mimetic compositions of the invention may contain any combination of non-natural structural components. In an alternative aspect, the mimetic compositions of the invention contain one or all of the following three structural groups: a) residue linkage groups other than natural amide bond ("peptide bond") linkages; b) non-natural residues in place of naturally occurring amino acid residues; or c) residues that induce secondary structural mimicry, i.e., induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta sheet, alpha helix conformation, and the like. For example, a polypeptide of the invention can be characterized as a mimetic when all or some of its residues are joined by chemical means other than natural peptide bonds. Individual peptidomimetic residues may be joined by peptide bonds, other chemical bonds, or coupling agents. Linking groups that can provide an alternative to traditional amide bond ("peptide bond") linkages include, for example, ketomethylene (e.g., -C(=O)-CH2- in place of -C(=O)-NH-), aminomethylene (CH2-NH), ethylene, olefin (CH═CH), ether (CH2-O), thioether (CH2-S), tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g., Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol. 7, pp. 267-357, "Peptide Backbone Modifications", Marcel Dekker, N.Y.).
A polypeptide of the invention may also be characterized as a mimetic by containing all or some non-natural residues in place of naturally occurring amino acid residues. Non-natural residues are well described in the scientific and patent literature; a few exemplary non-natural compositions useful as mimetics of natural amino acid residues, and guidelines, are described below. Mimetics of aromatic amino acids can be generated by substitution with, for example, D- or L-naphthylalanine; D- or L-phenylglycine; D- or L-2 thienylalanine; D- or L-1, -2, 3-, or 4-pyrenylalanine; D- or L-3 thienylalanine; D- or L-(2-pyridinyl)-alanine; D- or L-(3-pyridinyl)-alanine; D- or L-(2-pyrazinyl)-alanine; D- or L-(4-isopropyl)-phenylglycine; D-(trifluoromethyl)-phenylglycine; D-(trifluoromethyl)-phenylalanine; D-p-fluorophenylalanine; D- or L-p-biphenylphenylalanine; D- or L-p-methoxybiphenylphenylalanine; D- or L-2-indole(alkyl)alanines; and D- or L-alkylalanines, where alkyl may be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, isobutyl, sec-isotyl, iso-pentyl, or non-acidic amino acids. Aromatic rings of a non-natural amino acid include, for example, thiazolyl, thiophenyl, pyrazolyl, benzimidazolyl, naphthyl, furanyl, pyrrolyl, and pyridyl aromatic rings.
Mimetics of acidic amino acids can be generated by substitution with, for example, non-carboxylate amino acids that maintain a negative charge; (phosphono)alanine; or sulfated threonine. Carboxyl side groups (e.g., aspartyl or glutamyl) can also be selectively modified by reaction with carbodiimides (R'-N-C-N-R') such as, for example, 1-cyclohexyl-3(2-morpholinyl-(4-ethyl)carbodiimide or 1-ethyl-3(4-azonia-4,4-dimethylpentyl)carbodiimide. Aspartyl or glutamyl can also be converted to asparaginyl and glutaminyl residues by reaction with ammonium ions. Mimetics of basic amino acids can be generated by substitution with, e.g., (in addition to lysine and arginine) the amino acids ornithine, citrulline, or (guanidino)acetic acid, or (guanidino)alkylacetic acid, where alkyl is as defined above. A nitrile derivative (e.g., containing a CN moiety in place of COOH) can be substituted for asparagine or glutamine. Asparaginyl and glutaminyl residues can be deaminated to the corresponding aspartyl or glutamyl residues. Mimetics of arginine residues can be generated by reacting arginyl with, for example, one or more conventional reagents, including, for example, phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, or ninhydrin, preferably under alkaline conditions. Mimetics of tyrosine residues can be generated by reacting tyrosyl with, for example, aromatic diazonium compounds or tetranitromethane. N-acetylimidazole and tetranitromethane can be used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Mimetics of cysteine residues can be generated by reacting cysteinyl residues with, for example, alpha-haloacetates such as 2-chloroacetic acid or chloroacetamide and corresponding amines, to give carboxymethyl or carboxyamidomethyl derivatives.
Mimetics of cysteine residues can also be prepared by reacting cysteinyl residues with, for example, bromo-trifluoroacetone, alpha-bromo-beta-(5-imidosoyl)propionic acid; chloroacetyl phosphate, N-alkyl maleimides, 3-nitro-2-pyridyl disulfide; 2-pyridyl methyl disulfide; p-chloromercuric benzoate; 2-chloromercury-4-nitrophenol; or chloro-7-nitrobenzooxa-1,3-diazole. Lysine mimetics can be made (and the amino terminal residues can be changed) by reacting lysine with eg succinic anhydrides or other carboxylic acids. Lysine and other mimetics of alpha-amine containing residues can also be prepared by reaction with imidoesters such as methyl picolinimidate, pyridoxal phosphate, pyridoxal hydrochloride, trinitrobenzenesulfonic acid, O-methylisourea, 2,4, pentanedione and transamidase-catalyzed reactions with glyoxylate. Methionine mimetics can be prepared by reaction, for example, with methionine sulfoxide. Proline mimetics include, for example, pipecolic acid, thiazolidine carboxylic acid, 3- or 4-hydroxyproline, dehydroproline, 3- or 4-methylproline, or 3,3-dimethylproline. Mimetics of the histidine residue can be obtained by reacting histidyl with, for example, diethyl procarbonate or para-bromphenacyl bromide. Other mimetics include, for example, mimetics produced by hydroxylation of proline and lysine; phosphorylation of hydroxyl groups of seryl or threonyl residues; methylation of alpha-amino groups of lysine, arginine and histidine; acetylation of the N-terminal amine; methylation of main chain amide residues or substitution with N-methylamino acids; or amidation of C-terminal carboxyl groups.
A residue, eg an amino acid, of a polypeptide of the invention can also be replaced with an amino acid (or peptidomimetic residue) of opposite chirality. Therefore, any amino acid that occurs naturally in the L configuration (which can also be called R or S, depending on the structure of the chemical unit) can be replaced by an amino acid of the same chemical structure or a peptidomimetic, but with the opposite chirality, called a D-amino acid, but can also be called R or S shape.
The invention also provides methods for modifying the polypeptides of the invention by natural processes such as post-translational processing (e.g., phosphorylation, acylation, etc.). Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side chains, and the amino or carboxyl termini. The same type of modification may be present to the same or different degrees at several sites in a given polypeptide, and a given polypeptide can have several types of modifications. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA-mediated addition of amino acids to proteins, such as arginylation. See, e.g., Creighton, T. E., Proteins-Structure and Molecular Properties, 2nd ed., W. H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983).
Solid-phase chemical peptide synthesis methods can also be used to synthesize the polypeptides or fragments of the invention. Such methods have been known in the art since the early 1960s (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (see also Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd ed., Pierce Chemical Co., Rockford, Ill., pp. 11-12) and have recently been employed in commercially available laboratory peptide design and synthesis kits (Cambridge Research Biochemicals). Such commercially available laboratory kits generally employ the teachings of H. M. Geysen et al., Proc. Natl. Acad. Sci., USA, 81:3998 (1984) and allow peptides to be synthesized on the tips of multiple "rods" or "pins", all of which are connected to a single plate. When such a system is used, a plate of rods or pins is inverted and inserted into a second plate of corresponding wells or reservoirs, which contain solutions for attaching or anchoring the appropriate amino acid to the tips of the pins or rods. By repeating this process step, i.e., inverting and inserting the tips of the rods and pins into the appropriate solutions, amino acids are built into the desired peptides. In addition, a number of FMOC peptide synthesis systems are available. For example, assembly of a polypeptide or fragment can be carried out on a solid support using an Applied Biosystems, Inc. Model 431A™ automated peptide synthesizer. Such equipment provides ready access to the peptides of the invention, either by direct synthesis or by synthesis of a series of fragments that can be coupled using other known techniques.
Epoxide hydrolases
Epoxide hydrolases show promise as attractive tools for the synthesis of enantiomerically pure epoxides by hydrolytic kinetic resolution of racemic epoxides. Some of the attractive features of this potentially useful class of enzymes are listed below.
EHs are ubiquitous in nature. EHs have been found in all mammalian species tested, and the most studied is the microsomal epoxide hydrolase (mEH) of mammalian liver. (Armstrong, R. N. Drug Metab. Rev. 1999, 31, 71-86.) Most mammalian EHs are involved in epoxide detoxification, while a few are involved in hormone biosynthesis. Although mammalian EHs have been known for decades, most research has focused on their biological role and mechanism. In the few cases where their use in organic synthesis has been investigated, it has been found that several substrates can be efficiently processed by epoxide hydrolases, leading to enantiomerically enriched epoxides (the unreacted enantiomer) and/or the corresponding vicinal diols. (Archer, I. V. J. Tetrahedron 1997, 53, 15617-15662.) The observed intrinsic enantioselectivity of these enzymes demonstrated the potential of EHs as biocatalysts for the synthesis of chiral epoxides and diols. However, their use on a preparative scale was not possible because of the difficulty of obtaining large amounts of the enzymes by overexpression.
Over the past ten years, a number of EHs have been found in various bacteria, yeasts and fungi. (Swaving, J.; de Bont, J. A. M. Enzyme Microb. Technol. 1998, 22, 19-26.) Examples of bacterial EHs include those isolated from Agrobacterium radiobacter, Rhodococcus sp., Corynebacterium sp., Mycobacterium paraffinicum, Nocardia sp., Pseudomonas NRRL B-2994 and some strains of Streptomyces. Fungal EHs have also been found in Aspergillus niger, Helminthosporium sativum, Diplodia gossypina, Beauveria sulfurescens and some species of Fusarium. The best-known yeast EH is the Rhodotorula glutinis enzyme. Almost all of these enzymes were discovered by screening available strains with different epoxide substrates, and only a few of them were further investigated at the genetic level. Some of these enzymes show good enantioselectivity and are potentially readily available by fermentation. However, before they can be used for large-scale industrial production of epoxides, the range of substrates recognized by microbial epoxide hydrolases needs to be expanded, and the discovery of new EHs should be a viable solution.
EHs are "easy to use" catalysts that require no cofactors. Biochemical studies have shown that EHs, like other well-known hydrolytic enzymes such as lipases and esterases, require neither prosthetic groups nor metal ions to function. The currently proposed mechanism of action of EHs is also similar to that of esterases in that a covalent adduct is formed between the active site of the enzyme and the substrate during the catalytic cycle. Site-directed mutagenesis studies and structural data from a bacterial enzyme (A. radiobacter) suggest that the active-site nucleophile is an Asp residue. (Nardini, M.; Ridder, I. S.; Rozeboom, H. J.; Kalk, K. H.; Rink, R.; Janssen, D. B.; Dijkstra, B. W. J. Biol. Chem. 1999, 274, 14579-14596.)
Figure 12 illustrates the mechanism of A. radiobacter epoxide hydrolase. The catalytic mechanism involves two distinct steps. The first step (a) is an SN2 nucleophilic attack by the carboxylate oxygen of Asp107 on the least hindered carbon atom of the epoxide, resulting in a covalent ester intermediate. In the second step (b), the ester intermediate is hydrolyzed by a water molecule activated by the Asp246-His275 pair. Unlike ester hydrolysis, where stereochemistry is not a concern, epoxide hydrolysis has important stereochemical consequences: regioselectivity (either of two carbons can be attacked) and inversion of the absolute configuration of the attacked carbon. Therefore, when analyzing reactions catalyzed by epoxide hydrolases, both regioselectivity and enantioselectivity must be considered.
EHs often show high enantioselectivity as well as high activity towards certain categories of epoxide substrates.
Studies on different EHs have provided a significant amount of information regarding their stereoselectivity on different epoxide substrates. (Orru, R. V. A.; Faber, K. Curr. Opin. Chem. Biol. 1999, 3, 16-21.) In general, epoxide substrates can be divided into five types: monosubstituted, 2,2-disubstituted, 2,3-disubstituted, trisubstituted and styrene oxides (Figure 13). Known EHs have been shown to have different stereoselectivities for different types of substrates.
Most of the bacterial and fungal epoxide hydrolases tested were not highly stereoselective towards monosubstituted epoxides. These substrates are quite flexible and not very bulky, which can make chiral recognition difficult. However, some enzymes found in red yeasts, such as Rhodotorula glutinis strain CIMW 147, showed excellent selectivity. (Weijers, C. A. G. M.; Botes, A. L.; van Dyk, M. S.; de Bont, J. A. M. Tetrahedron: Asymmetry 1998, 9, 467-473.) Most of these enzymes are selective for R-epoxides as their substrates.
In the case of the bulkier 2,2-disubstituted substrates, some bacterial enzymes show good enantioselectivity, especially those from Rhodococcus (strains NCIMB 11216 and DSM 43338) and the closely related Nocardia sp. (strains H8, TB1 and EH1). (Orru, R. V. A.; Archelas, A.; Furstoss, R.; Faber, K. Adv. Biochem. Eng. Biotechnol. 1998, 63, 145-167.) Attack occurs at the less hindered, unsubstituted oxirane carbon. Interestingly, most bacterial epoxide hydrolases were selective for S-enantiomers.
Mixed regioselectivities are common in the hydrolysis of 2,3-disubstituted substrates, where ring opening occurs at both positions of the oxirane ring in varying proportions. This is probably because both reaction centers have similar steric environments. Interestingly, there are two scenarios of significant practical use. In cases where R1 and R2 are identical, the substrates are meso compounds, and desymmetrization catalyzed by epoxide hydrolases can lead to a single enantiomeric diol product in 100% yield. In some other cases, hydrolysis has been shown to be enantioconvergent, yielding only one stereoisomeric diol as the sole product. This could be useful for the synthesis of enantiomerically pure vicinal diols. For example, Nocardia EH1 catalyzed the enantioconvergent hydrolysis of cis-2,3-epoxyheptane to 2R,3R-2,3-dihydroxyheptane in good yield and enantiomeric excess (Figure 14). (Kroutil, W.; Mischitz, M.; Plachota, P.; Faber, K. Tetrahedron Lett. 1996, 37, 8379-8382.) The 2S,3R enantiomer reacted 10 times faster than the 2R,3S enantiomer, but hydrolysis of both enantiomers occurred by attack at the S-centers, leading exclusively to the 2R,3R-diol product.
Only limited data are available on the enzymatic hydrolysis of trisubstituted epoxides. In several cases, bacterial and yeast EHs have shown good enantioselectivity for these bulky substrates. (Weijers, C. A. G. M. Tetrahedron: Asymmetry 1997, 8, 639-647; and Archer, I. V. J.; Leak, D. J.; Widdowson, D. A. Tetrahedron Lett. 1996, 37, 8819-8822.) More enzymes for these substrates may become available as new EHs continue to be discovered.
Styrene oxides are considered a special group of substrates because the benzylic carbon of these substrates stabilizes the carbocationic character of the reaction transition state. As a result, this group of substrates shows poor regioselectivity if the benzylic carbon is also sterically hindered. In contrast, excellent enantioselectivity was observed in reactions catalyzed by red yeast enzymes such as that of Rhodotorula glutinis strain CIMW 147, and especially by fungal epoxide hydrolases such as the enzyme from Aspergillus niger. (Weijers, C. A. G. M. Tetrahedron: Asymmetry 1997, 8, 639-647; and Archelas, A.; Furstoss, R. Curr. Opin. Chem. Biol. 2001, 5, 112-119.) In the latter case, very good regioselectivity is also obtained for the synthesis of diols.
A review of the data available to date shows that EHs with high stereoselectivity exist for almost all types of epoxides, although there appears to be a correlation between some microbial sources and the substitution pattern of the epoxide substrates. For example, yeast EHs work best with monosubstituted oxiranes, while fungal EHs show the highest enantioselectivity with styrene oxide substrates. Bacterial enzymes are the catalysts of choice for 2,2- and 2,3-disubstituted epoxides. However, since only a small number of enzymes have been discovered and studied, this correlation may be due to a biased data set. Nevertheless, the high stereoselectivity and activity exhibited by microbial EHs on some epoxide substrates strongly suggest that these enzymes could be the tools that chemists are looking for to prepare enantiomerically pure epoxides and vicinal diols.
Chiral epoxides and diols have important applications in anticancer, antiviral, antifungal, antibacterial and other drugs. In the preparation of these important compounds, epoxide hydrolases have shown great promise. Because a kinetic resolution is limited to a 50% yield, epoxide hydrolase-mediated syntheses are not expected to completely replace current chemical asymmetric epoxidation. However, industrial applications of epoxide hydrolases can be envisioned in the following ways: replacing chemical methods as "cleaner" catalysts in some transformations; serving as the catalyst of choice where chemical methods are limited; preparing certain diols in an enantioconvergent manner, where yields are not limited to 50%; and use in combination with other asymmetric epoxidation methods to improve the overall ee by hydrolysis of the minor epoxide enantiomers.
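The 50% yield ceiling of a kinetic resolution, and the trade-off between conversion and the enantiomeric excess (ee) of the unreacted epoxide, can be illustrated numerically. The following is a minimal sketch only, assuming an idealized irreversible resolution of a racemate in which both enantiomers are consumed with first-order kinetics and the fast-reacting enantiomer reacts E times faster than the slow one. The function names and the example value E = 20 are illustrative assumptions and are not taken from this specification.

```python
import math


def resolution_state(slow_remaining: float, e_value: float) -> tuple[float, float]:
    """Return (conversion, ee of unreacted epoxide) for a racemic starting material.

    slow_remaining is the fraction (0 < x <= 1) of the slow-reacting enantiomer
    still present. Under the idealized model assumed here, the fraction of the
    fast-reacting enantiomer still present is slow_remaining ** e_value.
    """
    fast_remaining = slow_remaining ** e_value
    remaining_total = (slow_remaining + fast_remaining) / 2.0  # racemate: 50:50 start
    conversion = 1.0 - remaining_total
    ee_substrate = (slow_remaining - fast_remaining) / (slow_remaining + fast_remaining)
    return conversion, ee_substrate


def e_from_measurements(conversion: float, ee_substrate: float) -> float:
    """Estimate E from measured conversion and ee of the unreacted epoxide.

    Uses the standard relation E = ln[(1-c)(1-ee)] / ln[(1-c)(1+ee)],
    which is undefined at zero conversion.
    """
    numerator = math.log((1.0 - conversion) * (1.0 - ee_substrate))
    denominator = math.log((1.0 - conversion) * (1.0 + ee_substrate))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical enzyme with E = 20; step through the course of the reaction.
    for slow_left in (0.95, 0.90, 0.80, 0.70):
        c, ee = resolution_state(slow_left, 20.0)
        print(f"conversion={c:.3f}  recovered epoxide={1 - c:.3f}  ee={ee:.3f}")
```

Under this model, the recovered epoxide approaches enantiomeric purity only as its recovered amount falls toward or below half of the starting material, which is why enantioconvergent hydrolysis to a single diol is attractive when higher yields are needed.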
As used herein, the bioactivity of interest is epoxide modification catalyst activity. As used herein, the term biomolecule refers to epoxide hydrolases.
Preferably, the first step in the discovery of these enzymes involves the development of sensitive, high-throughput detection methods for epoxide modification catalysts. A combination of optimized assays and host screening can be used to demonstrate that biocatalysts can be derived from environmental gene libraries. Host strain libraries and environmental gene libraries can be constructed using the technologies described in U.S. Pat. No. 5,958,672, U.S. Pat. No. 6,001,574 and U.S. Pat. No. 5,763,239.
Hybrid epoxide hydrolases and peptide libraries
In one aspect, the invention provides hybrid epoxide hydrolases and fusion proteins, including peptide libraries, comprising the sequences of the invention. Peptide libraries containing sequences of the invention are used to isolate peptide inhibitors of targets (eg, receptors, enzymes) and to identify formal target binding partners (eg, ligands such as cytokines, hormones, and the like).
The field of biomolecule screening for biologically and therapeutically relevant compounds is rapidly evolving. Biomolecules that have been the focus of such screening include chemical libraries, nucleic acid libraries and peptide libraries, which are searched for molecules that inhibit or enhance the biological activity of identified target molecules. With particular regard to peptide libraries, the main objectives have been the isolation of peptide inhibitors of targets and the identification of formal target binding partners. Screening combinatorial libraries of drug candidates against therapeutically relevant targets is a rapidly growing and important field. However, one particular problem with peptide libraries is the difficulty of assessing whether a particular peptide is expressed, and at what level, before determining whether the peptide has a biological effect. Therefore, in order to express and subsequently screen functional peptides in cells, the peptides must be expressed in amounts sufficient to overcome catabolic mechanisms such as proteolysis and transport from the cytoplasm to endosomes.
In one embodiment, the fusion proteins of the invention (e.g., the peptide portion) are conformationally stabilized (relative to linear peptides) to enable higher binding affinity for their cellular targets. The present invention provides fusions of the epoxide hydrolases of the invention with other peptides, including known and random peptides, which are fused such that the structure of the epoxide hydrolase is not significantly disrupted and the peptide is metabolically or structurally (conformationally) stabilized. This enables the creation of a peptide library that can be easily monitored, both for its presence in cells and for its quantity.
Amino acid sequence variants of the invention may be characterized by the predetermined nature of the variation, a feature that distinguishes them from naturally occurring allelic or interspecies variation of an epoxide hydrolase amino acid sequence. In one aspect, variants of the invention exhibit the same qualitative biological activity as the naturally occurring analog, although variants with modified properties may also be selected. Although the site or region for introducing the amino acid sequence change is predetermined, the mutation per se need not be predetermined. For example, to optimize the performance of a mutation at a given site, random mutagenesis can be performed at the target codon or region and the expressed epoxide hydrolase variants screened for the optimal combination of desired activity. Techniques for generating substitution mutations at predetermined sites in DNA of known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis. Mutants are screened using assays of epoxide hydrolase activity. In alternative aspects, amino acid substitutions can be single residues; insertions can be on the order of about 1 to 20 amino acids, although considerably larger insertions can be tolerated. Deletions can range from about 1 to about 20 residues, although in some cases deletions can be much larger. Substitutions, deletions, insertions or any combination thereof can be used to arrive at a final derivative with optimal properties. Generally, these changes are made to a few amino acids to minimize alteration of the molecule. However, larger changes can be tolerated in certain circumstances.
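The idea of fixing the site in advance while leaving the mutation itself open can be sketched computationally. The following minimal example, offered only as an illustration, enumerates every single-residue variant at one predetermined position of a parent sequence, i.e., the set of polypeptides that random mutagenesis at a single target codon could in principle produce. The function name and the short parent sequence are hypothetical and do not correspond to any sequence of the invention.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(sequence: str, position: int) -> list[str]:
    """Return all single-residue variants of `sequence` at 0-based `position`.

    The parent residue is excluded, so 19 variants are returned for a
    sequence made up of standard residues.
    """
    original = sequence[position]
    return [
        sequence[:position] + residue + sequence[position + 1:]
        for residue in AMINO_ACIDS
        if residue != original
    ]


if __name__ == "__main__":
    parent = "MKDLSA"  # hypothetical parent sequence, for illustration only
    for variant in saturation_variants(parent, 2):
        print(variant)
```

Each variant produced in this way would then be expressed and screened for the desired activity, as described above.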
The invention provides epoxide hydrolases in which the structure of the polypeptide backbone, or the secondary or tertiary structure, e.g., an alpha-helical or beta-sheet structure, has been modified. In one embodiment, the charge or hydrophobicity has been modified. In one embodiment, the bulk of a side chain has been modified. Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative. For example, substitutions can be made that more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example an alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of a side chain. The substitutions generally expected to produce the greatest changes in the properties of the polypeptide are those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having a positively charged side chain, e.g., lysyl, arginyl or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine. Variants can exhibit the same qualitative biological activity (i.e., epoxide hydrolase activity), although variants can be selected to modify the properties of the epoxide hydrolase if desired.
In one embodiment, epoxide hydrolases of the invention include epitopes or purification tags, signal sequences or other fusion sequences, etc. In one embodiment, the epoxide hydrolases of the invention can be fused to a random peptide to form a fusion polypeptide. By "fusion" or "operably linked" herein is meant that the random peptide and the epoxide hydrolase are linked together in such a way as to minimize disruption of the structural stability of the epoxide hydrolase (i.e., it maintains a Tm of at least 42° C.). The fusion polypeptide (or the fusion polynucleotide encoding the fusion polypeptide) may also include further components, including multiple peptides at multiple loops.
In one embodiment, the peptides and the nucleic acids encoding them are randomized, either fully randomized or biased in their randomization, e.g., in nucleotide/residue frequency generally or per position. "Randomized" means that each nucleic acid and peptide consists of essentially random nucleotides and amino acids, respectively. In one embodiment, the nucleic acids that give rise to the peptides can be chemically synthesized and can therefore incorporate any nucleotide at any position. Therefore, when the nucleic acids are expressed to form peptides, any amino acid residue can be incorporated at any position. The synthesis process can be designed to generate randomized nucleic acids so as to allow the formation of all or most of the possible combinations over the length of the nucleic acid, thereby forming a library of randomized nucleic acids. The library can provide a population of randomized expression products that is structurally diverse enough to produce a probabilistically sufficient range of cellular responses, so that one or more cells exhibit the desired response. Accordingly, the invention provides an interaction library large enough that at least one of its members has a structure that gives it affinity for a molecule, protein or other factor whose activity is required to complete a signaling pathway.
In one embodiment, the peptide library of the invention is fully randomized, with no sequence preferences or constants at any position. In another aspect, the library is biased, that is, some positions within the sequence are either held constant or selected from a limited number of possibilities. For example, in one embodiment, the nucleotides or amino acid residues are randomized within a defined class, e.g., hydrophobic amino acids, hydrophilic residues, sterically biased (small or large) residues, toward the creation of cysteines for cross-linking, prolines for SH-3 domains, serines, threonines, tyrosines or histidines for phosphorylation sites, etc., or to purines, etc. For example, individual residues can be fixed in the random peptide sequence of the insert to create a structural bias. In an alternative aspect, random libraries can be biased toward a particular secondary structure by including an appropriate number of residues (in addition to glycine linkers) that prefer that secondary structure.
In one aspect, the bias is toward peptides that interact with known classes of molecules. For example, much of intracellular signaling is known to be carried out by short regions of polypeptides interacting with other polypeptides through small peptide domains. For example, a short region from the cytoplasmic domain of the HIV-1 envelope has previously been shown to block the action of cellular calmodulin. Regions of the Fas cytoplasmic domain, which shows homology to the mastoparan toxin from wasps, can be limited to a short peptide region with death-inducing apoptotic or G protein-inducing functions. Therefore, a number of molecules or protein domains are suitable as starting points for generating biased randomized peptides. A large number of small molecule domains are known to confer a common function, structure or affinity. In addition, regions with weak amino acid homology may have strong structural homology. Examples of molecules, domains and/or corresponding consensus sequences used in the invention (e.g., incorporated into the fusion proteins of the invention) include SH-2 domains, SH-3 domains, pleckstrin, death domains, epoxide hydrolase cleavage/recognition sites, enzyme inhibitors, enzyme substrates, Traf, etc. Similarly, there are many known nucleic acid binding proteins containing domains suitable for use in the invention, e.g., leucine zipper consensus sequences.
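How large a library must be to contain "all or most of the possible combinations" depends steeply on the length of the randomized region. The following is a minimal sketch, assuming a fully randomized peptide built from the 20 standard amino acids and, for the coverage estimate, library members drawn uniformly and independently from the sequence space; real libraries are not sampled uniformly (codon degeneracy and synthesis bias skew the distribution), so the figures are indicative only. All function names are illustrative.

```python
import math


def peptide_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


def nucleotide_space(peptide_length: int) -> int:
    """Number of distinct coding sequences (3 nucleotides per residue, 4 bases each)."""
    return 4 ** (3 * peptide_length)


def expected_coverage(library_size: int, space: int) -> float:
    """Expected fraction of the space represented at least once.

    Assumes uniform, independent sampling with replacement and space > 1:
    coverage = 1 - (1 - 1/space) ** library_size,
    computed with log1p/expm1 for numerical stability when space is large.
    """
    return -math.expm1(library_size * math.log1p(-1.0 / space))


if __name__ == "__main__":
    library_size = 10 ** 8  # hypothetical number of independent clones
    for length in (4, 6, 8, 10):
        space = peptide_space(length)
        print(length, space, f"{expected_coverage(library_size, space):.4f}")
```

Holding some positions constant, or restricting them to a class of residues, shrinks the effective sequence space and so raises the coverage achievable with a library of a given size.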
The invention provides a variety of expression vectors comprising the nucleic acids of the invention, including those encoding a fusion protein. The expression vectors can be either self-replicating extrachromosomal vectors or vectors that integrate into the host genome. In general, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the fusion protein. The term "control sequence" refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. Control sequences suitable for prokaryotes include, for example, a promoter, optionally an operator sequence, and a ribosome binding site.
Transcriptional and translational regulatory sequences used in the expression cassettes and vectors of the invention include, but are not limited to, promoter sequences, ribosome binding sites, transcription start and stop sequences, translation start and stop sequences, and enhancer or activator sequences. In one embodiment, the regulatory sequences include a promoter and transcription start and stop sequences. Promoter sequences encode constitutive or inducible promoters. Promoters can be either natural promoters or hybrid promoters. Hybrid promoters that combine elements from more than one promoter are also known in the art and are useful in this invention. In one embodiment, the promoters are strong promoters that allow high expression in cells, particularly mammalian cells, such as a CMV promoter, particularly in combination with a Tet regulatory element.
Additionally, the expression vector may contain additional elements. In one example, an expression vector may have two replication systems, allowing it to be maintained in two organisms, for example, in a mammalian or insect cell for expression and in a prokaryotic host for cloning and amplification. Additionally, for integrating expression vectors, the expression vector contains at least one sequence homologous to the genome of the host cell, and preferably two homologous sequences flanking the expression construct. An integrating vector can be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.
In one embodiment, the nucleic acids or vectors of the invention are introduced into cells for screening; that is, the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely determined by the type of target cell. Examples of methods include CaPO4 precipitation, liposome fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc. Candidate nucleic acids may be stably integrated into the host cell genome (e.g., after retroviral introduction) or may exist transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selectable markers, etc.). Since many pharmaceutically important screens require human target cells or mammalian model cells, retroviral vectors that can transfect such targets are desirable.
The expression vectors of the invention may also contain a selectable marker gene that allows the selection of bacterial strains that have been transformed, e.g. genes that confer resistance to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers may also include biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.
Screening methodologies and on-line monitoring devices
In the practice of the methods of the invention, various devices and methodologies can be used in conjunction with the polypeptides and nucleic acids of the invention, e.g., to screen compounds as potential modulators of enzyme activity (e.g., enhancing or inhibiting enzyme activity), for antibodies that bind to a polypeptide of the invention, for nucleic acids that hybridize to a nucleic acid of the invention, and the like.
Solid supports of immobilized enzyme
Epoxide hydrolase enzymes, fragments thereof, and nucleic acids encoding the enzymes and fragments can be attached to a solid support. Doing so is often economical and efficient when epoxide hydrolases are used in industrial processes. For example, a consortium or cocktail of epoxide hydrolase enzymes (or active fragments thereof) used in a particular chemical reaction can be attached to a solid support and immersed in a process vat, where the enzymatic reaction can take place. The solid support can then be removed from the vat, along with the enzymes attached to it, for reuse. In one embodiment of the invention, an isolated nucleic acid of the invention is attached to a solid support. In another embodiment of the invention, the solid support is selected from the group consisting of a gel, a resin, a polymer, a ceramic, a glass, a microelectrode, and any combination thereof.
For example, solid carriers useful in the present invention include gels. Some examples of gels include sepharose, gelatin, glutaraldehyde, chitosan-treated glutaraldehyde, albumin-glutaraldehyde, chitosan-xanthan, toyopearl gel (polymer gel), alginate, alginate-polylysine, carrageenan, agarose, glyoxyl agarose, magnetic agarose, dextran-agarose, poly (carbamoyl sulfonate) hydrogel, BSA-PEG hydrogel, phosphorylated polyvinyl alcohol (PVA), monoaminoethyl-N-aminoethyl (MANA), amino, or any combination thereof.
Other solid supports useful in the present invention are resins or polymers. Some examples of resins or polymers include cellulose, acrylamide, nylon, rayon, polyester, anion exchange resin, AMBERLITE™ XAD-7, AMBERLITE™ XAD-8, AMBERLITE™ IRA-94, AMBERLITE™ IRC-50, polyvinyl, polyacrylic, polymethacrylate or any combination thereof. Another type of solid support useful in the present invention is a ceramic material. Some examples include non-porous ceramics, porous ceramics, SiO2, Al2O3. Another type of solid support useful in the present invention is glass. Some examples include non-porous glass, porous glass, aminopropyl glass, or any combination thereof. Another type of solid support that can be used is a microelectrode. An example is magnetite coated with polyethyleneimine. Graphite particles can be used as a solid support. Another example of a solid support is a cell, such as a red blood cell.
Methods of immobilization
There are many methods that would be known to those skilled in the art for immobilizing enzymes or fragments thereof, or nucleic acids, on a solid support. Some examples of such methods include, for example, electrostatic droplet generation, electrochemical means, by adsorption, covalent bonding, cross-linking, chemical reaction or process, encapsulation, entrapment, calcium alginate or poly(2). -hydroxyethyl methacrylate). Similar methods are described in Methods in Enzymology, Immobilized Enzymes and Cells, Part C. 1987. Academic Press. Edited by SP Colowick and N, O. Kaplan. volume 136; and Immobilization of enzymes and cells. 1997. Human Press. Edited by G. F. Bickerstaff. Series: Methods in Biotechnology, edited by JM Walker.
Capillary arrays
Capillary arrays, such as the GIGAMATRIX™ (Diversa Corporation, San Diego, CA), can be used in the methods of the invention. Nucleic acids or polypeptides of the invention may be immobilized or applied to an array, including capillary arrays. Arrays can be used to screen or monitor libraries of compositions (e.g., small molecules, antibodies, nucleic acids, etc.) for their ability to bind or modulate the activity of a nucleic acid or polypeptide of the invention. Capillary arrays provide another system for holding and screening samples. For example, a sample screening apparatus may include a plurality of capillaries formed into an array of adjacent capillaries, each capillary comprising at least one wall defining a lumen for retaining a sample. The apparatus may further include an interstitial material located between adjacent capillaries in the array and one or more reference indicators formed in the interstitial material. A capillary for screening a sample, wherein the capillary is adapted for placement in a capillary array, may include a first wall defining a lumen for retaining the sample and a second wall formed of a filtering material for filtering the excitation energy supplied to the lumen to excite the sample.
A polypeptide or nucleic acid, e.g., a ligand, can be introduced as a first component into at least a portion of a capillary of a capillary array. Each capillary of the capillary array may include at least one wall defining a lumen for retaining the first component. An air bubble can be introduced into the capillary behind the first component. A second component may then be introduced into the capillary, wherein the second component is separated from the first component by the air bubble. A sample of interest may be introduced as a first liquid labeled with a detectable particle into a capillary of a capillary array, wherein each capillary of the capillary array comprises at least one wall defining a lumen for retaining the first liquid and the detectable particle, and wherein the at least one wall is coated with a binding material for binding the detectable particle to the at least one wall. The method may further include removing the first liquid from the capillary, wherein the bound detectable particle is retained in the capillary, and introducing a second liquid into the capillary.
A capillary array may include a plurality of individual capillaries, each comprising at least one outer wall that defines a lumen. The outer wall of the capillary can be one or more walls fused together. Similarly, the wall may define a lumen that is cylindrical, square, hexagonal, or any other geometric shape, as long as the walls form a lumen for retaining the liquid or sample. The capillaries of a capillary array can be held together in close proximity to form a planar structure. Capillaries can be bound together by being fused (e.g., where the capillaries are made of glass), glued, bonded, or clamped side by side. A capillary array can be formed from any number of individual capillaries, for example in the range of 100 to 4,000,000 capillaries. A capillary array can form a microtiter plate containing about 100,000 or more individual capillaries bound together.
Arrays or "biochips"
Nucleic acids or polypeptides of the invention can be immobilized or applied to an array. The arrays can be used to screen or monitor libraries of compositions (e.g., small molecules, antibodies, nucleic acids, etc.) for their ability to bind or modulate the activity of a nucleic acid or polypeptide of the invention. For example, in one aspect of the invention, the monitored parameter is the expression of an epoxide hydrolase gene transcript. One or more, or all, of the transcripts of a cell can be measured by hybridizing a sample containing the transcripts of the cell, or nucleic acids representative of or complementary to those transcripts, to immobilized nucleic acids on an array or "biochip". Using an "array" of nucleic acids on a microchip, some or all of a cell's transcripts can be quantified simultaneously. Alternatively, arrays containing genomic nucleic acid can also be used to determine the genotype of a newly engineered strain produced by the methods of the invention. "Polypeptide arrays" can also be used to quantify multiple proteins simultaneously. The present invention can be practiced with any known "array", also referred to as a "microarray", "nucleic acid array", "polypeptide array", "antibody array" or "biochip", or a variation thereof. Arrays are generally a plurality of "spots" or "target elements", each target element containing a predetermined amount of one or more biological molecules, e.g., oligonucleotides, immobilized on a defined area of a substrate surface for specific binding to a sample molecule, e.g., mRNA transcripts.
Any known array and/or method of making and using arrays, or variations thereof, may be incorporated into the practice of the methods of the invention, in whole or in part, as described, for example, in U.S. Pat. Nos. 6,277,628; 6,277,489; 6,261,776; 6,258,606; 6,054,270; 6,048,695; 6,045,996; 6,022,963; 6,013,440; 5,965,452; 5,959,098; 5,856,174; 5,830,645; 5,770,456; 5,632,957; 5,556,752; 5,143,854; 5,807,522; 5,800,992; 5,744,305; 5,700,637; 5,556,752; 5,434,049; see also, e.g., WO 99/51773; WO 99/09217; WO 97/46313; WO 96/17958; see also, e.g., Johnston (1998) Curr. Biol. 8:R171-R174; Schummer (1997) Biotechniques 23:1087-1092; Kern (1997) Biotechniques 23:120-124; Solinas-Toldo (1997) Genes, Chromosomes and Cancer 20:399-407; Bowtell (1999) Nature Genetics Supp. 21:25-32. See also published U.S. patent application Nos. 20010018642; 20010019827; 20010016322; 20010014449; 20010014448; 20010012537; 20010008765.
Antibodies and antibody-based screening methods
The invention provides isolated or recombinant antibodies that specifically bind to the epoxide hydrolase of the invention. These antibodies can be used to isolate, identify or quantify fluorescent polypeptides of the invention or related polypeptides. These antibodies can be used to isolate other polypeptides within the scope of the invention or other related epoxide hydrolases.
Antibodies can be used in immunoprecipitation, staining, immunoaffinity columns, and the like. If desired, nucleic acid sequences encoding specific antigens can be generated by immunization followed by isolation of the polypeptide or nucleic acid, amplification or cloning, and immobilization of the polypeptide onto an array of the invention. Alternatively, the methods of the invention can be used to modify the structure of an antibody produced by a cell to be modified, e.g., the affinity of the antibody can be increased or decreased. Additionally, the ability to produce or modify antibodies may be a phenotype engineered into a cell by the methods of the invention.
Methods of immunization and of producing and isolating antibodies (polyclonal and monoclonal) are known to those of skill in the art and are described in the scientific and patent literature, see, e.g., Coligan, CURRENT PROTOCOLS IN IMMUNOLOGY, Wiley/Greene, NY (1991); Stites (ed.) BASIC AND CLINICAL IMMUNOLOGY (7th ed.) Lange Medical Publications, Los Altos, CA ("Stites"); Goding, MONOCLONAL ANTIBODIES: PRINCIPLES AND PRACTICE (2nd ed.) Academic Press, New York, N.Y. (1986); Kohler (1975) Nature 256:495; Harlow (1988) ANTIBODIES, A LABORATORY MANUAL, Cold Spring Harbor Publications, New York. Antibodies can also be produced in vitro, e.g., using recombinant phage display libraries expressing the antibody binding site, in addition to traditional in vivo methods using animals. See, e.g., Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys. Biomol. Struct. 26:27-45.
The polypeptides or peptides can be used to generate antibodies that specifically bind to the polypeptides of the invention. The obtained antibodies can be used in immunoaffinity chromatography procedures to isolate or purify polypeptides or to determine whether a polypeptide is present in a biological sample. In such methods, a protein preparation, such as an extract or biological sample, is brought into contact with an antibody that can specifically bind to one of the polypeptides of the invention.
In immunoaffinity procedures, the antibody is attached to a solid support such as a bead or other column matrix. The protein preparation is placed in contact with the antibody under conditions in which the antibody specifically binds to one of the polypeptides of the invention. After washing to remove non-specifically bound proteins, the specifically bound polypeptides are eluted.
The ability of proteins in a biological sample to bind to an antibody can be determined using any of a number of methods known to those skilled in the art. For example, binding can be determined by labeling the antibody with a detectable label, such as a fluorescent agent, an enzymatic label, or a radioisotope. Alternatively, binding of the antibody to the sample can be detected using a secondary antibody that carries such a detectable label. Particular assays include ELISA assays, sandwich assays, radioimmunoassays and Western blots.
Polyclonal antibodies raised against a polypeptide of the invention can be obtained by directly injecting the polypeptide into an animal or by administering the polypeptide to a non-human animal. The antibody thus obtained will then bind the polypeptide itself. In this way, even a sequence that encodes only a fragment of a polypeptide can be used to generate antibodies that can bind to the entire native polypeptide. Such antibodies can then be used to isolate the polypeptide from cells expressing the polypeptide.
Any technique that provides antibodies produced by continuous culture of cell lines can be used to prepare monoclonal antibodies. Examples include the hybridoma technique, the trioma technique, the human B cell hybridoma technique, and the EBV hybridoma technique (see, e.g., Cole (1985) in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
The described techniques for producing single chain antibodies (see, eg, US Patent No. 4,946,778) can be adapted to produce single chain antibodies against polypeptides of the invention. Alternatively, transgenic mice can be used to express humanized antibodies to these polypeptides or fragments thereof.
Antibodies raised against polypeptides of the invention can be used to search for similar polypeptides from other organisms and samples. In such techniques, polypeptides from the organism are brought into contact with the antibody and those polypeptides that specifically bind the antibody are detected. Any of the methods described above can be used to detect antibody binding.
Kits
The invention provides kits containing the compositions, e.g., nucleic acids, expression cassettes, vectors, cells, polypeptides (e.g., epoxide hydrolases) and/or antibodies of the invention. The kits may also include instructional material teaching the methodologies and industrial uses of the invention, as described herein.
Measurement of metabolic parameters
The methods of the invention provide for whole cell evolution or whole cell engineering to develop a new cell strain having a new phenotype by modifying the genetic makeup of the cell, wherein the genetic makeup is modified by adding nucleic acid to the cell. To detect a new phenotype, at least one metabolic parameter of the modified cell is monitored in "real time" or "on-line" in the cell. In one embodiment, a plurality of cells, such as a cell culture, is monitored in "real time" or "on-line". In one embodiment, multiple metabolic parameters are monitored in "real time" or "on-line". Metabolic parameters can be monitored using the fluorescent polypeptides of the invention.
Metabolic flux analysis (MFA) is based on a known biochemical framework. A linearly independent metabolic matrix is constructed based on the law of mass conservation and on the pseudo-steady state hypothesis (PSSH) for the intracellular metabolites. In the practice of the methods of the invention, metabolic networks are established, including the:
identity of all pathway substrates, products and intermediate metabolites; identity of all chemical reactions interconverting the pathway metabolites; stoichiometry of the pathway reactions; identity of all enzymes catalyzing the reactions; kinetics of the enzyme reactions; regulatory interactions between pathway components, e.g., allosteric interactions, enzyme-enzyme interactions, etc.; intracellular compartmentalization of enzymes or any other supramolecular organization of the enzymes; and presence of any concentration gradients of metabolites, enzymes or effector molecules, or of diffusion barriers to their movement.
Once a metabolic network has been built for a particular strain, a mathematical representation using the matrix concept can be introduced to estimate the intracellular metabolic fluxes if on-line metabolome data are available. The metabolic phenotype relies on the changes of the entire metabolic network within a cell, that is, on the change in pathway utilization with respect to environmental conditions, genetic regulation, developmental state, genotype, etc. In one aspect of the methods of the invention, after the on-line MFA calculation, the dynamic behavior of the cells, their phenotype and other properties are analyzed by investigating the pathway utilization. For example, if the supply of glucose increases and oxygen decreases during yeast fermentation, the utilization of the respiratory pathways will be reduced and/or stopped, and the utilization of the fermentative pathways will dominate. Control of the physiological state of cell cultures becomes possible after the pathway analysis. The methods of the invention can help determine how to manipulate a fermentation by determining how to change the substrate supply, temperature, use of inducers, etc. to control the physiological state of the cells so that it moves in a desired direction. In the practice of the methods of the invention, the MFA results can also be compared with transcriptome and proteome data to design experiments and protocols for metabolic engineering or gene shuffling, etc.
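The matrix representation mentioned above can be illustrated with a minimal sketch. Under the pseudo-steady state hypothesis the stoichiometric matrix S and the flux vector v satisfy S·v = 0; splitting v into measured and unknown fluxes gives S_u·v_u = -S_m·v_m, which can be solved for the unknown intracellular fluxes. The four-reaction network, the flux values and the function names below are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve the square system a*x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("unknown fluxes are not uniquely determined")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def estimate_fluxes(s_unknown, s_measured, v_measured):
    """Solve S_u * v_u = -S_m * v_m for the unknown fluxes v_u."""
    rhs = [-sum(s * v for s, v in zip(row, v_measured)) for row in s_measured]
    return solve_linear(s_unknown, rhs)

# Hypothetical toy network (rows = intracellular metabolites A and B):
#   v1: substrate uptake -> A      (measured)
#   v2: A -> B                     (unknown)
#   v3: B -> respiratory product   (unknown)
#   v4: B -> fermentation product  (measured)
S_UNKNOWN = [[-1, 0],    # A: consumed by v2
             [1, -1]]    # B: produced by v2, consumed by v3
S_MEASURED = [[1, 0],    # A: produced by v1
              [0, -1]]   # B: consumed by v4

v2, v3 = estimate_fluxes(S_UNKNOWN, S_MEASURED, [10, 3])
print("v2 =", v2, "v3 =", v3)
```

With an uptake flux of 10 and a fermentative flux of 3 (arbitrary units), the balance assigns the remaining 7 units to the respiratory branch. A real network would normally be larger and overdetermined, in which case a least-squares solution would replace the exact solve used here.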
In practicing the methods of the invention, any modified or new phenotype can be conferred and detected, including new or improved characteristics in the cell. Any aspect of metabolism or growth can be monitored.
Monitoring mRNA transcript expression
In one aspect of the invention, the modified phenotype includes increasing or decreasing mRNA transcript expression or generating new transcripts in the cell. This increased or decreased expression can be monitored using a fluorescent polypeptide of the invention. mRNA transcripts or messages can also be detected and quantified by any method known in the art, including, for example, Northern blots, quantitative amplification reactions, array hybridization, and the like. Quantitative amplification reactions include e.g. quantitative PCR, including e.g. quantitative reverse transcription polymerase chain reaction or RT-PCR; real-time quantitative RT-PCR or "real-time kinetic RT-PCR" (see, e.g., Kreuzer (2001) Br. J. Haematol. 114:313-318; Xia (2001) Transplantation 72:907-914).
In one aspect of the invention, the modified phenotype is generated by knocking out the expression of a homologous gene. The coding sequence of the gene, or one or more transcriptional control elements, e.g., promoters or enhancers, can be knocked out. Thus, the expression of a transcript can be completely ablated or only decreased.
In one aspect of the invention, the modified phenotype includes increased expression of a homologous gene. This can be effected by knocking out a negative control element, including a transcriptional regulatory element acting in cis or trans, or by mutagenizing a positive control element. One or more, or all, of the transcripts of a cell can be measured by hybridizing a sample containing the transcripts of the cell, or nucleic acids representative of or complementary to those transcripts, to immobilized nucleic acids on an array.
Monitoring the expression of polypeptides, peptides and amino acids
In one aspect of the invention, the modified phenotype involves increasing or decreasing the expression of a polypeptide or generating new polypeptides in the cell. This increased or decreased expression can be monitored using the epoxide hydrolase of the invention. Polypeptides, peptides, and amino acids can also be detected and quantified by any method known in the art, including, for example, nuclear magnetic resonance (NMR), spectrophotometry, radiography (protein radiolabeling), electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), thin layer chromatography (TLC), hyperdiffusion chromatography, various immunological methods, e.g., immunoprecipitation, immunodiffusion, immunoelectrophoresis, radioimmunoassays (RIA), enzyme-linked immunosorbent assays (ELISA), immunofluorescence assays, gel electrophoresis (e.g., SDS-PAGE), antibody staining, fluorescence-activated cell sorting (FACS), pyrolysis mass spectrometry, Fourier transform infrared spectrometry, Raman spectrometry, GC-MS, LC-electrospray and cap-LC-tandem-electrospray mass spectrometry, and the like. Novel biological activities can also be assayed using the methods, or variations thereof, described in U.S. Pat. No. 6,057,103. Additionally, as discussed in detail below, one or more or all polypeptides in a cell can be measured using a protein array.
Test development
Several test methods can be used to discover EHs. These include growth-based tests, direct activity-based tests, and sequence-based tests. Conveniently, all three can be used in a complementary manner to obtain a series of EHs with the desired properties.
Growth-based tests
Growth-based selection is the most direct and efficient method for identifying enzymes capable of catalyzing epoxide modifications. EHs can be detected if they convert an epoxide substrate into a diol that the host bacteria can use as a carbon source. When library cells are grown in minimal medium supplemented with this epoxide as the sole carbon source, only those clones carrying active epoxide hydrolases will be able to produce the corresponding diol and use it as a carbon source for growth and proliferation. Over time, these clones will dominate the microbial population and can therefore be easily isolated.
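The enrichment of active clones over time can be pictured with a simple two-population growth model. The sketch below is illustrative only: the starting fraction, the number of generations, the residual growth of inactive clones and the function name are all hypothetical assumptions, and effects such as diol cross-feeding or epoxide toxicity are ignored.

```python
def active_fraction(initial_fraction, generations, background_doublings=0.0):
    """Fraction of the culture made up of EH-active clones after selection.

    Active clones are assumed to double once per generation on the diol they
    produce. Inactive clones are allowed `background_doublings` residual
    doublings per generation (0.0 means they do not grow at all).
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    active = initial_fraction * 2.0 ** generations
    inactive = (1.0 - initial_fraction) * 2.0 ** (background_doublings * generations)
    return active / (active + inactive)

# One active clone per million library cells, followed over 30 generations.
for g in (0, 10, 20, 30):
    print(g, active_fraction(1e-6, g))
```

Under these assumptions a clone present at one in a million becomes the majority of the culture within roughly 20 generations; any residual growth of inactive clones lengthens that time accordingly.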
Two epoxides, glycidol and propylene oxide (Figure 15), will initially be used as selection substrates because the corresponding vicinal diols, glycerol and propanediol, are known to support the growth of E. coli or its mutants as sole carbon sources (Maloy, S.R.; Nunn, W.D. J. Bacteriol. 1982, 149, 173-180; and Hacking, A.J.; Lin, E.C.C. J. Bacteriol. 1976, 126, 1166-1172). They will be used as racemic mixtures and as pure enantiomers. It should be noted that both of these epoxides are important chiral synthons in the chemical and pharmaceutical industries.
Appropriate hosts should be used for the selection experiments. For example, selection on propylene oxide requires a fucA-disrupted E. coli mutant that can use propanediol as a carbon source (Hacking, A.J.; Lin, E.C.C. J. Bacteriol. 1976, 126, 1166-1172). Such hosts can be generated by site-directed mutation of specific genes or by random transposon (Tn) mutagenesis. The latter strategy is more attractive because it is more practical and extremely effective. The Tn is introduced into E. coli hosts by electroporation, where in vivo transposition leads to random insertion of the Tn into the genomic DNA. This provides an E. coli insertion library suitable for screening for desired mutants, such as those that can use propanediol as a carbon source. Several insertion libraries from different E. coli hosts will be screened for propanediol-utilizing mutants. Specifically, library cells will be plated on agar plates containing minimal medium with propanediol as the sole carbon source. After incubation, propanediol-utilizing clones can be identified because only these will grow and form colonies on the plates.
Using these two simple epoxides as discovery substrates can be expected to yield a range of EHs with different specificities; for example, EHs whose optimal specificity is for more complex epoxides can still be detected even if they have only weak activity against glycidol or propylene oxide. Ultimately, the generality of this detection technique will depend on the sensitivity of the selection. Additional epoxide selection substrates can also be identified if E. coli mutants that can grow on other vicinal diols are discovered from the Tn insertion libraries. Screening for these diol-utilizing mutants will be performed using the protocols described above.
Epoxides are known to be toxic to microbes due to alkylation of proteins and nucleic acids. The effect of different concentrations of glycidol on the growth of the E. coli host was evaluated. The results showed that E. coli can tolerate up to 0.05% glycidol (v/v). This concentration may be high enough for selection because cells could grow with 0.025% extracellularly supplied glycerol in the medium as the sole carbon source. However, if necessary, mutants of E. coli that show greater tolerance to glycidol can be detected by screening libraries of mutated hosts, including the Tn insertion libraries mentioned above.
A positive control clone having epoxide hydrolase activity was also developed. Having such a control is beneficial because it can be used to guide and evaluate test development for both selection and screening. The epoxide hydrolase of A. radiobacter, whose nucleotide sequence has been reported, can be easily cloned and expressed in E. coli (Arand, M.; Oesch, F. Biochem. J. 1999, 344, 273-280). Primers for amplification and cloning of this gene were designed and synthesized. Additionally, as described below, an active epoxide hydrolase has been identified that can be used as a positive control.
Sequence-based tests
A complementary approach to activity-based detection of epoxide hydrolases is sequence-based detection of epoxide hydrolases followed by assessment of their substrate specificity in secondary assays. The use of sequence-based methods is a valuable strategy for discovering individual classes of enzymes. Significant amounts of sequence and structure information are available for EHs, enabling the development of sequence-based discovery.
Since this method is not activity-based, it complements the activity-based methods. In addition, it can be very high throughput. Both prokaryotic and eukaryotic EHs belong to the α/β-hydrolase superfamily and share low but significant sequence homology (Nardini, M.; Ridder, I.S.; Rozeboom, H.J.; Kalk, K.H.; Rink, R.; Janssen, D.B.; Dijkstra, B.W. J. Biol. Chem. 1999, 274, 14579-14596; Argiriadi et al., Proc. Natl. Acad. Sci. USA 1999, 96, 10637-10642; and Zou, J. et al., Structure 2000, 8, 111-122). Bacterial EHs, however, share greater sequence similarity with one another. Alignment of bacterial EH nucleotide sequences will allow identification of conserved sequences. Primers will be designed based on these regions. These primers will be used to generate PCR products from DNA libraries. The products will be gel separated, purified and sequenced. Full-length sequences of positive hits can be recovered by Southern blotting. The activity of these hits will then be tested using fluorogenic or chromogenic assays. One limitation of the sequence-based approach compared to activity-based methods is that it is limited to the discovery of genes that share homology with existing genes. However, as new EH genes are discovered and a database of EH sequence data is built up, the sequence-based approach becomes increasingly powerful because more sequences can be used in probe design.
Bioinformatic analysis of the DNA database yielded a total of 6 putative epoxide hydrolase genes as well as 3 partial open reading frames (ORFs) that share homology with the A. radiobacter and other epoxide hydrolases. Based on the conserved nucleotide sequences extracted from these ORFs, degenerate primers were designed and used to screen a gene library known to contain one of these genes. This screen recovered the known gene, as expected. Another PCR product (~200 bp) was also obtained and, when sequenced, its partial ORF showed strong sequence homology to other known EHs. This unexpected result therefore indicates that the sequence-based strategy can detect new EHs.
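The step from aligned sequences to degenerate primers can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a gap-free nucleotide alignment of equal-length sequences, each column is collapsed into its IUPAC ambiguity code, and the least degenerate window of a chosen length is proposed as a primer site. The sequences in the usage example are invented toy strings, not EH gene sequences, and the function names are hypothetical.

```python
# IUPAC ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
BASES_PER_CODE = {code: len(bases) for bases, code in IUPAC.items()}

def degenerate_consensus(aligned):
    """Collapse gap-free aligned DNA sequences into one IUPAC consensus string."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(IUPAC[frozenset(col)] for col in columns)

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for code in primer:
        total *= BASES_PER_CODE[code]
    return total

def best_window(aligned, length):
    """Return (start, primer, degeneracy) of the least degenerate window."""
    consensus = degenerate_consensus(aligned)
    candidates = [(degeneracy(consensus[i:i + length]), i)
                  for i in range(len(consensus) - length + 1)]
    fold, start = min(candidates)  # ties resolve to the leftmost window
    return start, consensus[start:start + length], fold

toy_alignment = ["ATGGCACCTGGT", "ATGGCTCCAGGT", "TTGGCGCCTGGA"]  # invented
print(degenerate_consensus(toy_alignment))
print(best_window(toy_alignment, 4))
```

Practical primer design would also weigh melting temperature, 3'-end stability and codon usage, none of which are modeled here.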
Fluorescence-based assays
Fluorogenic and chromogenic assays have been used with great effect in high-throughput screening for enzyme characterization and discovery. Fluorogenic assays are commonly used for many hydrolytic enzymes where the substrates release a fluorescent signal during the hydrolysis reaction. These assays are activity-based, similar to the selection method, but can be applied to more different substrates than selection experiments. A limitation, however, is that they have a lower throughput than selection tests.
A periodate-coupled fluorogenic assay for EH described in the literature was modified and developed into a high-throughput screening method (Badalassi, F.; Wahler, D.; Klein, G.; Crotti, P.; Reymond, J.-L. Angew. Chem. Int. Ed. 2000, 39, 4067-4070). As shown in FIG. 16, the epoxide substrate (13) used in this assay contains a sequestered fluorophore that can generate strong fluorescence when released. After EH-catalyzed hydrolysis of compound 13, periodate is added to oxidize the vicinal diol (14) to produce the carbonyl-containing intermediate (15). Under alkaline conditions, 15 can undergo a β-elimination reaction catalyzed by bovine serum albumin (BSA) to release a fluorescent product (16) such as umbelliferone.
The assay is performed in a 1536-well format. Clones from the gene libraries are distributed into individual wells, preferably 5 clones per well for the first screen. The clones are allowed to grow for 24-48 hours before substrate 19 is added. After 2 hours of incubation, sodium periodate and BSA are added to bring about the oxidation and β-elimination reactions. The level of fluorescence in each well is measured to identify initial hits. These hits can be reconfirmed by running a second round of testing. Robotic systems have been developed to automate all liquid handling and fluorescence measurement processes.
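Identification of initial hits from the per-well fluorescence readings can be sketched as a simple threshold test. The specification does not state a hit criterion; the rule used below (signal above the mean of the negative-control wells plus three standard deviations), the well labels and the readings are assumptions made only for illustration.

```python
from statistics import mean, stdev

def call_hits(well_signals, negative_controls, n_sd=3.0):
    """Return (threshold, hits) for a plate of fluorescence readings.

    well_signals: dict mapping a well label to its fluorescence reading.
    negative_controls: readings from wells without EH activity.
    A well is called a hit when its signal exceeds
    mean(controls) + n_sd * stdev(controls)  (hypothetical criterion).
    """
    if len(negative_controls) < 2:
        raise ValueError("at least two negative-control readings are required")
    threshold = mean(negative_controls) + n_sd * stdev(negative_controls)
    hits = sorted(well for well, signal in well_signals.items() if signal > threshold)
    return threshold, hits

controls = [100, 102, 98, 101, 99]                       # invented readings
plate = {"A1": 101, "A2": 250, "B7": 104, "C3": 110}     # invented readings
threshold, hits = call_hits(plate, controls)
print(round(threshold, 2), hits)
```

Because each well of the first screen holds several clones, a hit well identifies a small pool rather than a single clone, which is one reason a second round of testing is used to reconfirm hits.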
The first substrate for the development of this assay, 19, was synthesized according to FIG. 17. Coupling of umbelliferone (17) with 4-bromo-1-butene in the presence of potassium carbonate at 50°C gave olefin 18, which was epoxidized with meta-chloroperbenzoic acid (mCPBA). The resulting epoxide 19 was used to test for epoxide hydrolase activity in the 6 clones mentioned above. These clones contain putative epoxide hydrolase genes. One of them was found to be active toward 19. This showed that the assay was useful.
Colorimetric test
A colorimetric assay can be extremely useful in high-throughput screening if a sensitive color change is involved and the assay can be performed in a solid agar format. Screening on solid agar provides extremely high throughput, and the color change enables easy identification of hits. A colorimetric assay using 4-(p-nitrobenzyl)-pyridine (20) can be used to detect epoxide substrates. In a liquid-based assay (see Scheme 12), epoxides react with 20 to form an adduct (21) that can tautomerize to the highly conjugated compound 22, which exhibits a blue color (λmax = 560 nm). Hydrolyzed epoxides (i.e., diols) do not react with 20, so a decrease in absorbance at 560 nm indicates hydrolysis of the epoxide. In the solid-phase assay, colonies grown on agar plates are transferred to filter paper pre-incubated with epoxides. Epoxide hydrolase activity is detected by the formation of colorless halos on the blue filter paper. This test can be converted to an HTP screen. The disadvantage of this test is that it detects the disappearance of the substrate instead of the appearance of the product. The advantage, however, is that it directly targets the substrates and not their derivatives. Therefore, even if its relatively low sensitivity proves to be a problem in HTP screening, it can be used for secondary screening of primary hits detected by other detection methods.
The colorimetric assay was tested on the epoxide hydrolase-positive clone mentioned above using three epoxides: styrene oxide, epichlorohydrin, and glycidol. All three epoxides were found to be substrates, and epichlorohydrin showed the highest activity.
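For the liquid-based format, the decrease in absorbance at 560 nm can be converted into an approximate extent of hydrolysis. The sketch below assumes that the absorbance of the blue adduct is proportional to the amount of epoxide remaining (a Beer-Lambert-type assumption that is not stated in the specification) and that a no-enzyme control and an optional reagent blank are available; the readings and the function name are hypothetical.

```python
def fraction_hydrolyzed(a_sample, a_no_enzyme, a_blank=0.0):
    """Estimate the fraction of epoxide hydrolyzed from A560 readings.

    a_sample:    absorbance of the enzyme-treated reaction
    a_no_enzyme: absorbance of the same reaction without enzyme (no hydrolysis)
    a_blank:     absorbance of the reagent blank without epoxide
    Assumes absorbance is linear in the remaining epoxide concentration.
    """
    span = a_no_enzyme - a_blank
    if span <= 0:
        raise ValueError("the no-enzyme control must read above the blank")
    remaining = (a_sample - a_blank) / span
    remaining = min(1.0, max(0.0, remaining))  # clip measurement noise
    return 1.0 - remaining

# Invented readings: control 0.80, enzyme-treated sample 0.30, blank 0.05.
print(fraction_hydrolyzed(0.30, 0.80, 0.05))
```

Because the assay follows loss of substrate, small extents of hydrolysis appear as small differences between two large absorbance values, which is consistent with the relatively low sensitivity noted for this test.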
These search methods can be used to discover a wide range of new epoxide hydrolases, thus creating a pool of synthetically useful biocatalysts. Optionally, if necessary, the evolutionary technologies discussed below can be used to optimize enzyme properties.
In a more preferred embodiment, the developed assays will be used to screen environmental gene libraries for the presence of microbial enzymes with the required activity and substrate specificity. Positive hits from these screens can then be sequenced and the genes subcloned into expression vectors. The expressed recombinant enzymes can then be characterized for activity and substrate selectivity. If the identified enzymes require improvement of one or more of their properties (e.g., pH and temperature optima, thermostability, thermotolerance, substrate specificity, etc.), they can be optimized using GSSM™ (Gene Site Saturation Mutagenesis), GeneReassembly™ and other technologies discussed below. These epoxide hydrolases can be used in the chemoenzymatic synthesis of certain fine chemicals and of pharmaceutical and agrochemical precursors. The optimized enzymes developed using the method of the present invention can be used in the development of a commercially viable route for the synthesis of one or more target compounds. In particular, epoxide hydrolases can be used to produce key intermediates in the synthesis of fine chemicals and of drugs of the desired enantiomeric purity.
In one aspect, environmental gene libraries are constructed using DNA isolated from a wide variety of microenvironments around the world. Using an appropriate detection method then allows the extraction of enzymes from these libraries according to function, enzyme class, or a specific combination of the two. Unlike traditional discovery programs, the preferred discovery method ensures the capture of genes from uncultured microbes and facilitates screening in well-defined, domesticated laboratory hosts. This method of expression cloning leads to simultaneous capture of enzyme activity and corresponding genetic information.
The method of discovery includes: isolation and fractionation of nucleic acids from nature or other suitable sources; construction of environmental gene libraries; screening the environmental gene libraries to discover the desired genes encoding the desired enzymes using the methods described below; optimizing the desired genes to optimize the activity of the desired enzymes using the evolutionary technologies described in U.S. Pat. Nos. 5,830,696, 5,939,250 and 5,965,408, which are incorporated herein by reference; sequencing of the optimized genes; overexpression of the sequenced genes in appropriate host strains; producing large quantities of suitable strains containing the optimized genes by fermentation; and obtaining the desired enzymes, optionally contained in the host strains, after purification.
Newly cloned or discovered enzymes can then be further adapted using the evolutionary technologies described in U.S. Pat. Nos. 5,830,696, 5,939,250 and 5,965,408 and the combinatorial evolution technology described below.
The screening step in one aspect of the present invention can be performed using one or more expression and sequence-based screening methods, including single cell activity screening, microtiter plate activity screening, sequence-based screening, and growth selection methods. All of these methods can be used to detect epoxide hydrolases using the assays described above.
The single cell activity screening method is derived from fluorescence-activated cell sorting (FACS) by thoroughly modifying the FACS platform for screening environmental libraries based on expression and on sequence hybridization (Figure 18). In expression screening, fluorescent substrates are incorporated into clone libraries, and when a clone expresses a gene product capable of cleaving the substrate, the fluorescence quantum yield increases. Alternatively, the FACS hybridization cloning methodology allows recovery of recombinant clones based on sequence homology. This single cell activity screening method enables a screening speed of 50,000 clones per second and a daily screening rate of up to 10^9 clones.
The growth selection method can be one of the most powerful enzyme discovery methods. In this method, a selected substrate acts as a source of nutrients for host cells only when those cells contain the enzyme activity of interest, allowing them to grow selectively. This method of growth selection may involve genetic manipulation of cell lines. The substrate used in this method can also be custom synthesized.
From another aspect, sequence-based discovery methods can be a powerful and complementary alternative to expression cloning. Both solution-phase and FACS-based formats can be used for very high-throughput DNA hybridization-based detection techniques, such as environmental biopanning, which facilitate the screening of large and complex environmental gene libraries. In solution-based environmental biopanning, inserts from mega-libraries are made single-stranded and hybridized in solution to arrays of biotinylated hybridization probes known as hooks (Figure 19). Library clones containing related sequences hybridize to the hooks and are captured on streptavidin-coated magnetic beads. The eluted DNA inserts, now enriched for the sequences of interest, are then either subjected to a second round of biopanning or re-cloned into lambda. In this way, an enrichment of more than 1000-fold is achieved for the sequences of interest. A FACS-based biopanning approach further facilitates the enzyme identification process by allowing biopanning of small-insert and large-insert clones without amplification.
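The degree of enrichment obtained in a round of biopanning can be expressed as the ratio of the frequency of target clones after capture to their frequency before capture. The sketch below illustrates that calculation under this assumed definition; the clone counts in the example are hypothetical and are not taken from any experiment described herein.

```python
def fold_enrichment(targets_before: int, total_before: int,
                    targets_after: int, total_after: int) -> float:
    """Ratio of target-clone frequency after panning to frequency before.

    A value above 1 means the panning round enriched for the target.
    """
    if total_before <= 0 or total_after <= 0:
        raise ValueError("library sizes must be positive")
    if targets_before <= 0:
        raise ValueError("at least one target clone is needed before panning")
    freq_before = targets_before / total_before
    freq_after = targets_after / total_after
    return freq_after / freq_before


if __name__ == "__main__":
    # Hypothetical counts: 10 targets per 1,000,000 clones before panning,
    # 20 targets per 1,000 clones after panning.
    print(fold_enrichment(10, 1_000_000, 20, 1_000))
```

With the hypothetical counts shown, the calculated enrichment is on the order of 2000-fold.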
Laboratory evolution of enzymes can be used to further improve, adapt or refine enzyme properties. These laboratory evolution technologies include gene site saturation mutagenesis (GSSM™) and GeneReassembly™, where multiple natural genes can be combined to create a combinatorial evolution library. If desired, these technologies can be applied to epoxide hydrolases discovered by enzyme discovery to further optimize those epoxide hydrolases for properties such as thermostability, specific activity, or stereospecificity.
In one aspect, the present invention provides rapid screening of libraries from more than one organism, such as a mixed population of organisms, for example from an environmental sample or an uncultured population of organisms or a cultured population of organisms.
In one embodiment, gene libraries are generated by obtaining nucleic acids from a mixed population of organisms and cloning the nucleic acids into a suitable vector for transforming multiple clones to generate a gene library. A gene library therefore contains a gene or gene fragments present in organisms of a mixed population. The gene library may be an expression library, in which case the library may be screened for an expressed polypeptide with the desired activity. Alternatively, the gene library can be screened for sequences of interest, for example by PCR or hybridization screening. In one embodiment, nucleic acids from sample isolates comprising a mixed population of organisms are pooled and the pooled nucleic acids are used to create a gene library.
By "isolates" is meant that a particular species, genus, family, order or class of organisms is obtained or derived from a sample containing more than one organism or from a mixed population of organisms. Nucleic acids from these isolated populations can then be used to create a gene library. Isolates can be obtained by selectively filtering or culturing a sample containing more than one organism or a mixed population of organisms. For example, bacterial isolates can be obtained by passing a sample through a filter that excludes organisms based on size, or by growing the sample in a medium that selectively promotes or selectively inhibits the growth of certain populations of organisms.
"Enriched population" is a population of organisms in which the percentage of organisms belonging to a certain species, genus, family, order or class of organisms is increased in relation to the total population. For example, selective growth or inhibition media can increase the proportion of a given group of organisms; a population can, for instance, be enriched for prokaryotic organisms relative to the total number of organisms in the population. Similarly, a particular species, genus, family, order, or class of organisms can be enriched by growing a mixed population in a selective medium that inhibits or promotes the growth of a subpopulation within the mixed population.
In another embodiment, nucleic acids from multiple (eg, two or more) isolates from a mixed population of organisms are used to generate multiple gene libraries containing multiple clones, and then the gene libraries from at least two isolates are combined to produce a "combined isolate library."
After gene libraries are generated, the clones are screened for bioactivity, in this case for activity as a catalyst for modification of the epoxide or biomolecule of interest (eg, EH). Such screening techniques include, for example, contacting a clone, a population of clones, or a population of nucleic acid sequences with a substrate or substrates having a detectable molecule that provides a detectable signal upon interaction with the bioactivity or biomolecule of interest. The substrate can be an enzyme substrate, a bioactive molecule, an oligonucleotide, and the like.
In one aspect, gene libraries are generated, clones are either exposed to a chromogenic or fluorogenic substrate or substrates of interest, or hybridized to a labeled probe of interest, and positive clones are identified by a detectable signal (eg, fluorescence emission).
In one aspect, expression libraries generated from a mixed population of organisms are screened for an activity of interest. Specifically, expression libraries are created, clones are exposed to the substrate or substrates of interest, and the positive clone is identified and isolated. This invention does not require cell survival. Cells need to be viable only long enough to produce the molecule to be detected, and can then be viable or nonviable as long as the expressed biomolecule (eg, enzyme) remains active.
In one aspect, the invention provides an approach that combines the direct cloning of genes encoding novel or desirable bioactivities from environmental samples with a high-throughput screening system designed for the rapid discovery of novel molecules, such as enzymes. This approach is based on the construction of environmental 'expression libraries' that can represent the collective genomes of many natural microorganisms archived in cloning vectors that can be propagated in E. coli or other suitable host cells. Because cloned DNA can be initially extracted directly from environmental samples or from isolates of environmental samples, libraries are not limited to the small fraction of prokaryotes that can be grown in pure culture. Additionally, normalization of the environmental DNA present in these samples could allow for a more even representation of the DNA of all species present in the sample. Normalization techniques (described below) can dramatically increase the efficiency of finding genes of interest from smaller sample components that may be underrepresented by several orders of magnitude compared to the dominant species in the sample. Normalization can occur in any of the above forms after obtaining nucleic acids from a sample or isolate(s).
In another aspect, the invention provides a high-throughput capillary screening system that enables evaluation of a large number of clones to identify and recover cells that encode useful enzymes as well as other biomolecules (eg, ligands). In particular, the hollow fiber array techniques described herein can be used to search, identify, and recover proteins with desired biological activity or other ligands with desired binding affinities. For example, binding assays can be performed using a suitable substrate or other marker that emits a detectable signal after the desired binding event.
In addition, fluorescence-activated cell sorting can be used to screen and isolate clones that have an activity or sequence of interest. Previously, FACS machines were used in research focused on the analysis of eukaryotic and prokaryotic cell lines and cell culture processes. FACS is also used to monitor the production of foreign proteins in eukaryotes and prokaryotes to study, for example, differential gene expression and the like. These examples utilize the detection and counting capabilities of FACS. However, FACS has never before been used in a discovery process to search for and recover bioactivities in prokaryotes. Furthermore, the present invention does not require cells to survive, as previously described technologies do, since the desired nucleic acid (recombinant clones) can be obtained from living or dead cells. Cells need only be viable long enough to produce the compound being detected, and can then be viable or non-viable as long as the expressed biomolecule remains active. The present invention also solves the problems that would be associated with the detection and sorting of E. coli expressing recombinant enzymes and the recovery of the encoding nucleic acids. Additionally, this invention encompasses in its embodiments any device capable of detecting fluorescent wavelengths associated with biological material, and such devices are defined herein as fluorescence analyzers (a FACS device is one example).
In some cases, it is desirable to identify nucleic acid sequences from a mixed population of organisms, isolates or enriched populations. In this embodiment, expression of the gene products is not necessary. Nucleic acid sequences of interest can be identified or "biopanned" by contacting a clone, device (eg gene chip), filter, or nucleic acid sample with a probe labeled with a detectable molecule. The probe will typically have a sequence that is substantially identical to the nucleic acid sequence of interest. Alternatively, the probe will be a fragment or full-length nucleic acid sequence encoding the polypeptide of interest. The probe and nucleic acids are incubated under conditions and for a time sufficient to allow hybridization of the probe to substantially complementary sequences. The stringency of hybridization will vary depending on, for example, the length and GC content of the probe. Such factors can be determined empirically (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989, and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds. (Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., most recent supplement)). After hybridization, the complementary sequence can be amplified by PCR, identified by hybridization techniques (eg, exposing the probe-nucleic acid mixture to a membrane), or detected by chip-based nucleic acid detection.
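To illustrate how probe length and GC content bear on hybridization stringency, the sketch below computes the GC fraction of a probe and a rough melting temperature using the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). The Wallace rule is a common approximation for short oligonucleotides and is an assumption introduced here for illustration; it is not a method stated in this description, and conditions for any particular probe would still be determined empirically as noted above.

```python
def _validated(seq: str) -> str:
    """Upper-case the sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return seq


def gc_content(seq: str) -> float:
    """Fraction of bases in the probe that are G or C."""
    seq = _validated(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule.

    Intended only for short oligonucleotides; longer probes need
    salt- and length-corrected formulas.
    """
    seq = _validated(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    probe = "ATGGCGTACGTTAGC"  # hypothetical 15-base probe
    print(gc_content(probe), wallace_tm(probe))
```

A probe with a higher GC fraction or greater length yields a higher estimate, which is consistent with the need for more stringent wash conditions for such probes.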
Prior to this invention, evaluation of complex gene libraries or environmental expression libraries was rate limiting. The subject invention enables rapid screening of complex environmental libraries containing, for example, genomic sequences from thousands of different organisms or their subsets and isolates. The advantages of this invention can be seen, for example, when examining a complex environmental sample. Previously, screening a complex sample required laborious methods to screen several million clones to cover the genomic biodiversity. The invention provides an extremely efficient screening method that allows the evaluation of this huge number of clones. The disclosed method is capable of screening from about 30 million to about 200 million clones per hour for a desired nucleic acid sequence, biological activity, or biomolecule of interest. This allows environmental libraries to be thoroughly screened for clones expressing novel bioactivities or biomolecules.
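The stated screening rates can be translated into the time needed to cover a library of a given size. The sketch below performs that division; the library size of 10^9 clones used in the example is hypothetical and is chosen only to illustrate the range.

```python
def screening_hours(n_clones: float, clones_per_hour: float) -> float:
    """Hours needed to screen n_clones at a constant rate."""
    if clones_per_hour <= 0:
        raise ValueError("clones_per_hour must be positive")
    return n_clones / clones_per_hour


if __name__ == "__main__":
    library_size = 1_000_000_000  # hypothetical library of 10^9 clones
    for rate in (30_000_000, 200_000_000):  # rates stated in the description
        hours = screening_hours(library_size, rate)
        print(f"{rate:,} clones/h: {hours:.1f} h")
```

For the hypothetical library, the lower rate gives a screening time of about 33 hours and the upper rate about 5 hours.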
Once a sequence or biological activity of interest (eg, an enzyme of interest) is identified, the sequence or polynucleotide encoding the biological activity of interest can be mutated or otherwise engineered to modify the amino acid sequence and thereby produce modified activities, such as increased thermostability, specificity or activity.
The invention provides methods for identifying a nucleic acid sequence encoding a polypeptide of known or unknown function. For example, the great diversity of microbial genomes is a consequence of the rearrangement of gene groups in the microbial genome. These groups of genes can be found in different species or be phylogenetically related to other organisms.
For example, bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. Genes are grouped in structures called "gene clusters" on a single chromosome and are transcribed under the control of a single regulatory sequence, including a single promoter that initiates transcription of the entire cluster. A group of genes, a promoter, and additional sequences that function together in regulation is called an "operon" and can contain up to 20 or more genes, usually 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually in function.
Some gene families consist of identical members. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where duplication occurs, to adjacent linked genes, to cases where hundreds of identical genes lie in tandem. Sometimes no meaning can be seen in the repetition of a particular gene. A prime example of this is the expression of duplicated insulin genes in some species, while in other mammalian species a single insulin gene is sufficient.
In addition, gene clusters are constantly being reorganized, so the ability to generate heterogeneous libraries of gene clusters, for example from bacteria or other prokaryotic sources, is valuable in identifying sources of new proteins, especially enzymes such as the polyketide synthases responsible for the synthesis of polyketides with a wide range of useful activities. Polyketides are molecules that are an extremely rich source of bioactivities, including antibiotics (such as tetracyclines and erythromycin), anticancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents. Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a large variety of carbon chains that differ in length and in patterns of functionality and cyclization. Polyketide synthase genes fall into gene clusters, and at least one type (designated type 1) of polyketide synthase has genes and enzymes of large size, complicating genetic manipulation and in vitro testing of these genes and proteins. Other types of proteins that are the product(s) of gene clusters are also contemplated, including, for example, antibiotics, antiviral agents, anticancer agents, and regulatory proteins such as insulin.
The ability to select and combine desired components from a library of polyketides and post-polyketide biosynthetic genes to generate new polyketides for research is attractive. The methods of the present invention enable and facilitate the cloning of new polyketide synthases and other gene clusters, since gene banks can be generated with clones containing large inserts (especially using f-factor vectors), which facilitate the cloning of gene clusters.
For example, a gene cluster can be ligated into a vector containing expression regulatory sequences that can control and regulate the production of a detectable protein, or of a protein-associated array activity, from the ligated gene cluster. Vectors that have an exceptionally large capacity for exogenous nucleic acid are particularly suitable for use with such gene clusters and are described herein, by way of example, as including the E. coli f-factor (or fertility factor). The E. coli f-factor is a plasmid that effects high-frequency transfer of itself during conjugation and is ideal for obtaining and stably propagating large nucleic acid fragments, such as gene clusters, from mixed microbial samples.
Nucleic acid isolated or derived from these samples (eg, a mixed population of microorganisms) or isolates thereof can be inserted into a vector or plasmid prior to polynucleotide screening. Such vectors or plasmids are typically those that contain expression control sequences, including promoters, enhancers, and the like.
Accordingly, the invention provides new systems for cloning and screening mixed populations of organisms, enriched samples or isolates thereof for polynucleotides encoding molecules of interest, enzyme activity and biological activity of interest in vitro. The method(s) of the invention enable the cloning and discovery of new bioactive molecules in vitro, and especially new bioactive molecules derived from uncultivated or cultured samples. Large gene clusters, genes and gene fragments can be cloned, sequenced and screened using the method(s) of the invention. Unlike previous strategies, the method(s) of the invention enable the cloning and identification of polynucleotides and polypeptides encoded by these polynucleotides in vitro from a wide range of environmental samples.
The invention enables the search and identification of polynucleotide sequences from complex environmental samples, their enriched samples or their isolates. Gene libraries can be generated from cell-free samples, as long as the sample contains nucleic acid sequences, or from samples containing cells, cellular material, or viral particles. Organisms from which libraries can be prepared include prokaryotic microorganisms such as Eubacteria and Archaebacteria, lower eukaryotic microorganisms such as fungi, algae and protozoa as well as mixed plant populations, plant spores and pollen. Organisms can be cultured or uncultured organisms obtained from environmental samples and include extremophiles such as thermophiles, hyperthermophiles, psychrophiles and psychrotrophs.
Nucleic acids used to create a DNA library can be obtained from environmental samples, such as, but not limited to, microbial samples obtained from Arctic and Antarctic ice, water sources or permafrost, volcanic material, soil or plant sources in tropical areas, excreta of various organisms, including mammals and invertebrates, and dead and decaying matter and the like. Nucleic acids used to create gene libraries can be obtained, for example, from enriched subpopulations or sample isolates. In another embodiment, DNA from multiple isolates can be combined to create a source of nucleic acids for library production. Alternatively, nucleic acids can be obtained from multiple isolates and a separate gene library generated from each isolate, producing multiple gene libraries. Two or more of these gene libraries can then be pooled to produce a combined isolate library. Thus, for example, nucleic acids can be recovered from a cultured or uncultured organism and used to generate a suitable gene library that can be screened for a sequence of interest (eg, a polynucleotide sequence) or a bioactivity of interest (eg, enzyme or biological activity).
The following is a general procedure for creating libraries from culturable and non-culturable organisms, enriched populations, and mixed populations of organisms and their isolates. These libraries can be probed, sequenced or screened for nucleic acid sequences that have, or are predicted to have, a desired biological activity (eg enzymatic activity), and the selected nucleic acid sequences can be further evolved, mutagenized or derivatized.
As used herein, an environmental sample is any sample containing organisms or polynucleotides or a combination thereof. Therefore, an environmental sample can be obtained from any number of sources (as described above), including, for example, insect droppings, hot springs, soil, and the like. Any source of nucleic acids in purified or unpurified form can be used as starting material. Therefore, nucleic acids can be obtained from any source contaminated with an organism or from any sample containing cells. The environmental sample can be an extract of any body sample, such as blood, urine, spinal fluid, tissue, vaginal swab, stool, amniotic fluid, or mouthwash of any mammalian organism. For non-mammalian organisms (eg, invertebrates), the sample may be a tissue sample, a saliva sample, fecal material, or material in the organism's digestive tract. An environmental sample also includes samples taken from extreme environments, including, for example, hot sulfur pools, volcanic vents, and frozen tundra. A sample can come from a variety of sources. For example, in horticultural and agricultural testing, the sample may be a plant, fertilizer, soil, liquid, or other horticultural or agricultural product; in food testing, the sample may be fresh or processed food (for example, infant food, seafood, fresh produce and packaged food); and in environmental testing, the sample may be liquid, soil, a sewage treatment plant sample, sediment, or any other sample in the environment suspected of containing the organism or polynucleotides.
When the sample is a mixture of material containing a mixed population of organisms, such as blood, soil, or mucus, it can be treated with an appropriate reagent that effectively breaks up the cells and exposes or separates the nucleic acid strands. Although not necessary, this nucleic acid lysis and denaturation step will allow cloning, amplification, or sequencing to occur more easily. Additionally, if desired, a mixed population can be cultured prior to analysis to purify or enrich a specific population or isolate of interest (eg, an isolate of a specific species, genus, or family of organisms) and thereby obtain a purer sample. However, this is not necessary. For example, culturing the organisms in the sample may involve culturing the organisms in microdroplets and dispersing the cultured microdroplets using a cell sorter into individual wells of a multi-well tissue culture plate. Alternatively, the sample can be cultured in any number of selective media of compositions designed to inhibit or promote the growth of a particular subpopulation of organisms.
If the isolates are from a sample containing a mixed population of organisms, nucleic acids can be obtained from the isolates as described below. Nucleic acids obtained from the isolate can be used to create a gene library or, alternatively, can be combined with other fractions of the sample isolate, wherein the combined nucleic acids are used to create a gene library. Isolates may be cultured prior to nucleic acid extraction or may be uncultured. Methods of isolating specific populations of organisms in a mixed population.
Suitably, the sample comprises nucleic acids from, for example, a diverse and mixed population of organisms (eg, microorganisms present in the gut of an insect). Nucleic acids are isolated from the sample using any of a number of DNA and RNA isolation methods. Such nucleic acid isolation methods are routinely performed in the art. When the nucleic acid is RNA, the RNA can be reverse transcribed into DNA using primers known in the art. When the DNA is genomic DNA, the DNA can be sheared using, for example, a 25 gauge needle.
Nucleic acids can be cloned into a suitable vector. The vector used will depend on whether the DNA is to be expressed, amplified, sequenced, or manipulated in any manner known in the art (see, for example, US Patent No. 6,022,716, which discloses high-throughput sequencing vectors). Cloning techniques are known in the art or can be developed by one skilled in the art without undue experimentation. The choice of vector will also depend on the size of the polynucleotide sequence and the host cell to be used in the methods of the invention. Therefore, the vector used in the invention can be a plasmid, phage, cosmid, phagemid, or virus, or a portion thereof (eg capsid protein). For example, cosmids and phagemids are commonly used where the specific nucleic acid sequence to be analyzed or modified is large, since these vectors are capable of stably propagating large polynucleotides.
A vector containing the cloned nucleic acid sequence can then be propagated by plating (ie, clonal propagation) or by transfecting the vector into a suitable host cell (eg, phage on an E. coli host). The cloned nucleic acid sequence is used to prepare a library for screening (eg, expression screening, PCR screening, hybridization screening, or the like) by transformation of a suitable organism. Hosts known in the art are transformed by the artificial introduction of vectors containing the nucleic acid sequence by inoculation under conditions that favor such transformation. The host can be transformed with a double-stranded circular or linear nucleic acid, and in some cases with single-stranded circular or linear nucleic acid sequences. "Transformed" or "transformation" means a permanent or transient genetic change induced in a cell by the incorporation of new DNA (eg DNA exogenous to the cell). When the cell is a mammalian cell, the permanent genetic change is generally achieved by introducing DNA into the genome of the cell. A transformed cell or host cell generally refers to a cell (eg prokaryotic or eukaryotic) into which (or into whose ancestor) a DNA molecule that is not normally present in the host organism has been introduced by recombinant DNA techniques.
A particular type of vector to be used in the invention contains an f-factor origin of replication. The f-factor (or fertility factor) in E. coli is a plasmid that effects high-frequency transfer of itself during conjugation and less frequent transfer of the bacterial chromosome itself. In a particular embodiment, cloning vectors called "fosmids" or bacterial artificial chromosome (BAC) vectors are used. These are derived from the E. coli f-factor, which can stably integrate large segments of DNA. When integrated with DNA from a mixed uncultured environmental sample, this makes it possible to obtain large genomic fragments in the form of a stable environmental DNA library.
Nucleic acids from a mixed population or sample can be inserted into the vector by different methods. Generally, the nucleic acid sequence is inserted into the appropriate restriction endonuclease site(s) by methods known in the art. These and other procedures are considered to be within the skill of those in the art. In a typical cloning scenario, DNA can be "blunted" with a suitable nuclease (eg mung bean nuclease), methylated with, eg, EcoRI methylase, and ligated to EcoRI linkers (GGAATTCC). The linkers are then digested with EcoRI restriction endonuclease and the DNA is size fractionated (eg using a sucrose gradient). The resulting size-fractionated DNA is then ligated into an appropriate vector for sequencing, screening, or expression (eg, a lambda vector, packaged using in vitro lambda packaging extracts).
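The linker digestion step described above can be sketched in code. The example below assumes the standard EcoRI recognition site GAATTC with cleavage after the first G (G^AATTC), a property of the enzyme that is not stated in this description, and it models only the top strand. It also ignores methylation, so it represents the case in which internal EcoRI sites in the insert have been protected by the methylase and only the linker sites are cut. The insert sequence is hypothetical.

```python
from typing import List

ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cleaves after the first G of its site: G^AATTC


def ecori_fragments(top_strand: str) -> List[str]:
    """Split a top-strand sequence at every EcoRI site.

    Returns the fragments in 5' to 3' order. Methylation is not
    modeled, so every GAATTC occurrence is treated as cleavable.
    """
    seq = top_strand.upper()
    fragments = []
    start = 0
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(ECORI_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    linker = "GGAATTCC"
    insert = "ATGCATGC"  # hypothetical blunt-ended insert
    print(ecori_fragments(linker + insert + linker))
```

In this sketch the middle fragment is the insert carrying the AATT-initiated linker remnants at each end, which corresponds to the cohesive-ended DNA that is subsequently size fractionated and ligated into the vector.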
Transformation of a host cell with recombinant DNA can be accomplished by conventional techniques well known to those skilled in the art. When the host is prokaryotic, such as E. coli, competent cells capable of taking up DNA can be prepared from cells collected after the exponential growth phase and then treated with CaCl2 using procedures well known in the art. Alternatively, MgCl2 or RbCl can be used. Transformation can also be performed after protoplast formation of the host cell or by electroporation.
When the host is a eukaryotic organism, DNA transfection or transformation methods including coprecipitation with calcium phosphate, conventional mechanical procedures such as microinjection, electroporation, insertion of liposome-encapsulated plasmid or viral vectors, as well as others known in the art, can be used. Eukaryotic cells can also be cotransfected with a second foreign DNA molecule encoding a selectable marker, such as the herpes simplex virus thymidine kinase gene. Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papillomavirus, to transiently infect or transform eukaryotic cells and express the protein (Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). A eukaryotic cell can be a yeast cell (eg Saccharomyces cerevisiae), an insect cell (eg Drosophila sp.) or a mammalian cell including a human cell.
Eukaryotic systems, and mammalian expression systems in particular, allow post-translational modification of expressed mammalian proteins to occur. Eukaryotic cells that possess the cellular machinery for processing of the primary transcript, glycosylation, phosphorylation or secretion of the gene product should be used. Such host cell lines may include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.
In one aspect, after creating a library of clones using any number of methods including those described above, the clones are resuspended in a liquid medium, such as a nutrient-rich broth or other growth medium known in the art. Typically, the medium is a liquid medium that can be easily pipetted. One or more types of media containing at least one library clone are then introduced individually or together as a mixture into the capillaries (in whole or in part) of the capillary system.
In another aspect, the library is first subjected to biopanning prior to introduction or delivery to a capillary device or other screening technique. Such biopanning methods enrich the library with sequences or activities of interest. Examples of biopanning or enrichment methods are described below.
In one aspect, the library can be screened or sorted to enrich for clones containing the sequence or activity of interest based on the polynucleotide sequences present in the library or clone. Accordingly, the invention provides methods and compositions useful in screening organisms for a desired biological activity or biological sequence, and to assist in obtaining sequences of interest that can be further used in directed evolution, molecular biology, biotechnology and industrial applications.
Accordingly, the invention provides methods for rapidly searching, enriching and/or identifying sequences in a sample by searching and identifying nucleic acid sequences present in the sample. In this way, the invention increases the repertoire of available sequences that can be used to develop diagnostics, therapeutics or molecules for industrial applications. Accordingly, the methods of the invention can identify novel nucleic acid sequences that encode proteins or polypeptides with a desired biological activity.
Once gene libraries (eg, an expression library) have been generated, an additional step of "biopanning" such libraries can be included prior to expression screening. The procedure of "biopanning" refers to the process of identifying clones that have a particular biological activity by searching for sequence homology in a library of clones.
A probe sequence used to selectively interact with a target sequence of interest in a library can be a full-length coding region sequence or a partial coding region sequence for a known biological activity. The library can be screened with probe mixtures that contain at least a portion of a sequence that encodes a known bioactivity or has a desired bioactivity. These probes or probe libraries are preferably single-stranded. In one aspect, the library is preferably made in single-stranded form. Particularly suitable probes are probes derived from DNA encoding bioactivities with activity similar or identical to the particular bioactivity being tested. Probes can be used for PCR amplification and thus selection of target sequences. Alternatively, probe sequences can be used as hybridization probes that can be used to identify sequences with significant or desired homology.
In another aspect, in vivo biopanning can be performed using a FACS-based machine. Gene libraries or expression libraries are constructed with vectors containing elements that stabilize the transcribed RNA. For example, the inclusion of sequences that form secondary structures, such as hairpins designed to flank the transcribed regions of the RNA, would serve to increase their stability, thus increasing their half-life within the cell. The probe molecules used in the biopanning process consist of oligonucleotides labeled with detectable molecules that give a detectable signal upon interaction with the target sequence (e.g., that fluoresce only when the probe binds to the target molecule). Various dyes or stains well known in the art, for example those described in "Practical Flow Cytometry", 1995 Wiley-Liss, Inc., Howard M. Shapiro, M.D., can be used to intercalate into or associate with the nucleic acid in order to "label" the oligonucleotides. These probes are introduced into the recombinant cells of the library using one of several transformation methods. The probe molecules interact or hybridize with the transcribed target mRNA or DNA, resulting in DNA/RNA heteroduplex molecules or DNA/DNA duplex molecules. Binding of the probe to the target produces a detectable signal (e.g., a fluorescent signal) that is detected and sorted by a FACS or similar machine during the screening process.
DNA probes should have at least about 10 bases, preferably at least 15 bases. In one embodiment, the entire coding region of a portion of the pathway can be used as a probe. When the probe is hybridized to the target DNA in an in vitro system, the hybridization conditions in which the target DNA is selectively isolated using at least one DNA probe will be designed to ensure a hybridization stringency of at least about 50% sequence identity, in particular a stringency that ensures sequence identity of at least about 70%.
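The probe length and sequence identity thresholds stated above can be illustrated with a short Python sketch. It is illustrative only: the function names and the ungapped, position-by-position identity measure are assumptions made for this example, whereas the method itself sets stringency through hybridization conditions rather than by computation.

```python
MIN_PROBE_LENGTH = 10        # "at least about 10 bases"
PREFERRED_PROBE_LENGTH = 15  # "preferably at least 15 bases"


def probe_length_status(probe: str) -> str:
    """Report whether a probe meets the minimum or preferred length."""
    if len(probe) < MIN_PROBE_LENGTH:
        return "too short"
    if len(probe) < PREFERRED_PROBE_LENGTH:
        return "acceptable"
    return "preferred"


def percent_identity(probe: str, target: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not probe:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def classify_hit(probe: str, target: str) -> str:
    """Compare a probe/target pair with the 50% and 70% identity levels."""
    if probe_length_status(probe) == "too short":
        return "probe too short"
    identity = percent_identity(probe, target)
    if identity >= 70.0:
        return "meets 70% identity"
    if identity >= 50.0:
        return "meets 50% identity"
    return "below 50% identity"
```

In practice the identity between a probe and a library insert is not known in advance; the sketch only shows how the two thresholds partition candidate hits.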
The resulting libraries of transformed clones can then be screened for clones that exhibit the activity of interest. Clones can be transferred into alternative hosts for expression of active compounds or screened using the methods described herein.
An alternative to the in vivo biopanning described above is an encapsulation technique, such as gel microdroplets, which can be used to localize multiple clones in one location for screening on a FACS machine. The clones can then be broken out into individual clones and screened again on a FACS machine to identify positive individual clones. Screening in this manner using a FACS machine is fully described in U.S. patent application Ser. No. 08/876,276, filed Jun. 16, 1997. Thus, for example, if a mixture of clones has a desired activity, the individual clones can be recovered and rescreened using a FACS machine to determine which of them has the desired activity.
Various types of encapsulation strategies and compounds or polymers can be used in the present invention. For example, high-temperature agarose can be used to form stable microdroplets at high temperatures, allowing cells to be stably encapsulated after heat-killing steps used to remove all background activity when screening for thermostable bioactivity. Encapsulation can be in beads, high-temperature agarose, gel microdroplets, cells such as red blood cells or macrophages, liposomes, or any other way of encapsulating and localizing molecules.
For example, methods for making liposomes have been described (e.g., U.S. Patent Nos. 5,653,996, 5,393,530 and 5,651,981), as has the use of liposomes to encapsulate a variety of molecules (e.g., U.S. Patent Nos. 5,595,756, 5,605,703, 5,627,159, 5,652,225, 5,567,433, 4,235,871 and 5,227,170). Entrapment of proteins, viruses, bacteria and DNA in erythrocytes during endocytosis has also been described (see, for example, Journal of Applied Biochemistry 4, 418-435 (1982)). Erythrocytes used as in vitro or in vivo carriers for substances entrapped during hypo-osmotic lysis or dielectric breakdown of the membrane have also been described (reviewed in Ihler, G. M. (1983) J. Pharm. Ther). These techniques are useful in the present invention for encapsulating samples in a microenvironment for screening.
As used herein, "microenvironment" means any molecular structure that provides a suitable environment to facilitate the interactions required for the method of the invention. Environments suitable for facilitating molecular interactions include, for example, liposomes. Liposomes can be made from a variety of lipids, including phospholipids, glycolipids, steroids, long-chain alkyl esters (e.g., alkyl phosphates), fatty acid esters (e.g., lecithin), fatty amines and the like. A mixture of fatty materials, such as a combination of a neutral steroid, a charged amphiphile and a phospholipid, can be used. Illustrative examples of phospholipids include lecithin, sphingomyelin and dipalmitoyl phosphatidylcholine. Representative steroids include cholesterol, cholestanol and lanosterol. Representative charged amphiphiles generally contain from 12 to 30 carbon atoms. Examples of such compounds include mono- or dialkyl phosphate esters and alkyl amines, e.g., dicetyl phosphate, stearylamine, hexadecylamine, dilauryl phosphate and the like.
Furthermore, it is possible to combine some or all of the above embodiments such that a normalization step is performed before the expression library is generated, the expression library is then generated, the expression library so generated is then biopanned, and the biopanned expression library is then screened using a high-throughput cell sorter. Thus there are a variety of options, including: (i) generating the library and then screening it; (ii) normalizing the target DNA, generating the library and screening it; (iii) normalizing, generating the library, biopanning and screening; or (iv) generating, biopanning and screening the library. Nucleic acids used to create a library can be obtained, for example, from environmental samples, mixed populations of organisms (e.g., cultured or uncultured), enriched populations thereof, and isolates thereof. Additionally, screening techniques include, for example, hybridization screening, PCR screening, expression screening, and the like.
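The four options (i) to (iv) differ only in whether the optional normalization and biopanning steps are included. A minimal Python sketch of that combination logic follows; the function names and step labels are hypothetical and serve only to make the ordering of steps explicit.

```python
from typing import List


def build_workflow(normalize: bool = False, biopan: bool = False) -> List[str]:
    """Return the ordered steps for one workflow option.

    Library generation and screening are always present; normalization
    (before generation) and biopanning (before screening) are optional.
    """
    steps: List[str] = []
    if normalize:
        steps.append("normalize target DNA")
    steps.append("generate library")
    if biopan:
        steps.append("biopan library")
    steps.append("screen library")
    return steps


def all_workflows() -> List[List[str]]:
    """Enumerate options (i) through (iv) in the order listed in the text."""
    options = [(False, False), (True, False), (True, True), (False, True)]
    return [build_workflow(n, b) for n, b in options]
```

Calling `all_workflows()` yields the four step lists, from the shortest (generate, screen) to the longest (normalize, generate, biopan, screen).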
Gel microdroplet technology has been important in amplifying the signals available for flow cytometry analysis and in enabling the screening of microbial strains in biotechnology strain improvement programs. Wittrup et al. (Biotechnol. Bioeng. (1993) 42:351-356) developed a microencapsulation selection method that allows rapid and quantitative screening of >10^6 yeast cells for increased secretion of Aspergillus awamori glucoamylase. The method provides a 400-fold enrichment of high-secretion mutants in a single pass.
Gel microdroplets or other related technologies can be used in the present invention for localization, sorting as well as signal amplification in high-throughput screening of recombinant libraries. Cell survival in screening is not a problem or concern because nucleic acid can be obtained from microdroplets.
Using any of a number of biopanning techniques that can enrich the library population for clones containing sequences of interest, the enriched clones are suspended in a liquid medium such as nutrient broth or another growth medium. The enriched clones thus comprise a plurality of host cells transformed with constructs comprising vectors into which nucleic acid sequences derived from a sample (e.g., mixed populations of organisms, isolates thereof, and the like) have been introduced. A liquid medium containing a subset of the clones and one or more substrates having a detectable molecule is then used for screening. The interaction (including reaction) of a substrate with a clone expressing an enzyme having the desired enzyme activity yields a product or a detectable signal that can be spatially detected to identify one or more clones or capillaries containing at least one signal-producing clone. The signal-producing clones, or the nucleic acids contained in them, can then be recovered using any of a number of techniques.
The term "substrate" as used herein includes, for example, substrates for the detection of bioactivity or biomolecules (eg, enzymes and their specific enzymatic activities). Such substrates are well known in the art. For example, various enzymes and suitable substrates specific for such enzymes are given in Molecular Probes, Handbook of Fluorescent Probes and Research Chemicals (Molecular Probes, Inc.; Eugene, Oregon.), the disclosures of which are incorporated herein by reference. The substrate may have a detectable molecule associated therewith, including, for example, chromogenic or fluorogenic molecules. A suitable substrate for use in the present invention is any substrate that produces an optically detectable signal upon interaction (eg, reaction) with a given enzyme having the desired activity or a given clone encoding such an enzyme.
A person skilled in the art can select an appropriate substrate, for example, based on the desired enzyme activity. Examples of desired enzymes/enzyme activities include those listed herein. The desired enzymatic activity may also involve a group of enzymes in an enzymatic pathway for which there is an optical signal substrate. One example is the set of enzymes for carotenoid synthesis.
Substrates are known and/or commercially available for glycosidases, epoxide hydrolases, phosphatases and monooxygenases, among others. When the desired activity is in the same class as other biomolecules or enzymes that have a number of known substrates, the activity can be tested using a cocktail of known substrates. For example, substrates for about 20 commercially available esterases are known, and a combination of these known substrates can yield detectable, if not optimal, signal generation.
The optical signaling substrate can be a chromogenic substrate, a fluorogenic substrate, a bio- or chemiluminescent substrate, or a substrate for fluorescence resonance energy transfer (FRET). The detectable species may be one that results from cleavage of the substrate, or a secondary molecule that is so affected by the cleavage or other substrate/biomolecule interaction that it undergoes a detectable change. Numerous examples of detectable assay formats are known in the diagnostic art, using immunoassay, chromogenic assay, and labeled probe methodologies.
In one embodiment, the optical signal substrate can be a bio- or chemiluminescent substrate. Chemiluminescent substrates for several enzymes are available from Tropix (Bedford, MA). Enzymes that have known chemiluminescent substrates include alkaline phosphatase, beta-galactosidase, beta-glucuronidase, and beta-glucosidase.
In another embodiment, chromogenic substrates can be used, especially for certain enzymes, such as hydrolytic enzymes. For example, the optical signal substrate can be an indolyl derivative that is enzymatically cleaved to produce a chromogenic product. When using chromogenic substrates, the optically detectable signal is optical absorbance (including changes in absorbance). In this aspect, signal detection can be provided by measuring the absorbance using a spectrophotometer or the like.
In another embodiment, a fluorogenic substrate is used such that the optically detectable signal is fluorescence. Fluorogenic substrates provide high sensitivity for better detection, as well as alternative detection methods. Hydroxyl- and amino-substituted coumarins are the most common fluorophores used in the preparation of fluorogenic substrates. A typical coumarin-based fluorogenic substrate is 7-hydroxycoumarin, commonly known as umbelliferone (Umb). Derivatives and analogues of umbelliferone are also used. Substrates based on derivatives and analogues of fluorescein (such as FDG or C12-FDG) and rhodamine are also used. Substrates derived from resorufin (eg, resorufin beta-D-galactopyranoside or resorufin beta-D-glucuronide) are particularly useful in the present invention. Resorufin-based substrates are useful, for example, in the screening of glycosidases, hydrolases and dealkylases. Lipophilic derivatives of the above substrates (eg, alkylated derivatives) may be useful in certain embodiments because they are generally more readily loaded into cells and may tend to associate with lipid regions of the cell. Fluorescein and resorufin are commercially available as alkylated derivatives that form relatively water-insoluble (ie, lipophilic) products. For example, fluorescence imaging can be performed using C12-resorufin galactoside, manufactured by Molecular Probes (Eugene, Oreg.) as a substrate. The particular fluorogenic substrate used can be chosen based on the enzyme activity being assayed.
Typically, substrates can enter the cell and maintain their presence in the cell long enough to perform the assay (eg, once the substrate is in the cell, it does not "leak" back out before reacting sufficiently with the test enzyme to elicit a detectable response). Substrate retention in the cell can be improved by various techniques. In one method, the substrate compound is structurally modified by adding a hydrophobic (eg, alkyl) tail. In another embodiment, a solvent such as DMSO or glycerol can be used to coat the outer surface of the cell. The substrate can also be applied to the cells at a reduced temperature, which has been observed to delay the efflux of the substrate from the cells. However, entry of the substrate into the cell is not necessary when, for example, the enzyme or polypeptide is secreted, present in a lysed cell sample, etc., or when the substrate can act outside the cell (eg, a ligand complex).
The optical signaling substrate may, in some embodiments, be a FRET substrate. FRET is a spectroscopic method that can monitor the proximity and relative angular orientation of fluorophores. A fluorescent monitoring system that uses FRET to measure substrate or product concentration includes two fluorescent moieties whose emission and excitation spectra make one of them the "donor" and the other the "acceptor." The two moieties are chosen so that the excitation spectrum of the acceptor overlaps the emission spectrum of the donor. The donor is excited by light of appropriate intensity within its excitation spectrum and emits the absorbed energy as fluorescent light. When the acceptor is positioned to quench the donor in its excited state, the fluorescence energy is transferred to the acceptor, which can then emit a second photon. The emission spectra of the donor and acceptor have minimal overlap so that the two emissions can be distinguished. Thus, when the acceptor fluoresces at a longer wavelength than the donor, the net steady-state effect is that the donor emission is quenched and the acceptor emits when excited at the donor absorption maximum.
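The sensitivity of FRET to donor-acceptor proximity is commonly summarized by the Förster relation E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. This relation is standard in the fluorescence literature but is not stated in this document; the following Python sketch, including its function names and any R0 value passed to it, is an assumption for illustration only.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Energy transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if forster_radius_nm <= 0:
        raise ValueError("Forster radius must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def relative_donor_emission(distance_nm: float, forster_radius_nm: float) -> float:
    """Fraction of donor emission remaining (1 - E) when the acceptor is present."""
    return 1.0 - fret_efficiency(distance_nm, forster_radius_nm)
```

Because of the sixth-power dependence, cleaving a substrate so that donor and acceptor separate from well inside R0 to well beyond it switches the donor emission from mostly quenched to mostly restored, which is what makes such substrates useful as optical reporters.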
The detectable or optical signal can be measured using, for example, a fluorometer (or the like) to detect fluorescence, including fluorescence polarization, time-resolved fluorescence, or FRET. In general, excitation radiation of a first wavelength from an excitation source is directed onto the sample and excites it. In response, fluorescent compounds in the sample emit radiation at a wavelength different from the excitation wavelength. Methods of performing assays on fluorescent materials are well known in the art and are described, for example, by Lakowicz (Principles of Fluorescence Spectroscopy, New York, Plenum Press, 1983) and Herman ("Resonance Energy Transfer Microscopy", in: Fluorescence Microscopy of Living Cells in Culture, Part B, Methods in Cell Biology, Volume 30, eds Taylor & Wang, San Diego, Academic Press, 1989, pp. 219-243). Examples of fluorescence detection techniques are described in more detail below.
Additionally, several methods of using reporter genes to measure gene expression have been described in the literature. Nolan et al. describe a technique for analyzing beta-galactosidase expression in mammalian cells. This technique uses fluorescein-di-beta-D-galactopyranoside (FDG) as a substrate for beta-galactosidase, which upon hydrolysis releases fluorescein, a product that can be detected by its fluorescence emission (Nolan et al., 1991). Other fluorogenic substrates, such as 5-dodecanoylaminofluorescein di-beta-D-galactopyranoside (C12-FDG) (Molecular Probes), have been developed; these differ from FDG in that they are lipophilic fluorescein derivatives that can readily cross most cell membranes under physiological culture conditions.
The aforementioned beta-galactosidase assays can be used to screen single E. coli cells expressing recombinant beta-D-galactosidase isolated, for example, from hyperthermophilic archaea such as Sulfolobus solfataricus. Other reporter genes may also be useful, since substrates are known for beta-glucuronidase, alkaline phosphatase, chloramphenicol acetyltransferase (CAT) and luciferase.
For example, a library can be screened for a specified enzyme activity. The enzyme activity screened for may be, for instance, the catalysis of an epoxide modification. The recombinant enzymes can then be re-assayed for a more specific enzyme activity.
Alternatively, the library can be screened directly for a more specialized enzyme activity. For example, instead of a general screen for biological activity, the library can be screened for a more specialized activity, i.e., for the type of bond on which the epoxide hydrolase acts. Thus, for example, the library can be screened for those EHs that act on one or more specific epoxide groups, such as monosubstituted epoxides, 2,2-disubstituted epoxides, 2,3-disubstituted epoxides, trisubstituted epoxides, and styrene oxides.
As described in connection with one of the above aspects, the invention provides a method of testing the activity of clones containing selected DNA derived from a microorganism, the method comprising:
screening a library for a biomolecule or biological activity of interest, the library containing a plurality of clones, wherein the clones are prepared by recovering nucleic acids (e.g., genomic DNA) from a mixed population of organisms, enriched populations thereof, or isolates thereof, and transforming a host with the nucleic acids to generate the clones that are screened for the biomolecule or biological activity of interest.
In another aspect, an enrichment step may be applied prior to activity-based screening. The enrichment step can be, for example, a biopanning method. This "biopanning" procedure is described and illustrated in U.S. Pat. No. 6,054,002, issued Apr. 25, 2000, which is incorporated herein by reference.
In another aspect, the polynucleotides are contained in clones, wherein the clones are prepared from nucleic acid sequences of a mixed population of organisms, wherein the nucleic acid sequences are used to prepare a gene library of the mixed population of organisms. A gene library is screened for a sequence of interest by transfecting a host cell containing the library with at least one nucleic acid sequence having a detectable molecule that is all or part of a DNA sequence encoding a biological activity with the desired activity and separating the library into clones containing the desired sequence, e.g. by fluorescence analysis.
The biopanning approach described above can be used to generate libraries enriched in clones carrying sequences homologous to a given probe sequence. Using this approach, libraries containing clones with inserts of up to 40 kbp can be enriched approximately 1000-fold after each round of panning. This reduces the number of clones that must be screened after one round of biopanning enrichment. The approach can be used to create libraries enriched in clones carrying sequences of interest associated with a biological activity of interest, for example polyketide sequences.
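The effect of an approximately 1000-fold enrichment per round on screening effort can be estimated numerically. The Python sketch below is a simplified model, not part of the described method: it assumes that each round multiplies the odds of target to non-target clones by the stated fold factor (so the fraction can never exceed 1), and the function names and starting frequencies are hypothetical.

```python
def enriched_fraction(initial_fraction: float, fold_per_round: float, rounds: int) -> float:
    """Fraction of target clones after panning.

    Assumption: each round multiplies the target:non-target odds by
    fold_per_round, which keeps the result between 0 and 1.
    """
    if not 0.0 < initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be in (0, 1]")
    if fold_per_round <= 0 or rounds < 0:
        raise ValueError("fold_per_round must be positive and rounds non-negative")
    if initial_fraction == 1.0:
        return 1.0
    odds = initial_fraction / (1.0 - initial_fraction) * fold_per_round ** rounds
    return odds / (1.0 + odds)


def clones_per_positive(fraction: float) -> float:
    """Expected number of clones screened per target clone found."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction
```

Under these assumptions, a target present at one clone per million falls to roughly one per thousand after a single round, which is the reduction in screening burden that the paragraph above describes.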
Hybridization screening using high-density filters or biopanning has been shown to be an effective approach for detecting homologues of pathways containing conserved genes. However, other approaches are needed to discover new bioactive molecules that may have no known counterparts. Another approach of the present invention is to screen in E. coli for the expression of small molecule ring structures or "backbones". Since the genes encoding these polycyclic structures can often be expressed in E. coli, the small molecule backbone can be produced, albeit in an inactive form. Bioactivity is conferred upon transfer of the molecule or pathway to a suitable host that expresses the requisite glycosylation and methylation genes, which can modify or "decorate" the structure into its active form. Therefore, inactive ring compounds recombinantly expressed in E. coli are screened to identify clones, which are then transferred to a metabolically rich host such as Streptomyces for subsequent production of the bioactive molecule. The use of high-throughput robotic systems enables the screening of hundreds of thousands of clones in multiplexed arrays in microtiter dishes.
One approach to detecting and enriching clones bearing these structures is to use capillary screening or FACS screening methods, a procedure described and illustrated in U.S. patent application Ser. No. 08/876,276, filed Jun. 16, 1997. Polycyclic ring compounds typically have characteristic fluorescence spectra when excited by ultraviolet light. Therefore, clones expressing these structures can be distinguished from the background using a sufficiently sensitive detection method. For example, high-throughput FACS can be used to screen for small molecule backbones in E. coli libraries. Commercially available FACS machines are capable of screening up to 100,000 clones per second for UV-active molecules. These clones can be sorted for further FACS screening, or the resident plasmids can be extracted and transferred into Streptomyces for activity screening.
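The stated upper rate of 100,000 clones per second gives a lower bound on sorting time for a library of a given size. The Python sketch below is simple arithmetic; the function name, the optional `coverage` parameter (how many times each clone is examined on average) and the library sizes used with it are assumptions for illustration.

```python
def sort_time_seconds(num_clones: int,
                      clones_per_second: float = 100_000.0,
                      coverage: float = 1.0) -> float:
    """Minimum time to pass num_clones through the sorter `coverage` times.

    The default rate is the "up to 100,000 clones per second" figure,
    so the result is a best-case estimate.
    """
    if num_clones < 0 or clones_per_second <= 0 or coverage <= 0:
        raise ValueError("inputs must be positive (num_clones may be zero)")
    return num_clones * coverage / clones_per_second
```

For instance, a library of one million clones would pass through the sorter in about ten seconds at the maximum rate, or thirty seconds at threefold coverage.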
In an alternative screening approach, after transfer to Streptomyces hosts, organic extracts from candidate clones can be tested for biological activity by testing susceptibility to test organisms such as Staphylococcus aureus, E. coli or Saccharomyces cerevisiae. In this approach, FACS screening can be used by co-encapsulating clones with the test organism.
An alternative to the aforementioned screening methods provided by the present invention is an approach referred to as "mixed extract" screening. The "mixed extract" screening approach exploits the fact that the additional genes required to confer activity on the polycyclic backbones are expressed in metabolically rich hosts such as Streptomyces, and that the corresponding enzymes can be extracted and combined with backbones extracted from E. coli to produce a bioactive compound in vitro. Preparations of enzyme extracts from metabolically rich hosts, such as Streptomyces strains, at different growth stages are combined with pools of organic extracts from E. coli libraries and then assessed for bioactivity.
Another approach to detecting activity in E. coli clones is to search for genes that can convert bioactive compounds into different forms.
For example, capillary screening can also be used to detect the expression of UV-fluorescent molecules in metabolically rich hosts such as Streptomyces. Recombinant oxytetracycline retains its diagnostic red fluorescence when produced heterologously in S. lividans TK24. Pathway clones that can be identified by the methods and systems of the invention can therefore be screened for polycyclic molecules at high throughput.
Recombinant bioactive compounds can also be tested in vivo using "two-hybrid" systems, which can detect enhancers and inhibitors of protein-protein interactions or other interactions, such as those between transcription factors and their activators or between receptors and their cognate targets. In this embodiment, the small molecule pathway and a GFP reporter construct are coexpressed. Clones with altered GFP expression can then be identified and isolated for characterization.
The present invention also enables the transfer of cloned pathways derived from uncultured samples into metabolically rich hosts for heterologous expression and further screening for bioactive compounds of interest using the various screening approaches briefly described above.
After viable or non-viable cells, each containing a different expression clone from the gene library, have been screened and positive clones obtained, DNA can be isolated from the positive clones using techniques well known in the art. The DNA can then be amplified in vivo or in vitro using any of a variety of amplification techniques known in the art. In vivo amplification would involve transformation of the clones or subclones into a viable host followed by growth of the host. In vitro amplification can be performed using techniques such as the polymerase chain reaction. After amplification, the identified sequences can be "evolved" or sequenced.
One advantage provided by this invention is the ability to manipulate identified biomolecules or biological activities to generate and select encoded variants with altered sequence, activity, or specificity.
Clones found to have the screened biomolecules or bioactivities can be subjected to directed mutagenesis to develop new biomolecules or bioactivities with desired properties, or to develop modified biomolecules or bioactivities with particularly desirable properties that are absent or less pronounced in nature (e.g., in the wild-type activity), such as stability to heat or organic solvents. Any of the known techniques of directed mutagenesis are applicable to the invention. For example, particularly preferred mutagenesis techniques for use in the present invention include those described below.
Alternatively, it may be desirable to diversify a biomolecule (e.g., a peptide, protein, or polynucleotide sequence) or bioactivity (e.g., an enzyme activity) obtained, identified, or cloned as described herein. Such diversification may modify the biomolecule or bioactivity to increase or decrease, for example, the activity, specificity, affinity, function, and the like of the polypeptide. DNA shuffling can be used to increase diversity in a particular sample. DNA shuffling is intended to indicate recombination between substantially homologous but non-identical sequences; in some embodiments DNA shuffling may involve crossover via non-homologous recombination, such as via cre/lox and/or flp/frt systems and the like (see, e.g., U.S. Patent No. 5,939,250, issued to Dr. Jay Short on August 17, 1999, and assigned to Diversa Corporation, the disclosure of which is incorporated herein by reference). Various methods for shuffling, mutating, or altering polynucleotide or polypeptide sequences are discussed below.
Nucleic acid shuffling is a method of in vitro or in vivo homologous recombination of pools of shorter or smaller polynucleotides to produce a polynucleotide or polynucleotides. Mixtures of related nucleic acid or polynucleotide sequences are subjected to sexual PCR to generate random polynucleotides, which are then reassembled to produce a library or mixed population of recombinant hybrid nucleic acid or polynucleotide molecules. In contrast to cassette mutagenesis, only shuffling and error-prone PCR allow a pool of sequences to be mutated blindly (without sequence information other than the primers).
The advantage of the mutagenic shuffling of the invention over error-prone PCR alone for repeated selection can best be explained as follows. Consider DNA shuffling as compared with error-prone PCR (which is not sexual PCR). The starting library of selected or pooled sequences may consist of related sequences of diverse origin or may be derived by any type of mutagenesis (including shuffling) of a single gene. A collection of selected sequences is obtained after the first round of activity selection. Shuffling allows, for example, the free combinatorial association of all of the related sequences.
This method differs from error-prone PCR in that it is an inverse chain reaction. In error-prone PCR, the number of polymerase start sites and the number of molecules grow exponentially. However, the sequence of the polymerase start sites and the sequence of the molecules remain essentially the same. In contrast, in nucleic acid reassembly or shuffling of random polynucleotides, the number of start sites and the number (but not the size) of the random polynucleotides decrease over time. For polynucleotides derived from whole plasmids, the theoretical endpoint is a single, large concatemeric molecule.
Since crossovers occur in regions of homology, recombination will preferentially occur between members of the same sequence family. This discourages combinations of sequences that are grossly incompatible (e.g., that have different activities or specificities). It is contemplated that multiple families of sequences can be shuffled in the same reaction. Further, shuffling generally conserves the relative order of the sequences.
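The two properties noted above, that crossovers occur in regions of homology and that shuffling conserves relative order, can be illustrated with a toy in-silico model. The Python sketch below is not the laboratory procedure: it assumes pre-aligned parent sequences of equal length, allows a template switch only where the preceding `homology` bases of the two parents are identical, and uses hypothetical parameter names and default values throughout.

```python
import random
from typing import Optional, Sequence


def shuffle_once(parents: Sequence[str],
                 homology: int = 4,
                 switch_prob: float = 0.3,
                 rng: Optional[random.Random] = None) -> str:
    """Build one chimera from aligned, equal-length parent sequences."""
    if len(parents) < 2:
        raise ValueError("at least two parents are required")
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    rng = rng or random.Random()

    current = rng.randrange(len(parents))  # parent currently being copied
    chimera = []
    for pos in range(length):
        # Attempt a crossover only where a full homology window precedes pos.
        if pos >= homology and rng.random() < switch_prob:
            candidate = rng.randrange(len(parents))
            window = slice(pos - homology, pos)
            # Switch only if candidate and current parent agree in the window.
            if parents[candidate][window] == parents[current][window]:
                current = candidate
        # Position pos is always copied from position pos of some parent,
        # so the relative order of the parental sequence is conserved.
        chimera.append(parents[current][pos])
    return "".join(chimera)
```

Parents that share few identical windows rarely exchange segments in this model, which mirrors the observation that recombination preferentially occurs between members of the same sequence family.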
Rare shufflants will contain a large number of the best molecules (e.g., those with the highest activity or specificity), and these rare shufflants can be selected based on their higher activity or specificity.
A pool of 100 different polypeptide sequences can be permuted in 10^3 different ways. Such a large number of permutations cannot be represented in a single DNA sequence library. Therefore, it is contemplated that multiple cycles of DNA shuffling and selection may be required, depending on the length of the sequence and the desired sequence diversity. In contrast, error-prone PCR keeps all of the selected sequences in the same relative orientation, generating a much smaller mutant cloud.
A template polynucleotide that can be used in the methods of the invention can be DNA or RNA. It can vary in length depending on the size of the gene or of the shorter or smaller polynucleotide being recombined or reassembled. Preferably, the template polynucleotide is 50 bp to 50 kb in size. It is contemplated that entire vectors containing the nucleic acid encoding the protein of interest can be used in the methods of the invention, and such vectors have been used successfully.
The template polynucleotide can be obtained by PCR amplification (U.S. Patent Nos. 4,683,202 and 4,683,195) or by other amplification or cloning methods. However, removing free primers from the PCR products before pooling the PCR products and subjecting them to sexual PCR may give more efficient results. Failure to adequately remove the primers from the original pool prior to sexual PCR can lead to a low frequency of crossover clones.
The template polynucleotide is often double-stranded. A double-stranded nucleic acid molecule is preferred to ensure that regions of the resulting single-stranded polynucleotides are complementary to each other and can therefore hybridize to form a double-stranded molecule.
It is contemplated that single-stranded or double-stranded nucleic acid polynucleotides having regions identical to the template polynucleotide and regions heterologous to the template polynucleotide can be added to the template polynucleotide at this stage. It is also thought that two different but related polynucleotide templates can be mixed at this stage.
The double-stranded polynucleotide template and any added double-stranded or single-stranded polynucleotides are subjected to sexual PCR, which includes slowing or halting the reaction, to produce a mixture of polynucleotides of about 5 bp to 5 kb or more. Preferably the size of the random polynucleotides is from about 10 bp to 1000 bp, more preferably from about 20 bp to 500 bp.
Alternatively, it is also contemplated that a double-stranded nucleic acid having multiple nicks may be used in the methods of the invention. A nick is a break in one strand of a double-stranded nucleic acid. The distance between such nicks is preferably from 5 bp to 5 kb, more preferably from 10 bp to 1000 bp. This may provide self-priming regions for the production of shorter or smaller polynucleotides to be included with the polynucleotides resulting from, for example, random primers.
The concentration of any particular polynucleotide will not exceed 1% by weight of the total polynucleotides, more preferably the concentration of any particular nucleic acid sequence will not exceed 0.1% by weight of the total nucleic acid.
The number of different specific polynucleotides in the mixture will be at least about 100, preferably at least about 500, and more preferably at least about 1000.
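The two composition limits stated above (no single polynucleotide above 1% by weight, and at least about 100 distinct polynucleotides) can be checked mechanically. The following is a minimal sketch only; the function name `check_mixture`, its parameters, and the dictionary input format are assumptions made for illustration and are not part of the described method.

```python
def check_mixture(weights_by_species, max_fraction=0.01, min_species=100):
    """Check a polynucleotide mixture against the stated composition limits.

    weights_by_species maps a species label to its mass (any consistent unit).
    Returns (ok, problems), where problems is a list of human-readable strings.
    The defaults (1% by weight, 100 species) are the least stringent limits
    given in the text; the function itself is a hypothetical helper.
    """
    total = sum(weights_by_species.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    problems = []
    if len(weights_by_species) < min_species:
        problems.append(
            f"only {len(weights_by_species)} species; need at least {min_species}"
        )
    for name, weight in weights_by_species.items():
        fraction = weight / total
        if fraction > max_fraction:
            problems.append(f"{name} is {fraction:.2%} of total weight")
    return (not problems, problems)


# Example: 200 species at equal weight, each 0.5% of the total.
even = {f"s{i}": 1.0 for i in range(200)}
ok, problems = check_mixture(even)
```

The more preferred limits (0.1% by weight, 500 or 1000 species) can be tested by passing different values for `max_fraction` and `min_species`.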
At this stage, single-stranded or double-stranded polynucleotides, either synthetic or natural, can be added to randomly selected double-stranded shorter or smaller polynucleotides to increase the heterogeneity of the polynucleotide mixture.
It is also contemplated that populations of double-stranded, randomly broken polynucleotides can be mixed or combined at this stage with polynucleotides from the sexual PCR process and optionally subjected to one or more additional cycles of sexual PCR.
When it is desired to introduce a mutation into a template polynucleotide, single-stranded or double-stranded polynucleotides having a region identical to the template polynucleotide and a region heterologous to the template polynucleotide can be added in a 20-fold excess by weight compared to the total nucleic acid. More preferably, the single-stranded polynucleotides can be added in a 10-fold excess by weight compared to the total nucleic acid.
When a mixture of different but related template polynucleotides is desired, populations of polynucleotides from each template can be combined in a ratio of less than about 1:100, preferably less than about 1:40. For example, it may be desirable to backcross wild-type polynucleotides with a population of mutated polynucleotides to eliminate neutral mutations. In such an example, the ratio of randomly provided wild-type polynucleotides that can be added to the randomly provided hybrid polynucleotides of the sexual PCR cycle is about 1:1 to about 100:1, and preferably from 1:1 to 40:1.
A mixed population of random polynucleotides is denatured to form single-stranded polynucleotides and then rehybridized. Only those single-stranded polynucleotides that have regions of homology with other single-stranded polynucleotides will be reassembled.
Random polynucleotides can be denatured by heating. A person skilled in the art can determine the conditions necessary for complete denaturation of a double-stranded nucleic acid. Preferably the temperature is from 80°C to 100°C, more preferably the temperature is from 90°C to 96°C. Other methods that can be used to denature polynucleotides include pressure and pH.
Polynucleotides can be reannealed by cooling. The temperature is preferably from 20°C to 75°C, more preferably from 40°C to 65°C. Recombination can be forced by using a low annealing temperature, although the process becomes more difficult. The degree of renaturation that occurs will depend on the degree of homology between the population of single-stranded polynucleotides.
Renaturation can be accelerated by adding polyethylene glycol ("PEG") or salt. The salt concentration is preferably from 0 mM to 200 mM, more preferably the salt concentration is from 10 mM to 100 mM. The salt can be KCl or NaCl. The PEG concentration is preferably from 0% to 20%, more preferably from 5% to 10%.
The annealed polynucleotides are then incubated in the presence of a nucleic acid polymerase and dNTPs (i.e., dATP, dCTP, dGTP, and dTTP). The nucleic acid polymerase can be Klenow fragment, Taq polymerase, or any other DNA polymerase known in the art.
The approach to use for assembly depends on the minimum degree of homology that should still yield crossovers. If the regions of identity are large, Taq polymerase can be used with an annealing temperature between 45°C and 65°C. If the regions of identity are small, Klenow polymerase can be used with an annealing temperature between 20°C and 30°C. One skilled in the art can vary the annealing temperature to increase the number of crossovers obtained.
The polymerase can be added to random polynucleotides before hybridization, simultaneously with hybridization, or after hybridization.
The cycle of denaturation, renaturation, and incubation in the presence of polymerase is referred to herein as nucleic acid shuffling or reassembly. This cycle is repeated the desired number of times. Preferably the cycle is repeated from 2 to 50 times, more preferably the sequence is repeated from 10 to 40 times.
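By way of illustration only, the preferred temperature and cycle-number ranges recited above can be collected into a small program description and checked before a run. The class name `ShufflingProgram`, its field names, and the step labels are assumptions introduced for this sketch; only the numeric ranges come from the text.

```python
from dataclasses import dataclass

# Preferred ranges as recited in the text (inclusive bounds).
DENATURE_C = (80.0, 100.0)   # denaturation by heating
ANNEAL_C = (20.0, 75.0)      # reannealing by cooling
CYCLES = (2, 50)             # number of repeated cycles


@dataclass(frozen=True)
class ShufflingProgram:
    """Hypothetical description of one shuffling (reassembly) program."""
    denature_c: float
    anneal_c: float
    cycles: int

    def out_of_range(self):
        """Return the names of settings that fall outside the preferred ranges."""
        checks = {
            "denature_c": DENATURE_C[0] <= self.denature_c <= DENATURE_C[1],
            "anneal_c": ANNEAL_C[0] <= self.anneal_c <= ANNEAL_C[1],
            "cycles": CYCLES[0] <= self.cycles <= CYCLES[1],
        }
        return [name for name, ok in checks.items() if not ok]

    def steps(self):
        """Expand the program into an ordered list of (step, temperature) pairs.

        The polymerase incubation temperature is not specified in the text,
        so it is left as None here rather than guessed.
        """
        one_cycle = [
            ("denature", self.denature_c),
            ("anneal", self.anneal_c),
            ("polymerase incubation", None),
        ]
        return one_cycle * self.cycles


program = ShufflingProgram(denature_c=94.0, anneal_c=55.0, cycles=30)
```

A program such as the one above reports no out-of-range settings, whereas a program with, for example, a 70°C denaturation step would be flagged by `out_of_range()`.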
The resulting nucleic acid is a larger double-stranded polynucleotide of about 50 bp to about 100 kb, preferably a larger polynucleotide of 500 bp to 50 kb.
These larger polynucleotides may contain a number of copies, in tandem, of a polynucleotide the same size as the template polynucleotide. This concatemeric polynucleotide is then denatured into individual copies of the template polynucleotide. The result will be a population of polynucleotides approximately the same size as the template polynucleotide. The population will be a mixed population where single-stranded or double-stranded polynucleotides having a region of identity and a region of heterology were added to the template polynucleotide prior to shuffling. These polynucleotides are then cloned into an appropriate vector, and the ligation mixture is used to transform bacteria.
It is contemplated that individual polynucleotides can be obtained from the larger concatemeric polynucleotide by amplifying the individual polynucleotide prior to cloning by a variety of methods, including PCR (US Patent Nos. 4,683,195 and 4,683,202), rather than by digestion of the concatemer.
The vector used for cloning is not critical as long as it accepts a polynucleotide of the desired size. If expression of a particular polynucleotide is desired, the cloning vehicle should further contain transcriptional and translational signals near the polynucleotide insertion site to enable expression of the polynucleotide in the host cell.
The resulting bacterial population will contain numerous recombinant polynucleotides with random mutations. This mixed population can be screened to identify the desired recombinant polynucleotides. The method of selection will depend on the desired polynucleotide.
For example, if a polynucleotide identified by the methods described herein encodes a protein with a first binding affinity, subsequently mutated (e.g., shuffled) sequences with increased binding efficiency to a ligand may be desired. In such a case, the proteins expressed by each of the polynucleotides in the population or library can be tested for their ability to bind the ligand by methods known in the art (e.g., panning, affinity chromatography). If a polynucleotide encoding a protein with increased drug resistance is desired, the proteins expressed by each of the polynucleotides in the population or library can be tested for their ability to confer drug resistance to the host organism. One skilled in the art, having knowledge of the desired protein, can readily screen the population to identify polynucleotides that confer the desired properties to the protein.
It is contemplated that one skilled in the art could use a phage display system in which fragments of the protein are expressed as fusion proteins on the phage surface (Pharmacia, Milwaukee WI). The recombinant DNA molecules are cloned into the phage DNA at a site that results in the transcription of a fusion protein, a portion of which is encoded by the recombinant DNA molecule. The phage containing the recombinant nucleic acid molecule undergoes replication and transcription in the cell. The leader sequence of the fusion protein directs the transport of the fusion protein to the tip of the phage particle. Thus, the fusion protein partially encoded by the recombinant DNA molecule is displayed on the phage particle for detection and selection by the methods described above.
It is further contemplated that a number of cycles of nucleic acid shuffling can be performed with polynucleotides from a subpopulation of the first population, which subpopulation contains DNA encoding the desired recombinant protein. In this way, proteins with even higher binding affinity or enzymatic activity can be obtained.
It is also contemplated that a number of rounds of nucleic acid shuffling can be performed with a mixture of wild-type polynucleotides and subpopulations of nucleic acids from the first or subsequent rounds of nucleic acid shuffling to remove any silent mutations from the subpopulation.
Any source of nucleic acid in purified form can be used as starting nucleic acid. Accordingly, the method may utilize DNA or RNA, including messenger RNA, which DNA or RNA may be single- or double-stranded. Alternatively, a DNA/RNA hybrid containing one strand of each can be used. The length of the nucleic acid sequence can vary depending on the size of the nucleic acid sequence to be mutated. Preferably, the specific nucleic acid sequence is from 50 to 50,000 base pairs. It is believed that entire vectors containing the nucleic acid encoding the protein of interest can be used in the methods of the invention.
Any specific nucleic acid sequence can be used to generate a population of hybrids by this method. It is only necessary that a small population of hybrid sequences of a particular nucleic acid sequence exist or be available for this method.
A population of specific nucleic acid sequences with mutations can be generated in a number of different ways. Mutations can be created by error-prone PCR. Error-prone PCR uses low-fidelity polymerization conditions to randomly introduce a low level of point mutations over a long sequence. Alternatively, mutations can be introduced into the template polynucleotide by oligonucleotide-directed mutagenesis. In oligonucleotide-directed mutagenesis, a short sequence of the polynucleotide is removed from the polynucleotide by restriction enzyme digestion and replaced with a synthetic polynucleotide in which various bases have been altered from the original sequence. The polynucleotide sequence can also be altered by chemical mutagenesis. Chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other agents that are analogues of nucleotide precursors include nitrosoguanidine, 5-bromouracil, 2-aminopurine or acridine. Generally, these agents are added to the PCR reaction in place of the nucleotide precursor, thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used. Random mutagenesis of the polynucleotide sequence can also be achieved by irradiation with X-rays or ultraviolet light. Generally, plasmid polynucleotides so mutagenized are introduced into E. coli and propagated as a pool or library of hybrid plasmids.
Alternatively, a small mixed population of specific nucleic acids may be found in nature, as they may consist of different alleles of the same gene or of the same gene from different related species (ie related genes). Alternatively, they may be related DNA sequences within a species, such as immunoglobulin genes.
Once a mixed population of specific nucleic acid sequences has been generated, the polynucleotides can be used directly or inserted into an appropriate cloning vector using techniques well known in the art.
The choice of vector depends on the size of the polynucleotide sequence and the host cell to be used in the methods of the invention. Templates of the invention can be plasmids, phages, cosmids, phagemids, or viruses. For example, cosmids and phagemids are preferred when the specific nucleic acid sequence to be mutated is larger, because these vectors are capable of stably propagating large polynucleotides.
If a mixed population of a specific nucleic acid sequence is cloned into a vector, it can be clonally propagated. Utility can be readily determined by screening for expressed polypeptides.
The DNA shuffling method of the invention can be performed blindly on a pool of unknown sequences. By adding to the reassembly mixture oligonucleotides (with ends that are homologous to the sequences being reassembled), any sequence mixture can be incorporated at any specific position into another sequence mixture. Thus, it is contemplated that mixtures of synthetic oligonucleotides, PCR polynucleotides or even whole genes can be mixed into another sequence library at defined positions. The insertion of one sequence (mixture) is independent of the insertion of a sequence into another part of the template. Therefore, the degree of recombination, the homology required, and the diversity of the library can be varied independently and simultaneously along the length of the reassembled DNA.
Shuffling requires the presence of homologous regions separating regions of diversity. Scaffold-like protein structures may be particularly suitable for shuffling. The conserved scaffold determines the overall folding by self-association, while displaying relatively unrestricted loops that mediate specific binding. Examples of such scaffolds are the immunoglobulin beta barrel and the four-helix bundle, which are well known in the art. This shuffling can be used to create scaffold-like proteins with different combinations of mutated binding sequences.
Equivalents of some standard genetic matings can also be performed by in vitro mixing. For example, "molecular backcrossing" can be performed by mixing a hybrid nucleic acid with a wild-type nucleic acid repeatedly while selecting for mutations of interest. Similar to traditional breeding, this approach can be used to combine phenotypes from different sources into a selected background. This is useful, for example, to remove neutral mutations that affect unselected properties (eg, immunogenicity). Therefore, it could be useful to determine which mutations in a protein are involved in increased biological activity and which are not, an advantage that cannot be achieved by error-prone mutagenesis or cassette mutagenesis methods.
Large, functional genes can be properly assembled from a mixture of small random polynucleotides. This reaction may be useful for reassembling genes from highly fragmented fossil DNA. In addition, random fragments of nucleic acids from fossils can be combined with polynucleotides from similar genes of related species.
It is also contemplated that the method of the invention can be used for in vitro amplification of a whole genome from a single cell, as is needed for a variety of research and diagnostic applications. DNA amplification by PCR is in practice limited to a length of about 40 kb. Amplification of a whole genome such as that of E. coli (5,000 kb) by PCR would require about 250 primers, yielding 125 polynucleotides of 40 kb each. On the other hand, random production of polynucleotides of the genome with sexual PCR cycles, followed by gel purification of small polynucleotides, will provide a multitude of possible primers. Use of this mixture of random small polynucleotides as primers in a PCR reaction, alone or with the whole genome as the template, should result in an inverse chain reaction with the theoretical endpoint of a single concatemer containing many copies of the genome.
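The primer count given above follows from simple arithmetic, which can be sketched as follows. The sketch assumes non-overlapping amplicons that tile the genome end to end and one primer pair per amplicon; the function name `primers_for_tiled_pcr` is hypothetical.

```python
import math


def primers_for_tiled_pcr(genome_kb, amplicon_kb, primers_per_amplicon=2):
    """Return (number of amplicons, number of primers) needed to tile a genome.

    Assumes non-overlapping amplicons and one primer pair per amplicon,
    which is an idealization used only to reproduce the figures in the text.
    """
    amplicons = math.ceil(genome_kb / amplicon_kb)
    return amplicons, amplicons * primers_per_amplicon


amplicons, primers = primers_for_tiled_pcr(genome_kb=5000, amplicon_kb=40)
print(amplicons, primers)  # 125 250
```

In practice, adjacent amplicons would overlap, so the actual number of primers would be somewhat higher than this lower bound.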
100-fold copy number amplification and an average polynucleotide size greater than 50 kb can be achieved when only random polynucleotides are used. It is believed that the larger concatamer is formed by overlapping many smaller polynucleotides. The quality of certain PCR products obtained with synthetic primers will be indistinguishable from those obtained from unamplified DNA. This approach is expected to be useful for genome mapping.
The polynucleotide to be shuffled can be produced as a random or non-random polynucleotide, at the discretion of the practitioner. Furthermore, the invention provides a shuffling method that is applicable to a wide range of sizes and types of polynucleotides, including the step of generating polynucleotide monomers for use as building blocks in the reassembly of a larger polynucleotide. For example, building blocks can be gene fragments, or they can consist of entire genes or gene pathways, or any combination thereof.
In an in vivo mixing aspect, a mixed population of a particular nucleic acid sequence is introduced into bacterial or eukaryotic cells under such conditions that at least two different nucleic acid sequences are present in each host cell. Polynucleotides can be introduced into host cells in a variety of ways. Host cells can be transformed with smaller polynucleotides by methods known in the art, for example by treatment with calcium chloride. If the polynucleotides are inserted into the phage genome, the host cell can be transfected with a recombinant phage genome having specific nucleic acid sequences. Alternatively, nucleic acid sequences can be introduced into a host cell using electroporation, transfection, lipofection, biolistics, conjugation, and the like.
Generally, in this aspect, the specific nucleic acid sequences will be present in vectors that are capable of stably replicating the sequence in the host cell. Furthermore, it is contemplated that the vectors will encode a marker gene so that host cells carrying the vector can be selected. This ensures that the mutated specific nucleic acid sequence can be recovered after introduction into the host cell. However, it is contemplated that the entire mixed population of specific nucleic acid sequences need not be present on a vector sequence. Rather, only a sufficient number of sequences need be cloned into vectors to ensure that, after the polynucleotides are introduced into host cells, each host cell contains one vector having at least one specific nucleic acid sequence. It is also contemplated that, rather than having a subset of the population of specific nucleic acid sequences cloned into vectors, this subset may already be stably integrated into the host cell.
It has been found that when two polynucleotides having identical regions are inserted into host cells, homologous recombination occurs between the two polynucleotides. Such recombination between two mutated specific nucleic acid sequences will in some situations result in double or triple hybrids.
It has also been found that the frequency of recombination increases if some of the mutated specific nucleic acid sequences are present on linear nucleic acid molecules. Therefore, in one embodiment, some of the specific nucleic acid sequences are present on linear polynucleotides.
After transformation, host cell transformants are selected to identify those host cell transformants that contain mutated specific nucleic acid sequences with the desired properties. For example, if it is desired to increase resistance to a particular drug, transformed host cells can be subjected to increased concentrations of the specified drug, and those transformants that produce mutated proteins that can confer increased drug resistance will be selected. If it is desired to increase the ability of a particular protein to bind to a receptor, protein expression can be induced from transformants and the resulting protein tested in a ligand binding assay by methods known in the art to identify a subset of mutant populations that exhibit enhanced ligand binding. Alternatively, the protein can be expressed in another system to ensure proper processing.
After a subset of the first recombinant specific nucleic acid sequences (progeny sequences) with the desired properties have been identified, they are then subjected to a second round of recombination. In a second cycle of recombination, the recombinant specific nucleic acid sequences can be mixed with the original mutated specific nucleic acid sequences (parental sequences) and the cycle is repeated as described above. In this way, a number of other recombinant specific nucleic acid sequences can be identified which have improved properties or encode proteins with improved properties. This cycle can be repeated an unlimited number of times.
It is also believed that in the second or subsequent recombination cycle, backcrossing can take place. Molecular backcrossing can be performed by mixing the desired specific nucleic acid sequences with a large number of wild-type sequences such that at least one wild-type nucleic acid sequence and the mutated nucleic acid sequence are present in the same transformed host cell. Recombination with a specific wild-type nucleic acid sequence will eliminate those neutral mutations that may affect unselected properties, such as immunogenicity, but not selected properties.
In another aspect of the invention, it is contemplated that during the first round, a subset of specific nucleic acid sequences may be generated as smaller polynucleotides by slowing or stopping their PCR amplification prior to introduction into the host cell. The size of the polynucleotide must be large enough to contain certain regions identical to other sequences in order to recombine homologously with other sequences. The size of the polynucleotide will range from 0.03 kb to 100 kb, more preferably from 0.2 kb to 10 kb. It is also contemplated that in subsequent rounds, all specific nucleic acid sequences other than those selected from the previous round may be used to generate PCR polynucleotides prior to introduction into host cells.
Shorter polynucleotide sequences can be single-stranded or double-stranded. Reaction conditions suitable for separating nucleic acid strands are well known in the art.
The stages of this process can be repeated indefinitely, and the only limit is the number of hybrids that can be obtained.
Therefore, an initial pool or population of mutated template nucleic acid is cloned into a vector that can replicate in a bacterium, such as E. coli. A specific vector is not required as long as it is capable of autonomous replication in E. coli. In one embodiment, the vector is designed to allow the expression and production of any protein encoded by the mutated specific nucleic acid associated with the vector. It is also preferred that the vector contains a gene encoding a selectable marker.
A vector population containing a set of mutant nucleic acid sequences is introduced into E. coli host cells. Nucleic acid vector sequences can be introduced by transformation, transfection or infection in the case of phage. The concentration of vectors used to transform bacteria is such that a certain number of vectors are introduced into each cell. Once in the cell, the efficiency of homologous recombination is such that homologous recombination occurs between different vectors. This creates hybrids (daughters) that have a combination of mutations that differ from the original mutant parent sequences. The host cells are then clonally replicated and selected for the marker gene present in the vector. During selection, only those cells that have the plasmid will grow. Host cells containing the vector are then screened for beneficial mutations.
Once a particular daughter mutated nucleic acid sequence that confers the desired properties has been identified, the nucleic acid is isolated, either already linked to the vector or separated from the vector. This nucleic acid is then mixed with the first or parent population of nucleic acids and the cycle is repeated.
A parental mutated population of a specific nucleic acid, either as polynucleotides or cloned into the same vector, is introduced into host cells that already contain the daughter nucleic acids. The cells are allowed to recombine and the next generation of recombinants or grandchildren are selected by the methods described above. This cycle can be repeated many times until a nucleic acid or peptide with the desired properties is obtained. It is believed that in subsequent cycles, the population of mutant sequences added to the hybrids may be derived from the parental hybrids or any subsequent generation.
In an alternative embodiment, the invention provides a method of "molecularly" backcrossing the resulting recombinant specific nucleic acid to eliminate any neutral mutations. Neutral mutations are mutations that do not confer desired properties to the nucleic acid or peptide. Such mutations, however, can impart undesirable properties to the nucleic acid or peptide. Accordingly, it is desirable to eliminate such neutral mutations. The method of the invention provides means for this.
In this aspect, after a hybrid nucleic acid having the desired properties has been obtained by the methods of the embodiments, the nucleic acid, the vector carrying the nucleic acid, or the host cell containing the vector and the nucleic acid is isolated.
The nucleic acid or vector is then introduced into a host cell with a large excess of wild-type nucleic acid. The hybrid nucleic acid and the nucleic acid of the wild-type sequence can be recombined. The resulting recombinants are subjected to the same selection as the hybrid nucleic acid. Only those recombinants that retain the desired characteristics will be selected. Any silent mutations that do not confer desired characteristics will be lost by recombination with wild-type DNA. This cycle can be repeated many times until all silent mutations are eliminated.
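The logic of molecular backcrossing can be illustrated with a deliberately idealized model. The sketch below assumes that recombination with excess wild-type nucleic acid can separate every mutation from every other (no linkage), so that every subset of the hybrid's mutations is obtainable, and it treats the selection step as a black-box function. The function names and the mutation labels are hypothetical and are used only for illustration.

```python
from itertools import combinations


def backcross_recombinants(hybrid_mutations):
    """Yield every mutation subset obtainable by recombining with wild type.

    Idealization: all mutations are treated as freely separable.
    """
    muts = sorted(hybrid_mutations)
    for k in range(len(muts) + 1):
        for subset in combinations(muts, k):
            yield frozenset(subset)


def minimal_selected(hybrid_mutations, passes_selection):
    """Return the smallest recombinants that still pass the selection."""
    survivors = [
        r for r in backcross_recombinants(hybrid_mutations) if passes_selection(r)
    ]
    if not survivors:
        return []
    fewest = min(len(r) for r in survivors)
    return [r for r in survivors if len(r) == fewest]


# Hypothetical hybrid carrying one mutation that confers the selected
# property ("G45S") and two mutations that do not.
hybrid = {"A12V", "G45S", "T101I"}
cleaned = minimal_selected(hybrid, lambda r: "G45S" in r)
```

Under these assumptions, the recombinant that survives selection with the fewest mutations carries only the mutation responsible for the selected property, which corresponds to the elimination of the remaining mutations by backcrossing.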
In another aspect, the invention provides a method of shuffling, assembling, reassembling, recombining and/or concatenating at least two polynucleotides to form a progeny polynucleotide (e.g., a gene pathway). In a particular embodiment, a double-stranded polynucleotide end (e.g., two single-stranded sequences hybridized to each other as hybridization partners) is treated with an exonuclease to release nucleotides from one of the two strands, leaving the remaining strand free of its original partner so that, if desired, the remaining strand can be used to hybridize with another partner.
In a particular embodiment, the double-stranded polynucleotide end (which may be part of, or linked to, a polynucleotide or non-polynucleotide sequence) is treated with a source of exonuclease activity. An enzyme with 3' exonuclease activity, an enzyme with 5' exonuclease activity, an enzyme with both 3' exonuclease activity and 5' exonuclease activity, and any combination thereof can be used in the invention. An exonuclease can be used to release nucleotides from one or both ends of a linear double-stranded polynucleotide, and from one to all ends of a branched polynucleotide having more than two ends.
By contrast, a non-enzymatic step can be used to shuffle, assemble, reassemble, recombine, and/or concatenate polynucleotide building blocks, which involves subjecting a working sample to denaturing (or "melting") conditions (e.g., by changing temperature, pH, and/or salinity conditions) so as to melt a working set of double-stranded polynucleotides into single-stranded polynucleotides. For shuffling, it is preferred that the individual polynucleotide strands engage in some degree of hybridization with different hybridization partners (i.e., rather than simply reverting to exclusive rehybridization between the partners that existed prior to the denaturation step). However, the presence of former hybridization partners in the reaction vessel does not preclude, and sometimes may even favor, the reassociation of a single-stranded polynucleotide with its former partner to reconstitute the original double-stranded polynucleotide.
In contrast to this non-enzymatic shuffling step, which involves denaturation of double-stranded polynucleotide building blocks followed by hybridization, the invention further provides an exonuclease-based approach that does not require denaturation. Rather, the avoidance of denaturing conditions and the maintenance of double-stranded polynucleotide substrates in the annealed (i.e., non-denatured) state are necessary conditions for the action of exonucleases (e.g., exonuclease III and red alpha gene product). Instead, the generation of single-stranded polynucleotide sequences capable of hybridizing with other single-stranded polynucleotide sequences is the result of covalent cleavage, and hence sequence destruction, in one of the hybridization partners. For example, the enzyme exonuclease III can be used to enzymatically release 3'-terminal nucleotides in one hybridization strand (to achieve covalent hydrolysis in that polynucleotide strand); this favors hybridization of the remaining single strand with a new partner (since its former partner has undergone covalent cleavage).
It is particularly appreciated that enzymes can be discovered, optimized (e.g., engineered by directed evolution), or both discovered and optimized specifically for the approach disclosed herein, such that they have more optimal rates and/or more highly specific activities and/or a greater lack of unwanted activities. In fact, it is expected that the invention may encourage the discovery and/or development of such engineered enzymes.
Furthermore, it will be appreciated that the end of a double-stranded polynucleotide can be protected, if desired, or rendered susceptible to the desired enzymatic action of the exonuclease. For example, a double-stranded polynucleotide end with a 3' overhang is not susceptible to the exonuclease action of exonuclease III. However, it can be made susceptible to exonuclease III by various means; for example, it can be blunted by polymerase treatment, cleaved to produce a blunt end or a 5' overhang, joined (ligated or hybridized) to another double-stranded polynucleotide to produce a blunt end or a 5' overhang, hybridized to a single-stranded polynucleotide to produce a blunt end or a 5' overhang, or modified by any of a variety of other means.
In one embodiment, the exonuclease can act on one or both ends of a linear double-stranded polynucleotide and work to completion, near completion, or partial completion. When the exonuclease action is complete, the result is that the length of each 5' overhang will extend far into the middle region of the polynucleotide toward what can be considered the "meeting point" (which may be somewhere near the middle of the polynucleotide). This ultimately results in the production of single-stranded polynucleotides (which can be dissociated), each approximately half the size of the original double-stranded polynucleotide.
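The geometry described above can be illustrated with a simple string model of a blunt-ended linear duplex digested from both 3' ends, as by exonuclease III. This is a sketch under stated assumptions: equal digestion at both ends, digestion stopping once no paired region remains, and hypothetical function names. It is not a kinetic model of the enzyme.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def exo3_digest(top, n):
    """Remove n nucleotides from the 3' end of each strand of a blunt duplex.

    top is the top strand written 5'->3'; the bottom strand is its reverse
    complement. Returns (top_remaining, bottom_remaining, paired_bp), with
    both strands written 5'->3'. Digestion is capped at the point where no
    duplex remains (an assumption of this model).
    """
    length = len(top)
    n = max(0, min(n, (length + 1) // 2))
    bottom = reverse_complement(top)
    top_remaining = top[: length - n]        # 3' end of the top strand removed
    bottom_remaining = bottom[: length - n]  # 3' end of the bottom strand removed
    paired_bp = max(0, length - 2 * n)       # central region still base-paired
    return top_remaining, bottom_remaining, paired_bp


# Partial digestion leaves a paired core flanked by two 5' overhangs.
partial = exo3_digest("ATGCCGTAAGCT", 3)
# Digestion to completion leaves two single strands, each half the original length.
complete = exo3_digest("ATGCCGTAAGCT", 100)
```

In this model, partial digestion of the 12-bp example leaves 6 paired base pairs with a 3-nucleotide 5' overhang at each end, and digestion to completion leaves two 6-nucleotide single strands that no longer pair with each other.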
Therefore, an exonuclease-mediated approach is useful for shuffling, assembling and/or reassembling, recombining and linking polynucleotide building blocks. Polynucleotide building blocks can be up to ten bases, or tens of bases, or hundreds of bases, or thousands of bases, or tens of thousands of bases, or hundreds of thousands of bases, or millions of bases, or even longer.
Exonuclease substrates can be prepared by fragmenting a double-stranded polynucleotide. Fragmentation can be achieved by mechanical means (eg cutting, sonication, etc.), enzymatic means (eg using restriction enzymes) and any combination thereof. Fragments of a larger polynucleotide can also be generated by polymerase-mediated synthesis.
Additional examples of enzymes with exonuclease activity include red alpha and venom phosphodiesterase. The red alpha gene product (also called lambda exonuclease) is of bacteriophage lambda origin. The red alpha gene product acts at 5'-phosphorylated ends, releasing mononucleotides from duplex DNA (Takahashi and Kobayashi, 1990). Venom phosphodiesterase (Laskowski, 1980) is capable of rapidly opening supercoiled DNA.
In one aspect, the design of the nucleic acid building blocks is obtained by analyzing the sequences of a set of progenitor nucleic acid templates that serve as a basis for producing a progeny set of finalized chimeric nucleic acid molecules. These progenitor nucleic acid templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, i.e., chimerized or shuffled.
In one example, the invention provides chimerization of a family of related genes and a family of related products encoded by them. In a specific example, the encoded products are enzymes. These examples, while illustrating certain specific aspects of the invention, are not intended to be limiting or to describe the scope of the disclosed invention.
Accordingly, in accordance with one aspect of the invention, the sequences of multiple nucleic acid templates identified by the methods of the invention are aligned to select one or more demarcation points, which demarcation points may be located within a region of homology. Breakpoints can be used to delineate the building blocks of nucleic acids to be generated. Therefore, the demarcation points identified and selected in the progeny molecules serve as potential points of chimerization within the progeny molecules.
Typically, a demarcation point is a region of homology (consisting of at least one homologous nucleotide base) shared by at least two progenitor templates, but a demarcation point can be a region of homology shared by at least half of the progenitor templates, at least two-thirds of the progenitor templates, at least three-quarters of the progenitor templates, and preferably almost all of the progenitor templates. More preferably, the demarcation point is a region of homology that is common to all progenitor templates.
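The selection of candidate demarcation points from aligned templates can be illustrated with a minimal Python sketch. It assumes the templates have already been aligned to equal length; the function name `demarcation_points`, the `min_fraction` parameter, and the example sequences are hypothetical and serve only to illustrate the idea of a position shared by a given fraction of the progenitor templates.

```python
from typing import List


def demarcation_points(aligned: List[str], min_fraction: float = 1.0) -> List[int]:
    """Return alignment columns where one base is shared by at least two
    templates and by at least `min_fraction` of all templates.
    Gap characters ('-') never count as a shared base."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("templates must be pre-aligned to equal length")
    points = []
    for col in range(length):
        column = [seq[col].upper() for seq in aligned]
        # Size of the largest group of templates sharing the same base here.
        best = max(column.count(base) for base in "ACGT")
        if best >= 2 and best / len(aligned) >= min_fraction:
            points.append(col)
    return points


# Hypothetical, pre-aligned progenitor templates.
templates = [
    "ATGGCTAAAC",
    "ATGGCAAAGC",
    "ATGTCTAAAC",
]
shared_by_all = demarcation_points(templates)
shared_by_two_thirds = demarcation_points(templates, min_fraction=2 / 3)
```

In this toy alignment, lowering the threshold from all templates to two-thirds of the templates admits the three columns at which one template differs from the other two.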
In another aspect, the ligation reassembly process is performed exhaustively to generate an exhaustive library. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, the order of assembly (ie, the order in which each building block is assembled in the 5' to 3' sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic). Due to the non-stochastic nature of the invention, the possibility of unwanted by-products is greatly reduced.
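What "all possible ordered combinations" means can be sketched as follows. This is an in silico illustration only, not the ligation chemistry itself: it assumes each position in the designed assembly order has a small set of alternative building blocks, and the block sequences and the name `exhaustive_library` are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def exhaustive_library(positions: Sequence[Sequence[str]]) -> List[str]:
    """Assemble every ordered combination: exactly one block per position,
    joined 5' to 3' in the fixed, designed order of `positions`."""
    return ["".join(choice) for choice in product(*positions)]


# Hypothetical design: 3 positions with 2, 3 and 2 alternative blocks.
blocks = [
    ["ATGGCT", "ATGTCT"],
    ["AAACCC", "AAGCCC", "AAACCG"],
    ["GGTTAA", "GGATAA"],
]
library = exhaustive_library(blocks)
print(len(library))  # 2 * 3 * 2 = 12 progeny sequences
```

Because the order of positions is fixed by design, the library size is simply the product of the number of alternatives at each position, and no out-of-order assemblies are generated.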
In yet another aspect, the invention provides that the ligation reassembly process is carried out systematically, for example to create a systematically partitioned library with compartments that can be searched systematically, eg one by one. In other words, the invention ensures that through the selective and judicious use of specific nucleic acid building blocks, together with the selective and judicious use of sequentially graded assembly reactions, an experimental design can be achieved in which specific sets of progeny are produced in each of several reaction vessels. This enables a systematic review and screening process. In this way, it enables the systematic study of a potentially very large number of daughter molecules in smaller groups.
Due to its ability to perform chimerization in a manner that is highly flexible but also exhaustive and systematic, especially when there is a low level of homology between progenitor molecules, the present invention provides for the creation of a library (or set) containing a large number of progeny molecules. Due to the non-stochastic nature of the ligation reassembly invention, the generated progeny molecules preferably comprise a library of finalized chimeric nucleic acid molecules having an overall order of assembly that is selected by design. In a particular embodiment, the library thus created consists of more than 10³ to more than 10¹⁰⁰⁰ different progeny molecular species.
In one aspect, a pool of finalized chimeric nucleic acid molecules prepared as described herein comprises a polynucleotide encoding a polypeptide. According to one embodiment, said polynucleotide is a gene, which may be a human-made gene. According to another embodiment, said polynucleotide is a gene pathway, which may be an artificial gene pathway. The invention contemplates that one or more artificial genes produced according to the invention may be incorporated into an artificial gene pathway, such as that found in a eukaryotic organism (including a plant).
In another example, the synthetic nature of the step in which the building blocks are generated allows for the design and insertion of nucleotides (eg, one or more nucleotides, which may be, for example, codons or introns or regulatory sequences), which can optionally be removed later by an in vitro process (eg, by mutagenesis) or by an in vivo process (eg, by exploiting the gene-splicing ability of the host organism). It should be noted that in many cases the introduction of these nucleotides may also be desirable for a number of reasons other than the potential benefit of creating a demarcation point.
Therefore, according to another aspect, the invention provides that a nucleic acid building block can be used to introduce introns. Therefore, the invention ensures that functional introns can be introduced into the artificial gene according to the invention. The invention also ensures that functional introns can be introduced into the artificial gene pathway of the invention. Accordingly, the invention provides for the production of a chimeric polynucleotide, which is an artificially produced gene containing one (or more) artificially introduced introns.
Accordingly, the invention also provides for the production of a chimeric polynucleotide, which is an artificial gene pathway containing one (or more) artificially introduced introns. Preferably, the artificially introduced introns are functional in one or more host cells for gene splicing in much the same way that natural introns are functional for gene splicing. The invention provides a method for the production of artificially generated intron-containing polynucleotides that are introduced into a host organism for recombination and/or splicing.
An artificial gene produced using the invention can also serve as a substrate for recombination with another nucleic acid. Similarly, a man-made gene pathway generated using the invention may also serve as a substrate for recombination with another nucleic acid. Preferably, recombination is facilitated by regions of homology or occurs within regions of homology between the engineered intron-containing gene and the nucleic acid that serves as the recombination partner. In a particularly preferred case, the recombination partner may also be a nucleic acid produced according to the invention, including an artificial gene or an artificial gene pathway. Recombination may be facilitated or may occur in regions of homology that exist in one (or more) of the artificially introduced introns in the man-made gene.
The synthetic ligation reassembly method of the present invention utilizes a plurality of nucleic acid building blocks, each of which preferably has two ends that can be ligated. The two bindable ends on each nucleic acid building block may be two blunt ends (ie, each having an overhang of zero nucleotides), or preferably one blunt end and one overhang, or more preferably two overhangs.
The overhang for this purpose can be a 3' overhang or a 5' overhang. Accordingly, a nucleic acid building block may have a 3' overhang or alternatively a 5' overhang or alternatively two 3' overhangs or alternatively two 5' overhangs. The general order in which the nucleic acid building blocks are assembled to form the finalized chimeric nucleic acid molecule is determined by deliberate experimental design and is not random.
According to one preferred embodiment, the nucleic acid building block is created by chemically synthesizing two single-stranded nucleic acids (also called single-stranded oligonucleotides) and contacting them to allow them to hybridize into a double-stranded nucleic acid building block.
The double-stranded nucleic acid building block can vary in size. The sizes of these building blocks can be small or large. Preferred building block sizes range from 1 base pair (not including any overhangs) to 100,000 base pairs (not including any overhangs). Other preferred size ranges are also provided, having lower limits of 1 bp to 10,000 bp (including any integer value in between) and upper limits of 2 bp to 100,000 bp (including any integer value in between).
There are many ways in which a double-stranded nucleic acid building block useful for the invention can be made; and they are known in the art and can be easily made by a person skilled in the art.
According to one aspect, a double-stranded nucleic acid building block is created by first creating two single-stranded nucleic acids and allowing them to hybridize to form a double-stranded nucleic acid building block. The two strands of a double-stranded nucleic acid building block can be complementary at every nucleotide apart from those that form an overhang; in that case the building block contains no mismatches apart from any overhangs. In another aspect, the two strands of the double-stranded nucleic acid building block are complementary at fewer than every nucleotide apart from those that form an overhang. Thus, according to this embodiment, a double-stranded nucleic acid building block can be used to introduce codon degeneracy. Codon degeneracy is preferably introduced using site-saturation mutagenesis as described herein, using one or more N,N,G/T cassettes or alternatively using one or more N,N,N cassettes.
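The difference between an N,N,G/T cassette and an N,N,N cassette can be made concrete by enumerating the codons each one encodes. The following Python sketch uses the standard genetic code; the helper name `cassette_codons` is hypothetical and the code is an illustration rather than part of the described method.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def cassette_codons(third_position: str):
    """All codons with any base at positions 1 and 2 and one of
    `third_position` at position 3 ("GT" gives N,N,G/T; "TCAG" gives N,N,N)."""
    return [a + b + c for a in BASES for b in BASES for c in third_position]


nnk = cassette_codons("GT")      # N,N,G/T cassette
nnn = cassette_codons("TCAG")    # N,N,N cassette
nnk_amino_acids = {CODON_TABLE[c] for c in nnk} - {"*"}
nnk_stops = [c for c in nnk if CODON_TABLE[c] == "*"]

print(len(nnk), len(nnn))        # 32 64
print(len(nnk_amino_acids))      # 20
print(nnk_stops)                 # ['TAG']
```

The N,N,G/T cassette therefore still reaches all 20 amino acids with half as many codons as N,N,N, and with a single stop codon instead of three.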
The in vivo recombination method of the invention can be performed blindly on a pool of unknown hybrids or alleles of a particular polynucleotide or sequence. However, it is not necessary to know the actual DNA or RNA sequence of a particular polynucleotide.
A recombinant approach within a mixed population of genes can be useful for the production of any useful proteins, for example interleukin I, antibodies, tPA and growth hormone. This approach can be used to produce proteins with altered specificity or activity. This approach can also be useful for generating hybrid nucleic acid sequences, for example, promoter regions, introns, exons, enhancers, 3' untranslated regions, or 5' untranslated regions of genes. Therefore, this approach can be used to generate genes with increased expression. This approach can also be useful in the study of repetitive DNA sequences. Finally, this approach can be useful for mutating ribozymes or aptamers.
The invention provides a method for selecting a subset of polynucleotides from an initial set of polynucleotides, which method is based on the ability to distinguish between one or more selection features (or selection markers) present anywhere in the working polynucleotide, so as to enable selection for (positive selection) and/or against (negative selection) each selectable polynucleotide. In one aspect, a method called terminal selection is provided, which method is based on the use of a selectable marker located partially or completely in the terminal region of the polynucleotide being selected, and such a selectable marker may be referred to as a "terminal selection marker".
Terminal selection may be based on detection of native sequences or on detection of experimentally introduced sequences (including by any mutagenesis procedure mentioned or not mentioned herein) or both, even within the same polynucleotide. The terminal selection marker can be a structural selection marker or a functional selection marker or both a structural and a functional selection marker. A terminal selection marker may consist of a polynucleotide sequence or a polypeptide sequence or any chemical structure or any biological or biochemical marker, including markers selectable by methods based on the detection of radioactivity, enzymatic activity, fluorescence, any optical characteristic, magnetic properties (eg, using magnetic beads), immunoreactivity and hybridization.
Terminal selection can be used in combination with any mutagenesis method. Such methods of mutagenesis include, but are not limited to, the methods described herein (supra and infra). Such methods include, by way of non-limiting examples, any method that may be referred to herein or by others in the art by any of the following terms: "saturation mutagenesis", "shuffling", "recombination", "reassembly", "error-prone PCR", "assembly PCR", "sexual PCR", "crossover PCR", "oligonucleotide primer-directed mutagenesis", "recursive (and/or exponential) ensemble mutagenesis" (see Arkin and Youvan, 1992), "cassette mutagenesis", "in vivo mutagenesis" and "in vitro mutagenesis". Additionally, terminal selection can be performed on molecules produced by any method of mutagenesis and/or amplification, for example one that is followed by selection and/or screening for the presence of the desired progeny molecules.
Additionally, terminal selection can be applied to a polynucleotide independently of any mutagenesis method. In one embodiment, the terminal selection provided herein can be used to facilitate a cloning step, such as a step of ligation to another polynucleotide (including ligation to a vector). The invention therefore provides terminal selection as a means to facilitate library construction, selection and/or enrichment for desired polynucleotides, and cloning in general.
In another aspect, terminal selection may be based on (positive) selection for a polynucleotide; alternatively, terminal selection may be based on (negative) selection against a polynucleotide; and still alternatively, terminal selection may be based on both (positive) selection for and (negative) selection against a polynucleotide. Terminal selection, together with other selection and/or screening methods, can be performed iteratively, with any combination of similar or different selection and/or screening methods and mutagenesis or directed evolution methods, all of which can be performed iteratively and in any order, combination and permutation. It will also be clear that terminal selection can be used to select polynucleotides that are circular and/or modified or substituted with any chemical group or residue.
In one non-limiting aspect, terminal selection of a linear polynucleotide is performed using a general approach based on the presence of at least one terminal selection marker located at or near an end of the polynucleotide (which can be either the 5' end or the 3' end). In one particular non-limiting example, terminal selection is based on the selection of a particular sequence at or near the terminus, such as, but not limited to, a sequence recognized by an enzyme that recognizes a polynucleotide sequence. An enzyme that recognizes and catalyzes the chemical modification of a polynucleotide is referred to herein as a polynucleotide-acting enzyme. In a preferred embodiment, polynucleotide-acting enzymes include, but are not limited to, enzymes that have polynucleotide cleavage activity, enzymes that have polynucleotide methylation activity, enzymes that have polynucleotide ligation activity, and enzymes that have multiple enzymatic activities (including, for example, both polynucleotide cleavage activity and polynucleotide ligation activity).
It will be appreciated that suitable polynucleotide-acting enzymes include any enzymes that can be identified by one skilled in the art as useful for generating a ligation-compatible end (eg, a sticky end) in the polynucleotide. It may be desirable to use restriction sites that are not contained, or alternatively are not expected to be contained, or alternatively are not likely to be contained, internally in the polynucleotide to be subjected to terminal selection. It is recognized that methods (eg, mutagenesis methods) can be used to remove undesirable internal restriction sites. It should also be noted that a partial digestion reaction (ie, a digestion reaction that proceeds only to partial completion) can be used to achieve digestion at the recognition site in the terminal region while sparing a susceptible restriction site that lies within the polynucleotide and is recognized by the same enzyme. In one aspect, partial digestion is useful because certain enzymes have been observed to show preferential cleavage of the same recognition sequence depending on the location and environment in which the recognition sequence occurs.
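Whether a given recognition site occurs only in the terminal regions or also internally can be checked in silico before an enzyme is chosen. The sketch below is an illustration under stated assumptions: the function name `classify_sites`, the `terminal_len` parameter and the example sequence are hypothetical, and only the top strand is searched, which is sufficient for a palindromic site such as the EcoRI recognition sequence GAATTC.

```python
from typing import Dict, List


def classify_sites(seq: str, site: str, terminal_len: int) -> Dict[str, List[int]]:
    """Locate every occurrence of `site` in `seq` and label it 'terminal'
    if it lies wholly within `terminal_len` bases of either end,
    otherwise 'internal'. Positions are 0-based start indices."""
    seq, site = seq.upper(), site.upper()
    result: Dict[str, List[int]] = {"terminal": [], "internal": []}
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        in_terminal = end <= terminal_len or start >= len(seq) - terminal_len
        result["terminal" if in_terminal else "internal"].append(start)
        start = seq.find(site, start + 1)
    return result


# Hypothetical 42-base working polynucleotide with three GAATTC sites.
working = "TTGAATTC" + "ACGTACGTAC" + "GAATTC" + "ACGTACGTAC" + "GAATTCTT"
sites = classify_sites(working, "GAATTC", terminal_len=10)
print(sites)  # {'terminal': [2, 34], 'internal': [18]}
```

A non-empty `internal` list flags a polynucleotide for which the site would need to be removed, protected, or handled by partial digestion, as discussed above.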
It should also be noted that protection methods can be used to selectively protect certain restriction sites (eg, internal sites) from unwanted digestion by enzymes that would otherwise cleave a working polynucleotide in response to the presence of those sites; and that such methods of protection include modifications such as methylations and base substitutions (eg, U instead of T) that inhibit the undesired enzyme activity.
In another aspect of the invention, a useful terminal selection marker is a terminal sequence recognized by a polynucleotide-acting enzyme that recognizes a specific polynucleotide sequence. In one aspect of the invention, useful polynucleotide-acting enzymes also include enzymes other than classical type II restriction enzymes. In accordance with this aspect of the invention, useful polynucleotide-acting enzymes also include gyrases (eg, topoisomerases), helicases, recombinases, relaxases, and all enzymes related thereto.
It should be noted that terminal selection can be used to distinguish and separate parent molecules (eg, those subjected to mutagenesis) from progeny molecules (eg, those produced by mutagenesis). For example, a first set of primers lacking the topoisomerase I recognition site can be used to modify the terminal regions of the parent molecules (eg, in polymerase-based amplification). A different, second set of primers (eg, having a topoisomerase I recognition site) can then be used to generate mutant progeny molecules from the amplified template molecules (eg, using any polynucleotide chimerization method such as interrupted synthesis or template-switching polymerase-based amplification; or using saturation mutagenesis; or using any other method of introducing a topoisomerase I recognition site into the mutant progeny molecule). The use of topoisomerase I-based terminal selection can then facilitate not only discrimination but also topoisomerase I-based selective ligation of the desired progeny molecules.
It will be appreciated that the terminal selection approach using topoisomerase-based nicking and ligation has several advantages over previously available selection methods. In short, this approach achieves directional cloning (including expression cloning).
This method can be used to shuffle, by in vitro and/or in vivo recombination using any of the disclosed methods, and in any combination, polynucleotide sequences selected by peptide display methods, wherein the associated polynucleotide encodes the displayed peptide, which is screened for a phenotype (eg, affinity for a predetermined receptor (ligand)).
An increasingly important aspect of biopharmaceutical drug development and molecular biology is the identification of peptide structures, including the primary amino acid sequences, of peptides or peptidomimetics that interact with biological macromolecules. One method of identifying peptides that possess a desired structure or functional property, such as binding to a specific biological macromolecule (eg, a receptor), involves screening a large library of peptides for individual library members that possess the desired structure or functional property conferred by the peptide's amino acid sequence.
In addition to direct chemical synthesis methods for generating peptide libraries, several recombinant DNA methods have also been described. One type involves the display of a peptide sequence, antibody, or other protein on the surface of a bacteriophage particle or cell. In general, in these methods each bacteriophage particle or cell serves as an individual member of the library, displaying one type of display peptide in addition to the native protein sequences of the bacteriophage or cell. Each bacteriophage or cell contains information about the nucleotide sequence that encodes the specific displayed peptide sequence; therefore, the displayed peptide sequence can be determined by determining the nucleotide sequence of an isolated library member.
A well-known method of peptide display involves displaying the peptide sequence on the surface of a filamentous bacteriophage, usually as a fusion with a bacteriophage coat protein. A bacteriophage library can be incubated with an immobilized, predetermined macromolecule or small molecule (eg, a receptor), so that bacteriophage particles displaying a peptide sequence that binds to the immobilized macromolecule can be separated from those that do not. Bacteriophage particles (ie, library members) bound to the immobilized macromolecule are then recovered and replicated to amplify the selected subpopulation of bacteriophages for the next round of affinity enrichment and phage replication. After several rounds of affinity enrichment and phage replication, the members of the bacteriophage library thus selected are isolated and the nucleotide sequence encoding the displayed peptide sequence is determined, thereby identifying the peptide sequences that bind to the predetermined macromolecule (eg, receptor). Such methods are further described in PCT Patent Publications WO 91/17271, WO 91/18980, WO 91/19818 and WO 93/08278.
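Why several rounds of affinity enrichment and replication are used can be seen from a deliberately simplified model. The Python sketch below assumes that each library member is retained on the immobilized macromolecule with a fixed probability per round and that replication restores the population without bias; the retention values and starting frequencies are hypothetical numbers chosen only for illustration.

```python
from typing import Dict


def enrich(freqs: Dict[str, float], retention: Dict[str, float], rounds: int) -> Dict[str, float]:
    """Toy model of repeated affinity enrichment. In each round every
    member's frequency is multiplied by its retention probability, and
    the recovered phage are renormalized (unbiased replication)."""
    current = dict(freqs)
    for _ in range(rounds):
        bound = {name: current[name] * retention[name] for name in current}
        total = sum(bound.values())
        current = {name: value / total for name, value in bound.items()}
    return current


# Hypothetical library: one binder per million members.
start = {"binder": 1e-6, "nonbinder": 1 - 1e-6}
# Hypothetical retention: binders stay bound far more often than background.
keep = {"binder": 0.5, "nonbinder": 1e-3}

after_one = enrich(start, keep, rounds=1)
after_three = enrich(start, keep, rounds=3)
```

Under these assumed numbers a single round leaves the binder as a small minority, whereas three rounds make it the large majority of the recovered population, which is the qualitative behavior that motivates iterating the enrichment step.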
The present invention also provides random, pseudo-random, and defined sequence framework peptide libraries and methods for generating and screening those libraries to identify useful compounds (eg, peptides, including single-chain antibodies, or gene products that modify peptides or RNA in a desired manner). The random, pseudo-random, and defined sequence framework peptides are generated from libraries of peptide library members containing the displayed peptides or displayed single-chain antibodies attached to the polynucleotide template from which the displayed peptide was synthesized. The mode of attachment may vary depending on the particular selected embodiment of the invention and may include encapsulation in a phage particle or incorporation into a cell.
An important advantage of the present invention is that no prior information about the expected structure of the ligand is required to isolate the peptide ligands or antibodies of interest. The identified peptide may have biological activity, which means at least a specific binding affinity for the selected receptor molecule, and in some cases will further include the ability to block the binding of other compounds, to stimulate or inhibit metabolic pathways, to act as a signal or messenger, to stimulate or inhibit cellular activity, and the like.
The invention also provides a method of shuffling a pool of polynucleotide sequences identified by the methods of the invention and selected by affinity screening of a library of polysomes displaying nascent peptides (including single-chain antibodies) for library members that bind to a predetermined receptor (eg, a mammalian protein receptor such as, for example, a peptidergic hormone receptor, a cell surface receptor, or an intracellular protein that binds to other protein(s) to form intracellular protein complexes such as heterodimers and the like) or epitope (eg, an immobilized protein, glycoprotein, oligosaccharide and the like).
Polynucleotide sequences selected in a first round of selection (usually by affinity selection for binding to a receptor (eg, a ligand)) are pooled, and the pool is shuffled by in vitro and/or in vivo recombination to produce a shuffled pool containing a population of recombined selected polynucleotide sequences. The recombined selected polynucleotide sequences are subjected to at least one further round of selection. The polynucleotide sequences selected in the subsequent selection round(s) can be used directly, sequenced, and/or subjected to one or more additional rounds of shuffling and subsequent selection. Selected sequences can also be backcrossed with polynucleotide sequences encoding neutral sequences (ie, sequences having a negligible functional effect on binding), for example by backcrossing with a wild-type or naturally occurring sequence that is substantially identical to the selected sequence, to produce native-like functional peptides that may be less immunogenic. Typically, during backcrossing, subsequent selection is applied to maintain the property of binding to the predetermined receptor (ligand).
Before or simultaneously with the shuffling of the selected sequences, the sequences can be mutagenized. In one embodiment, selected library members are cloned into a prokaryotic vector (eg, plasmid, phagemid, or bacteriophage), which creates a collection of individual colonies (or plaques) representing the individual library members. Individual selected library members can then be manipulated (eg, by site-directed mutagenesis, cassette mutagenesis, chemical mutagenesis, PCR mutagenesis, and the like) to create a collection of library members that represents a kernel of sequence diversity based on the sequence of the selected library member. The sequence of an individual selected library member or pool can be manipulated to include random mutation, pseudo-random mutation, defined kernel mutation (ie, involving a defined subset of amino acid residues), codon-based mutation, and the like, either segmentally or over the entire length of the individual selected library member sequence. The mutagenized selected library members are then shuffled by in vitro and/or in vivo recombinatorial shuffling as disclosed herein.
The invention also provides peptide libraries comprising a plurality of individual library members of the invention, wherein (1) each individual library member comprises a sequence formed by shuffling a set of selected sequences, and (2) each individual library member comprises a peptide variable segment sequence or a single chain antibody segment sequence that is distinct from the variable peptide segment sequence or single chain antibody sequence of other individual library members in said population (although some library members may be present in more than one copy per library due to nonuniform amplification, stochastic probability, etc.).
The invention also provides a product-by-process in which selected polynucleotide sequences having (or encoding a peptide having) a predetermined binding specificity are generated by: (1) screening a displayed peptide library or a displayed single-chain antibody library against a predetermined receptor (eg, a ligand) or epitope (eg, an antigenic macromolecule) and identifying and/or enriching library members that bind to the predetermined receptor or epitope to generate a pool of selected library members, (2) recombinatorially shuffling the selected library members (or amplified or cloned copies thereof) that bind the predetermined epitope and have thereby been isolated and/or enriched from the library, to produce a shuffled library, and (3) screening the shuffled library against the predetermined receptor (eg, a ligand) or epitope (eg, an antigenic macromolecule) and identifying and/or enriching the shuffled library members that bind to the predetermined receptor or epitope to generate a pool of selected shuffled library members.
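The three steps (screen, shuffle, screen again) can be outlined in a toy Python sketch. Everything here is a hypothetical stand-in: the scoring function replaces a real binding screen, the per-position recombination replaces real in vitro or in vivo shuffling, and the peptide sequences, the `TARGET` motif and the function names are invented for illustration.

```python
import random
from typing import Callable, List


def select(pool: List[str], score: Callable[[str], int], keep: int) -> List[str]:
    """Keep the `keep` highest-scoring members (stand-in for affinity screening)."""
    return sorted(pool, key=score, reverse=True)[:keep]


def shuffle_pool(parents: List[str], n_children: int, rng: random.Random) -> List[str]:
    """Toy recombination: every position of a child is copied from a
    randomly chosen parent. Parents must all have the same length."""
    length = len(parents[0])
    return [
        "".join(rng.choice(parents)[i] for i in range(length))
        for _ in range(n_children)
    ]


TARGET = "HWYK"  # hypothetical ideal binder, used only for scoring


def score(peptide: str) -> int:
    """Hypothetical binding score: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(peptide, TARGET))


library = ["HAAA", "AWAA", "AAYA", "AAAK", "AAAA", "GGGG"]
rng = random.Random(0)                            # fixed seed for repeatability
selected = select(library, score, keep=4)         # step (1): screen the library
shuffled = shuffle_pool(selected, 200, rng)       # step (2): shuffle selected members
reselected = select(shuffled, score, keep=5)      # step (3): screen the shuffled library
```

The point of the sketch is structural: each selected parent carries one favorable position, and shuffling is what allows favorable positions from different parents to be combined in a single progeny member before the second screen.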
This method can be used to shuffle, by in vitro and/or in vivo recombination using any of the disclosed methods, and in any combination, polynucleotide sequences selected by antibody display methods, wherein the associated polynucleotide encodes the displayed antibody, which is screened for a phenotype (eg, affinity for binding to a predetermined antigen (ligand)).
Various molecular genetics approaches have been devised to capture the vast immunological repertoire represented by the extremely large number of different variable regions that can be present in immunoglobulin chains. The naturally occurring germline immunoglobulin heavy chain locus consists of separate tandem arrays of variable (V) segment genes located upstream of a tandem array of diversity (D) segment genes, which are themselves located upstream of a tandem array of joining (J) region genes, which are located upstream of the constant (C) region genes. During B lymphocyte development, V-D-J rearrangement occurs in which a heavy chain variable region (VH) gene is formed by rearrangement to form a fused D-J segment, followed by rearrangement with a V segment to form a V-D-J joined gene product that, if productively rearranged, encodes a functional heavy chain variable region (VH). Similarly, light chain loci rearrange one of several V segments with one of several J segments to form a light chain variable region (VL) gene.
The vast repertoire of variable regions possible in immunoglobulins derives in part from the numerous combinatorial possibilities for joining V and J segments (and, in the case of heavy chain loci, D segments) during rearrangement in B cell development. Additional sequence diversity in the heavy chain variable regions arises from non-uniform rearrangements of the D segments during V-D-J joining and from N region addition. In addition, antigenic selection of specific B cell clones selects for higher-affinity variants that have non-germline mutations in one or both of the heavy and light chain variable regions, a phenomenon called "affinity maturation" or "affinity sharpening". Typically, these "affinity sharpening" mutations cluster in specific areas of the variable region, most commonly in the complementarity determining regions (CDRs).
In order to overcome many limitations in the production and identification of high-affinity immunoglobulins by antigen-stimulated B cell development (ie, immunization), various prokaryotic expression systems have been developed that can be manipulated to produce combinatorial antibody libraries, which can be screened for antibodies with high affinity to specific antigens. Recent advances in antibody expression in Escherichia coli and bacteriophage systems (see below) have raised the possibility that almost any specificity can be obtained either by cloning antibody genes from characterized hybridomas or by de novo selection using antibody gene libraries (eg, from Ig cDNA).
Combinatorial antibody libraries have been generated in bacteriophage lambda expression systems that can be screened as bacteriophage plaques or as lysogen colonies (Huse et al., 1989; Caton and Koprowski, 1990; Mullinax et al., 1990; Persson et al., 1991). Various embodiments of bacteriophage antibody display libraries and lambda phage expression libraries have been described (Kang et al., 1991; Clackson et al., 1991; McCafferty et al., 1990; Burton et al., 1991; Hoogenboom et al., 1991; Chang et al., 1991; Breitling et al., 1991; Marks et al., 1991, p. 581; Barbas et al., 1992; Hawkins and Winter, 1992; Marks et al., 1992, p. 779; Marks et al., 1992, p. 16007; Lowman et al., 1991; and Lerner et al., 1992; all incorporated herein by reference). Typically, a bacteriophage antibody display library is screened using a receptor (eg, polypeptide, carbohydrate, glycoprotein, nucleic acid) that is immobilized and/or labeled (eg, for plaque or colony screening).
One particularly advantageous approach has been the use of so-called single-chain variable fragment (scFv) libraries (Marks et al., 1992, p. 779; Winter and Milstein, 1991; Clackson et al., 1991; Marks et al., 1991, p. 581; Chaudhary et al., 1990; Chiswell et al., 1992; McCafferty et al., 1990; and Huston et al., 1988). Various embodiments of scFv libraries displayed on bacteriophage coat proteins have been described.
Beginning in 1988, single-chain analogs of Fv fragments and their fusion proteins have been reliably generated by antibody engineering methods. The first step generally involves obtaining the genes encoding VH and VL domains with the desired binding properties; these V genes can be isolated from a specific hybridoma cell line, selected from a combinatorial V gene library, or produced by V gene synthesis. A single-chain Fv is generated by linking the component V genes with an oligonucleotide encoding a suitably designed linker peptide such as (Gly-Gly-Gly-Gly-Ser) (SEQ ID NO:81) or equivalent linker peptide(s). The linker connects the C-terminus of the first V region and the N-terminus of the second, organized as VH-linker-VL or VL-linker-VH. In general, the scFv binding site can faithfully replicate both the affinity and the specificity of the combining site of the parent antibody.
Thus, scFv fragments consist of VH and VL domains linked into a single polypeptide chain by a flexible linker peptide. After assembly, the scFv genes are cloned into a phagemid and expressed at the tip of the M13 phage (or a similar filamentous bacteriophage) as fusion proteins with the bacteriophage coat protein pIII (gene 3). Enrichment of phage expressing the antibody of interest is performed by screening the recombinant phage displaying the scFv population for binding to a predetermined epitope (eg, target antigen, receptor).
The linked polynucleotide of a library member provides the basis for replication of the library member after a screening or selection procedure, and also provides the basis for determining, by nucleotide sequencing, the identity of the displayed peptide sequence or of the VH and VL amino acid sequences. The displayed peptide(s) or single-chain antibody (eg, scFv) and/or its VH and VL domains or their CDRs can be cloned and expressed in a suitable expression system. Often, polynucleotides encoding the isolated VH and VL domains will be ligated to polynucleotides encoding the constant regions (CH and CL) to produce polynucleotides encoding complete antibodies (eg, chimeric or fully human), antibody fragments, and the like. Often, polynucleotides encoding isolated CDRs will be grafted onto polynucleotides encoding a suitable variable region framework (and optionally constant regions) to form polynucleotides encoding complete antibodies (eg, humanized or fully human), antibody fragments, and the like. Antibodies can be used to isolate preparative amounts of antigen by immunoaffinity chromatography. Various other uses of such antibodies are in the diagnosis and/or staging of disease (eg, cancer) and in therapeutic applications for the treatment of disease, such as, for example, cancer, autoimmune disease, AIDS, cardiovascular disease, infections, and the like.
Various methods have been described for increasing the combinatorial diversity of an scFv library to expand the repertoire of binding species (idiotype spectrum). The use of PCR has enabled the rapid cloning of variable regions from a specific hybridoma source or as a gene library from non-immunized cells, providing combinatorial diversity in the assortment of VH and VL cassettes that can be combined. Additionally, the VH and VL cassettes themselves may be diversified, for example, by random, pseudorandom, or site-directed mutagenesis. Typically, the VH and VL cassettes are diversified at or near the complementarity determining regions (CDRs), often at the third CDR, CDR3. Enzymatic inverse PCR mutagenesis has been shown to be a simple and reliable method for constructing relatively large libraries of scFv site-directed hybrids (Stemmer et al., 1993), as have error-prone PCR and chemical mutagenesis (Deng et al., 1994). Riechmann (Riechmann et al., 1993) demonstrated the semi-rational design of an scFv antibody fragment using site-directed randomization by degenerate oligonucleotide PCR followed by phage display of the resulting scFv hybrids. Barbas (Barbas et al., 1992) attempted to circumvent the limited repertoire size resulting from the use of biased variable region sequences by randomizing the sequence in a synthetic CDR region of a human tetanus toxoid-binding Fab.
CDR randomization has the potential to generate approximately 1 × 10²⁰ CDRs for heavy chain CDR3 alone, and roughly similar numbers of heavy chain CDR1 and CDR2 variants and light chain CDR1-3 variants. Taken individually or together, the combinatorial possibilities of randomizing the heavy and/or light chain CDRs require the generation of a prohibitive number of bacteriophage clones to produce a clone library representing all possible combinations, the vast majority of which will be non-binding. Generation of such a large number of primary transformants is not feasible with current transformation technology and bacteriophage display systems. For example, Barbas (Barbas et al., 1992) generated only 5 × 10⁷ transformants, which represents only a tiny fraction of the potential diversity of a library of completely randomized CDRs.
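The scale of this shortfall can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the 1 × 10²⁰ and 5 × 10⁷ figures are taken from the paragraph above, and the `alphabet_size ** length` formula is an assumption that every position of a peptide is fully and independently randomized over 20 amino acids.

```python
from fractions import Fraction


def random_peptide_diversity(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides if every position is fully randomized.

    Assumption (not from the source text): positions vary independently
    over `alphabet_size` residues.
    """
    return alphabet_size ** length


def library_coverage(transformants: int, theoretical_diversity: int) -> Fraction:
    """Upper bound on the fraction of possible sequences a library can sample.

    Each primary transformant carries at most one distinct sequence, so the
    number of transformants caps the number of variants represented.
    """
    sampled = min(transformants, theoretical_diversity)
    return Fraction(sampled, theoretical_diversity)


# Figures quoted in the text above.
HEAVY_CDR3_DIVERSITY = 10**20      # approximate heavy chain CDR3 variants
BARBAS_TRANSFORMANTS = 5 * 10**7   # transformants reported by Barbas et al., 1992

coverage = library_coverage(BARBAS_TRANSFORMANTS, HEAVY_CDR3_DIVERSITY)
print(f"Fraction of CDR3 space sampled: {float(coverage):.1e}")
```

Under these assumptions the library samples at most one sequence in every two trillion possible CDR3 variants, which is the limitation the shuffling methods described below are intended to address.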
Despite these fundamental limitations, display of scFv on bacteriophage has already yielded many useful antibodies and antibody fusion proteins. A bispecific single-chain antibody has been shown to mediate efficient tumor cell lysis (Gruber et al., 1994). Intracellular expression of an anti-Rev scFv has been shown to inhibit HIV-1 replication in vitro (Duan et al., 1994), and intracellular expression of an anti-p21ras scFv has been shown to inhibit meiotic maturation of Xenopus oocytes (Biocca et al., 1993). Recombinant scFv that can be used to diagnose HIV infection have also been described, demonstrating the diagnostic utility of scFv (Lilley et al., 1994). Fusion proteins in which the scFv is fused to another polypeptide, such as a toxin or fibrinolytic activator protein, have also been described (Holvost et al., 1992; Nicholls et al., 1993).
If it were possible to generate scFv libraries with greater antibody diversity and to overcome many limitations of conventional CDR mutagenesis and randomization methods, which can cover only a very small fraction of potential sequence combinations, the number and quality of scFv antibodies suitable for therapeutic and diagnostic use could be greatly improved. To address this, the in vitro and in vivo shuffling methods of the invention are used to recombine CDRs that have been obtained (usually by PCR amplification or cloning) from nucleic acids derived from selected displayed antibodies. Such displayed antibodies can be displayed on cells, on bacteriophage particles, on polysomes, or in any suitable antibody display system in which the antibody is linked to its encoding nucleic acid(s). In a variant, the CDRs are initially derived from mRNA (or cDNA) from antibody-producing cells (e.g., WO 92/03918, WO 93/12227 and WO 94/25585), including hybridomas derived therefrom.
Polynucleotide sequences selected in the first round of selection (usually by affinity selection for binding of the displayed antibody to an antigen) are pooled and shuffled by recombination (e.g., in vivo recombination), especially CDR shuffling (usually shuffling heavy chain CDRs with other heavy chain CDRs and light chain CDRs with other light chain CDRs), to create a shuffled pool containing a population of recombined selected polynucleotide sequences. The recombined selected polynucleotide sequences are expressed in a selection format as displayed antibodies and subjected to at least one further round of selection. The sequences selected in that round can be subjected to additional rounds of shuffling and subsequent selection until an antibody with the desired binding affinity is obtained. Selected sequences can also be backcrossed with polynucleotide sequences encoding neutral antibody framework sequences (i.e., having a negligible functional effect on antigen binding), for example, by backcrossing with a human variable region framework to produce antibodies with a human-like sequence. Generally, during backcrossing, further selection is applied to maintain the binding properties for the particular antigen.
Alternatively, or in combination with the above variants, the valency of the target epitope can be changed to control the average binding affinity of selected members of the scfv library. A target epitope can be bound to a surface or substrate at different densities, for example by incorporation of a competitive epitope, by dilution, or by any other method known to those skilled in the art. A high valence density of a predetermined epitope can be used to enrich for scfv library members that have relatively low affinity, while a low valence density can preferentially enrich for higher affinity scfv library members.
To generate different variable segments, a collection of synthetic oligonucleotides encoding a random, pseudo-random, or predetermined set of core peptide sequences can be ligated into a predetermined site (eg, a CDR). Similarly, the sequence diversity of one or more CDRs of the antibody single-chain cassette(s) can be expanded by mutating the CDRs by site-directed mutagenesis, CDR substitution, and the like. The resulting DNA molecules can be propagated in a host for cloning and amplification before shuffling, or they can be used directly (ie, the loss of diversity that can occur during propagation in the host cell can be avoided) and then the selected members of the library are shuffled.
The displayed peptide/polynucleotide complexes (library members) encoding the variable segment peptide sequence of interest or the single chain antibody of interest are selected from the library by affinity enrichment. This is accomplished with an immobilized macromolecule or epitope specific for the peptide sequence of interest, such as a receptor, another macromolecule, or another type of epitope. Repeating the affinity selection process enriches for library members encoding the desired sequences, which can then be isolated for pooling and shuffling, sequencing, and/or further propagation and affinity enrichment.
Library members without the desired specificity are removed by washing. The degree and stringency of washing required will be determined for each peptide sequence or single chain antibody of interest and for the immobilized predetermined macromolecule or epitope. A certain degree of control can be exerted over the binding characteristics of the recovered peptide/DNA complexes by adjusting the conditions of the binding incubation and the subsequent washing. Temperature, pH, ionic strength, divalent cation concentration, and the volume and duration of washing will select for peptide/DNA complexes within particular ranges of affinity for the immobilized macromolecule. Selection based on a slow dissociation rate, which usually predicts high affinity, is often the most practical route. This can be done either by continuing the incubation in the presence of a saturating amount of free predetermined macromolecule or by increasing the volume, number, and length of the washes. In each case, rebinding of dissociated peptide/DNA or peptide/RNA complexes is prevented, and with increasing time, peptide/DNA or peptide/RNA complexes of progressively higher affinity are recovered.
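The effect of wash duration on off-rate selection can be illustrated with a simple first-order dissociation model. This is a minimal sketch under stated assumptions rather than a description of any particular protocol: it assumes that rebinding is fully prevented, so that the fraction of a complex still bound after time t is exp(-k_off · t), and the function names and rate constants are hypothetical values chosen only for illustration.

```python
import math


def fraction_bound(k_off: float, wash_seconds: float) -> float:
    """Fraction of complexes still bound after washing.

    Assumes first-order dissociation with no rebinding:
    f(t) = exp(-k_off * t), with k_off in 1/s and t in seconds.
    """
    return math.exp(-k_off * wash_seconds)


def enrichment(k_off_slow: float, k_off_fast: float, wash_seconds: float) -> float:
    """Ratio of retained slow-dissociating to fast-dissociating complexes."""
    return fraction_bound(k_off_slow, wash_seconds) / fraction_bound(k_off_fast, wash_seconds)


# Hypothetical dissociation rate constants (1/s), for illustration only.
SLOW_K_OFF = 1e-4
FAST_K_OFF = 1e-2

for minutes in (1, 5, 10):
    seconds = 60 * minutes
    print(
        f"{minutes:>2} min wash: slow binder retained "
        f"{fraction_bound(SLOW_K_OFF, seconds):.3f}, "
        f"enrichment over fast binder {enrichment(SLOW_K_OFF, FAST_K_OFF, seconds):.1f}x"
    )
```

In this model, lengthening the wash increases the enrichment of slowly dissociating complexes exponentially, at the cost of a modest loss of the slow binders themselves.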
Additional modifications of the binding and washing procedures can be used to find peptides with special properties. The affinity of some peptides depends on the ionic strength or cation concentration. This is a useful feature of peptides to be used in affinity purification of various proteins when mild conditions are required to remove proteins from peptides.
One variation involves the use of multiple binding targets (multiple epitope types, multiple receptor types) so that the scfv library can be simultaneously screened for multiple scfvs having different binding specificities. Since the size of a scfv library often limits the diversity of potential scfv sequences, it is usually desirable to make scfv libraries as large as possible. The time and economic considerations involved in creating many very large polysomal scFv libraries for display can become prohibitive. To avoid this significant problem, multiple specific epitope types (receptor types) can be screened simultaneously in a single library, or multiple epitope types can be screened sequentially. In one embodiment, multiple target species epitopes, each encoded on a separate bead (or subset of beads), can be mixed and incubated with the scfv polysome display library under appropriate binding conditions. A pool of beads, containing multiple epitope types, can then be used to isolate, by affinity selection, members of the scfv library. In general, subsequent rounds of affinity screening may include the same mixture of beads, subsets thereof, or beads containing only one or two individual epitope types. This approach ensures efficient screening and is compatible with laboratory automation, batch processing, and high-throughput screening methods.
A variety of techniques can be used in the present invention to diversify a peptide library or a single chain antibody library, or to diversify, before or simultaneously with shuffling, around variable segment peptides that have been found in early rounds of selection to have sufficient binding activity to a predetermined macromolecule or epitope. In one approach, positive selected peptide/polynucleotide complexes (those identified in an early round of affinity enrichment) are sequenced to determine the identity of the active peptides. Oligonucleotides based on these active peptide sequences are then synthesized using a low level of all bases incorporated at each step to produce slight variations of the primary oligonucleotide sequences. This mixture of (slightly) degenerate oligonucleotides is then cloned into the variable segment sequences at the appropriate sites. This method produces systematic, controlled variations of the starting peptide sequences, which can then be shuffled. However, it requires sequencing individual positive nascent peptide/polynucleotide complexes prior to mutagenesis, and is therefore useful for expanding the diversity of a small number of recovered complexes and selecting variants with higher binding affinities and/or higher binding specificities. In a variant, mutagenic PCR amplification of positive selected peptide/polynucleotide complexes (especially of the variable region sequences) is performed, the amplification products are shuffled in vitro and/or in vivo, and one or more additional rounds of screening are performed prior to sequencing. The same general approach can be used with single chain antibodies, usually by diversifying CDRs or flanking framework regions before or simultaneously with shuffling, to increase diversity and improve binding affinity/specificity. Mutagenic oligonucleotides capable of in vitro recombination with selected library members can be incorporated into the shuffling reactions. PCR-produced polynucleotides (synthesized by error-prone or high-fidelity methods) can likewise be added to the in vitro shuffling mixture and incorporated into the resulting shufflants.
The shuffling methods of the invention allow the creation of a very large library of single-chain antibody CDR variants. One way to generate such antibodies is to insert synthetic CDRs into a single-chain antibody and/or randomize the CDRs before or simultaneously with shuffling. Synthetic CDR cassette sequences are selected by reference to known human CDR sequence data, at the discretion of the practitioner, according to the following guidelines: synthetic CDRs will have at least 40 percent positional sequence identity to known CDR sequences, and preferably at least 50 to 70 percent positional sequence identity to known CDR sequences. For example, a set of synthetic CDR sequences can be generated by synthesizing a set of oligonucleotide sequences on the basis of the natural human CDR sequences listed in Kabat (Kabat et al., 1991); the set(s) of synthetic CDR sequences are calculated to encode CDR peptide sequences that share at least 40 percent sequence identity with at least one known native human CDR sequence. Alternatively, a collection of naturally occurring CDR sequences can be compared to generate consensus sequences such that amino acids that are frequently used at a residue position (i.e., in at least 5 percent of known CDR sequences) are incorporated into the synthetic CDRs at the corresponding position(s). Typically, several (e.g., from 3 to about 50) known CDR sequences are compared, the observed natural sequence variations between the known CDRs are tabulated, and a set of oligonucleotides encoding CDR peptide sequences containing all or most permutations of the observed natural sequence variations is synthesized. For example, but not by way of limitation, if a set of human VH CDR sequences contains carboxy-terminal amino acids that are Tyr, Val, Phe, or Asp, then the set(s) of synthetic oligonucleotide CDR sequences are designed to allow the carboxy-terminal CDR residue(s) to be any of these amino acids.
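The two guidelines above (a minimum positional identity to a known CDR, and a per-position residue frequency cutoff for building consensus sets) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the four-residue sequences are invented toy strings rather than real CDRs, and sequences are assumed to be pre-aligned, ungapped, and of equal length.

```python
from collections import Counter


def positional_identity(candidate: str, known: str) -> float:
    """Percent of positions at which two equal-length sequences match."""
    if len(candidate) != len(known):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    matches = sum(a == b for a, b in zip(candidate, known))
    return 100.0 * matches / len(known)


def passes_identity_rule(candidate: str, known_cdrs: list[str], threshold: float = 40.0) -> bool:
    """True if the candidate meets the identity threshold against any known CDR."""
    return any(
        len(k) == len(candidate) and positional_identity(candidate, k) >= threshold
        for k in known_cdrs
    )


def allowed_residues(known_cdrs: list[str], min_fraction: float = 0.05) -> list[list[str]]:
    """Tabulate, per position, residues seen in at least `min_fraction` of the CDRs."""
    total = len(known_cdrs)
    allowed = []
    for position in range(len(known_cdrs[0])):
        counts = Counter(seq[position] for seq in known_cdrs)
        allowed.append(sorted(r for r, n in counts.items() if n / total >= min_fraction))
    return allowed


# Invented toy sequences whose last residue is Tyr, Val, Phe, or Asp.
TOY_CDRS = ["GFTY", "GFSV", "GYTF", "GFTD"]

print(allowed_residues(TOY_CDRS))
print(passes_identity_rule("GFTA", TOY_CDRS))
```

A degenerate oligonucleotide set would then be designed to encode, at each position, the residues tabulated by `allowed_residues`; codon choice and synthesis details are outside the scope of this sketch.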
In some embodiments, residues other than those naturally occurring at a residue position in a set of CDR sequences are included: conservative amino acid substitutions are often included and up to 5 residue positions can be changed to include non-conservative amino acid substitutions compared to known native CDR sequences. Such CDR sequences can be used in primary library members (before the first round of screening) and/or can be used for in vitro shuffling reactions of selected library member sequences. Construction of such sets of defined and/or degenerate sequences will be readily accomplished by those of ordinary skill in the art.
A set of synthetic CDR sequences includes at least one member not known to be a natural CDR sequence. It is up to the practitioner to include or not include a portion of random or pseudo-random sequence corresponding to N region addition in the heavy chain CDRs; the N region sequence ranges from 1 nucleotide to about 4 nucleotides located at the V-D and D-J junctions. A set of synthetic heavy chain CDR sequences includes at least about 100 unique CDR sequences, typically at least about 1,000 unique CDR sequences, preferably at least about 10,000 unique CDR sequences, and often more than 50,000 unique CDR sequences; however, there are usually no more than about 1x10⁶ unique CDR sequences in a collection, although occasionally there are between 1x10⁷ and 1x10⁸ unique CDR sequences, especially if conservative amino acid substitutions are allowed at positions where the conservative amino acid substituent is absent or rare (i.e., less than 0.1 percent) at that position in natural human CDRs. In general, the number of unique CDR sequences contained in the library should not exceed the expected number of primary transformants in the library by more than a factor of 10. Such single chain antibodies preferably bind with an affinity of at least about 5x10⁷ M⁻¹, more preferably with an affinity of at least 1x10⁸ M⁻¹ to 1x10⁹ M⁻¹ or more, and sometimes up to 1x10¹⁰ M⁻¹ or more. Often, the predetermined antigen is a human protein, such as, for example, a human cell surface antigen (e.g., CD4, CD8, IL-2 receptor, EGF receptor, PDGF receptor), another human biological macromolecule (e.g., carbohydrate antigen, sialyl Lewis antigen, L-selectin), or a non-human disease-associated macromolecule (e.g., bacterial LPS, virion capsid protein or envelope glycoprotein), and the like.
High-affinity single-chain antibodies with the desired specificity can be constructed and expressed in a variety of systems. For example, scfvs have been produced in plants (Firek et al., 1993) and can be readily produced in prokaryotic systems (Owens and Young, 1994; Johnson and Bird, 1991). In addition, single-chain antibodies can be used as a basis for the construction of whole antibodies or different fragments thereof (Kettleborough et al., 1994). The sequence encoding the variable region can be isolated (eg, by PCR amplification or subcloning) and fused to the desired sequence encoding the human constant region to encode a human antibody sequence more suitable for human therapeutic applications, where immunogenicity is preferably minimized. The polynucleotide(s) having the resulting fully human coding sequence(s) can be expressed in a host cell (eg, from a mammalian cell expression vector) and purified into a pharmaceutical composition.
After the antibody is expressed, individual mutated immunoglobulin chains, mutated antibody fragments, and other immunoglobulin polypeptides of the invention can be purified by standard procedures in the art, including ammonium sulfate precipitation, fractionated column chromatography, gel electrophoresis, and the like (see Scopes, 1982 generally). Once purified, partially or to the desired homogeneity, the polypeptides can then be used therapeutically or in the development and performance of assay procedures, immunofluorescence staining, and the like (see generally, Lefkovits and Pernis, 1979 and 1981; Lefkovits, 1997).
Antibodies produced by the method of the present invention can be used for diagnosis and therapy. By way of illustration and not limitation, they may be used to treat cancer, autoimmune diseases, or viral infections. For the treatment of cancer, antibodies will typically bind to an antigen that is preferentially expressed on cancer cells, such as erbB-2, CEA, CD33, and many other antigens and binding members well known to those skilled in the art.
Shuffling can also be used for recombinational diversification of a pool of selected library members obtained by screening with a two-hybrid screening system to identify library members that bind a predetermined polypeptide sequence. The selected library members are pooled and shuffled by in vitro and/or in vivo recombination. The shuffled pool can then be screened in a yeast two-hybrid system to select library members that bind said predetermined polypeptide sequence (e.g., an SH2 domain) or that bind an alternative predetermined polypeptide sequence (e.g., an SH2 domain from a different type of protein).
One approach to identifying polypeptide sequences that bind to a predetermined polypeptide sequence has been to use a so-called "two-hybrid" system, in which the predetermined polypeptide sequence is present in a fusion protein (Chien et al., 1991). This approach identifies protein-protein interactions in vivo through reconstitution of a transcriptional activator (Fields and Song, 1989), the yeast Gal4 transcription protein. Typically, the method is based on the properties of the yeast Gal4 protein, which consists of separable domains responsible for DNA binding and transcriptional activation. Polynucleotides encoding two hybrid proteins, one consisting of the yeast Gal4 DNA-binding domain fused to a polypeptide sequence of a known protein, and the other consisting of the Gal4 activation domain fused to a polypeptide sequence of a second protein, are constructed and introduced into a yeast host cell. Intermolecular binding between the two fusion proteins reconstitutes the Gal4 DNA-binding domain with the Gal4 activation domain, resulting in transcriptional activation of a reporter gene (e.g., lacZ, HIS3) operably linked to a Gal4 binding site. Typically, the two-hybrid method is used to identify new polypeptide sequences that interact with a known protein (Silver and Hunt, 1993; Durfee et al., 1993; Yang et al., 1992; Luban et al., 1993; Hardy et al., 1992; Bartel et al., 1993; and Vojtek et al., 1993). However, variations of the two-hybrid method have been used to identify mutations in a known protein that affect its binding to a second known protein (Li and Fields, 1993; Lalo et al., 1993; Jackson et al., 1993; and Madura et al., 1993). Two-hybrid systems have also been used to identify interacting structural domains of two known proteins (Bardwell et al., 1993; Chakrabarty et al., 1992; Staudinger et al., 1993; and Milne and Weaver 1993) or domains responsible for oligomerization of a single protein (Iwabuchi et al., 1993; Bogerd et al., 1993).
Variations of two-hybrid systems have been used to study the in vivo activity of a proteolytic enzyme (Dasmahapatra et al., 1992). Alternatively, an E. coli/BCCP interactive screening system (Germino et al., 1993; Guarente, 1993) can be used to identify interacting protein sequences (i.e., protein sequences that heterodimerize or form higher order heteromultimers). The sequences selected by a two-hybrid system can be pooled and shuffled and introduced into the two-hybrid system for one or more subsequent rounds of screening to identify polypeptide sequences that bind to the hybrid containing the predetermined binding sequence. The sequences thus identified can be compared to identify consensus sequence(s) and core consensus sequences.
Samples of one microgram of template DNA are obtained and treated with UV light to cause the formation of dimers, including TT dimers, especially purine dimers. UV exposure is limited so that only a few photoproducts are generated per gene in the template DNA sample. Multiple samples are treated with UV light for different periods of time to obtain template DNA samples carrying different numbers of dimers from UV exposure.
A random priming kit that uses a non-proofreading polymerase (for example, the Prime-It II Random Primer Labeling kit from Stratagene Cloning Systems) is used to generate polynucleotides of various sizes by priming at random sites on the templates prepared with UV light (as described above) and extending along the templates. Priming protocols, such as those described in the Prime-It II Random Primer Labeling kit, can be used to extend the primers. The dimers formed by UV exposure serve as a block to extension by the non-proofreading polymerase. Therefore, a pool of polynucleotides of random size is present after extension with the random primers is complete.
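How dimer number influences the size distribution of the extension products can be explored with a toy simulation. The sketch below is a deliberately simplified model and not a description of the kit chemistry: it assumes a single-stranded template indexed from 0, primers that anneal at uniformly random positions and extend in one direction, and dimers that stop extension completely; all function names and parameter values are hypothetical.

```python
import bisect
import random


def extension_length(start: int, blocks: list[int], template_len: int) -> int:
    """Bases copied by a primer starting at `start`.

    `blocks` is a sorted list of dimer positions; extension stops at the
    first block at or after `start`, or at the end of the template.
    """
    i = bisect.bisect_left(blocks, start)
    stop = blocks[i] if i < len(blocks) else template_len
    return stop - start


def simulate_fragments(template_len: int, n_dimers: int, n_primers: int, seed: int = 0) -> list[int]:
    """Simulate product lengths for random dimer and random priming positions."""
    rng = random.Random(seed)
    blocks = sorted(rng.sample(range(template_len), n_dimers))
    starts = [rng.randrange(template_len) for _ in range(n_primers)]
    return [extension_length(s, blocks, template_len) for s in starts]


# Compare a lightly and a more heavily irradiated template (hypothetical values).
for dimers in (2, 20):
    lengths = simulate_fragments(template_len=1000, n_dimers=dimers, n_primers=500)
    print(f"{dimers:>2} dimers: mean product length {sum(lengths) / len(lengths):.0f} bases")
```

Varying `n_dimers` in this model corresponds to varying the UV exposure time, which is why samples irradiated for different periods are expected to yield pools with different size distributions.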
The invention further relates to a method of generating a selected mutant polynucleotide sequence (or a population of selected polynucleotide sequences), typically in the form of amplified and/or cloned polynucleotides, wherein the selected polynucleotide sequence(s) possess at least one desired phenotypic characteristic (e.g., encodes a polypeptide, promotes transcription of linked polynucleotides, binds a protein, and the like) for which selection can be performed. One method of identifying hybrid polypeptides that possess a desired structural or functional property, such as binding to a particular biological macromolecule, involves screening a library of polypeptides for members that possess the desired property, which is conferred by the polypeptide's amino acid sequence.
In one aspect, the present invention provides a method for generating libraries of displayed polypeptides or displayed antibodies suitable for affinity interaction screening or phenotypic screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed polypeptide or displayed antibody and a cognate polynucleotide encoding said displayed polypeptide or displayed antibody, and obtaining said cognate polynucleotides or copies thereof, wherein said cognate polynucleotides comprise a region of substantially identical sequence, optionally introducing mutations into said polynucleotides or copies, (2) pooling the polynucleotides or copies, (3) producing smaller or shorter polynucleotides by interrupting a random or specific priming and synthesis process or an amplification process, and (4) performing amplification, preferably PCR amplification, and optionally mutagenesis, to homologously recombine the newly synthesized polynucleotides.
It is an object of the invention to provide a method for producing hybrid polynucleotides that express a useful hybrid polypeptide in a series of steps including:
(a) producing polynucleotides by interrupting the amplification or synthesis of polynucleotides with a means for blocking or interrupting the amplification or synthesis process, thereby providing a plurality of smaller or shorter polynucleotides as a result of polynucleotide replication being at different stages of completion;
(b) adding to the resulting population of single-stranded or double-stranded polynucleotides one or more single-stranded or double-stranded oligonucleotides, wherein said added oligonucleotides contain a region of identity, within a region of heterology, to one or more of the single-stranded or double-stranded polynucleotides of the population;
(c) denaturing the resulting single-stranded or double-stranded oligonucleotides to produce a mixture of single-stranded polynucleotides, optionally separating the shorter or smaller polynucleotides into sets of polynucleotides of different lengths, and optionally further subjecting said polynucleotides to a PCR procedure to amplify one or more oligonucleotides contained in at least one of said polynucleotide sets;
(d) incubating a plurality of said polynucleotides, or at least one set of said polynucleotides, with a polymerase under conditions that result in annealing of said single-stranded polynucleotides at regions of identity between the single-stranded polynucleotides, thereby forming a mutagenized double-stranded polynucleotide chain;
(e) optionally repeating steps (c) and (d);
(f) expressing at least one hybrid polypeptide from said polynucleotide chain or chains; and
(g) screening said at least one hybrid polypeptide for a useful activity.
In one aspect of the invention, the means for blocking or interrupting the amplification or synthesis process is UV light, DNA adducts, or DNA-binding proteins.
In one aspect of the invention, DNA adducts or polynucleotides containing DNA adducts are removed from the polynucleotide or set of polynucleotides, for example, by a process that includes heating a solution containing DNA fragments prior to further processing.
In another aspect, clones identified as having a biomolecule or bioactivity of interest can also be sequenced to identify the DNA sequence encoding a polypeptide (e.g., an enzyme), or the polypeptide sequence itself, that has, for example, a particular activity. Therefore, in accordance with the present invention, it is possible to: (i) isolate and identify DNA encoding a biological activity of interest (e.g., an enzyme with a specific enzymatic activity), (ii) isolate and identify biomolecules (e.g., polynucleotides or enzymes with such activity (including their amino acid sequence)), and (iii) produce recombinant biomolecules or bioactivities.
Suitable clones (e.g., 1-1000 or more clones) from the library are identified by the methods of the invention and sequenced using, for example, high-throughput sequencing techniques. The exact method of sequencing is not a limiting factor of the invention. Any method useful for identifying the sequence of a particular cloned DNA sequence can be used. In general, sequencing is an adaptation of the natural process of DNA replication. Therefore, a template (e.g., a vector) and primers are used. One general template preparation and sequencing protocol begins with the automated picking of bacterial colonies, each containing a separate DNA clone that will serve as a template for the sequencing reaction. The selected clones are placed in medium and grown overnight. The DNA templates are then purified from the cells and resuspended in water. DNA quantification is followed by high-throughput sequencing using sequencers such as the Applied Biosystems, Inc., Prism 377 DNA Sequencer. The resulting sequence data can then be used in additional methods, including searches of a database or databases.
Many source databases containing nucleic acid sequences and/or deduced amino acid sequences are available for use in the invention to identify or determine the activity encoded by a particular polynucleotide sequence. All or a representative portion of the sequences to be tested (e.g., around 100 individual clones) are used to search a sequence database (e.g., GenBank, PFAM or ProDom), either simultaneously or individually. Many different methods for performing such sequence searches are known in the art. Databases can be specific to a particular organism or collection of organisms. For example, there are databases for C. elegans, Arabidopsis sp., M. genitalium, M. jannaschii, E. coli, H. influenzae, S. cerevisiae and others. The sequence data of the clone are then compared to the sequences in the database or databases using algorithms designed to measure homology between two or more sequences.
In some cases, it may be desirable to express a particular cloned polynucleotide sequence after determining its identity or activity, or associating a proposed identity or activity with the polynucleotide. In such cases, the desired clone, if not already cloned into an expression vector, is ligated downstream of a regulatory control element (eg, promoter or enhancer) and cloned into a suitable host cell. Expression vectors are commercially available along with suitable host cells for use in the invention.
Representative examples of expression vectors that can be used include viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral nucleic acid (e.g., SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific to particular hosts of interest (such as Bacillus, Aspergillus, yeasts, and the like). The polynucleotide can be included in any of a number of expression vectors for expression of the polypeptide. Such vectors include chromosomal, non-chromosomal and synthetic DNA sequences. A large number of suitable vectors are known to those skilled in the art and are commercially available. The following vectors are provided by way of example. Bacterial: pQE70, pQE60, pQE-9 (Qiagen), psiX174, pBluescript SK, pBluescript KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); pTRC99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other plasmid or vector can be used as long as it is replicable and viable in the host.
The nucleic acid sequence in the expression vector is operably linked to a suitable expression control sequence (promoter) to direct mRNA synthesis. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, SV40 early and late, retrovirus LTR, and mouse metallothionein-I. Selection of a suitable vector and promoter is within the level of ordinary skill in the art. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also contain appropriate sequences for enhancing expression. Promoter regions can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or other vectors with selectable markers.
Additionally, expression vectors typically contain one or more selectable marker genes to provide a phenotypic trait for selecting transformed host cells, such as dihydrofolate reductase or neomycin resistance in eukaryotic cell culture, or tetracycline or ampicillin resistance in E. coli.
The selected, cloned and sequenced nucleic acid sequences as described above can be further introduced into a suitable host to prepare a library that is screened for the desired biomolecule or biological activity. The selected nucleic acid is preferably already contained in a vector containing suitable control sequences by which the selected nucleic acid encoding the biomolecule or bioactivity can be expressed to detect the desired activity. The host cell may be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell may be a prokaryotic cell, such as a bacterial cell. Selection of a suitable host is believed to be within the skill of one skilled in the art based on the teachings contained herein.
In some cases, it may be desirable to amplify a nucleic acid sequence present in a sample or a particular clone that has been isolated. In this embodiment, the nucleic acid sequence is amplified by PCR or a similar reaction known to those skilled in the art. Commercially available amplification kits are available to perform such amplification reactions.
Furthermore, it is important to recognize that the matching algorithms and searchable database can be implemented in hardware, software, or a combination thereof. Therefore, the isolation, processing and identification of a nucleic acid or polypeptide sequence can be performed in an automated system.
In addition to the sequence-based techniques described above, there are many traditional assay systems for measuring enzyme activity using multiwell plates. For example, existing screening technology is typically based on two-dimensional well plates (e.g., 96, 384, and 1536 wells). The present invention also provides a capillary array approach that has a number of advantages over well-based screening techniques, including eliminating the need for fluid dispensers and providing a system that can be reused (e.g., glass capillaries can be reused) (see, for example, U.S. patent application Ser. No. 09/444,112, filed Nov. 22, 1999, which is incorporated herein by reference in its entirety).
Accordingly, the capillaries, capillary arrays, and systems of the invention are particularly suitable for screening libraries for activities or biomolecules of interest, including polynucleotides. Activity screening can be performed on individual expression clones or initially on a mixture of expression clones to ensure that the mixture has one or more of the indicated activities. If the mixture has a specific activity, individual clones can be screened again for that activity or for a more specific activity after collection from the capillary array.
All headings and subheadings used herein are for the convenience of readers and should not be construed as limiting the invention.
As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Thus, for example, a reference to a "clone" includes multiple clones, a reference to a "nucleic acid sequence" generally includes a reference to one or more nucleic acid sequences and their equivalents known to those skilled in the art, and so on.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one skilled in the art to which this invention pertains. While all methods, devices and materials similar or equivalent to those described herein may be used in the practice or testing of the invention, the preferred methods, devices and materials are described.
All publications mentioned herein are incorporated herein by reference in their entirety to describe and disclose the databases, proteins, and methodologies described in the publications that may be used in connection with the present invention. The publications discussed above and throughout are provided solely for their disclosure prior to the filing date of this application. Nothing herein shall be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention.
The invention will now be described in more detail with reference to the following non-limiting examples.
EXAMPLES
Example 1
DNA isolation
DNA was isolated using the IsoQuick procedure according to the manufacturer's instructions (Orca Research Inc., Bothell, Washington). Isolated DNA can optionally be normalized according to Example 2 (below). Once isolated, the DNA is sheared by pushing and pulling it through a 25 gauge double-hub needle and a 1 cc syringe approximately 500 times. A small amount is run on a 0.8% agarose gel to confirm that most of the DNA is in the desired size range (approximately 3-6 kb).
Blunt-ending DNA. The DNA is blunt-ended by mixing 45 μl of 10× mung bean buffer, 2.0 μl of mung bean nuclease (1050 u/μl), and water to a final volume of 405 μl. The mixture is incubated at 37°C for 15 minutes, then extracted with phenol/chloroform followed by an additional chloroform extraction. One ml of ice-cold ethanol is added to the final extract to precipitate the DNA, which is left on ice for 10 minutes. The DNA is pelleted by centrifugation in a microcentrifuge for 30 minutes. The pellet is washed with 1 ml of 70% ethanol and re-pelleted in the microcentrifuge. After centrifugation, the DNA is dried and gently resuspended in 26 μl of TE buffer.
DNA methylation. The DNA is methylated by mixing 4 μl of 10× EcoRI methylase buffer, 0.5 μl of SAM (32 mM), and 5.0 μl of EcoRI methylase (40 u/μl) and incubating at 37°C for 1 hour. To ensure blunt ends, the following can be added to the methylation reaction: 5.0 μl of 100 mM MgCl₂, 8.0 μl of dNTP mix (2.5 mM each of dGTP, dATP, dTTP, dCTP), and 4.0 μl of Klenow (5 u/μl). The mixture is then incubated at 12°C for 30 minutes.
After the 30 minute incubation, 450 μl of 1× STE is added. The mixture is extracted once with phenol/chloroform followed by an additional chloroform extraction. One ml of ice-cold ethanol is added to the final extract to precipitate the DNA, which is left on ice for 10 minutes. The DNA is pelleted by centrifugation in a microcentrifuge for 30 minutes. The pellet is washed with 1 ml of 70% ethanol, re-pelleted in the microcentrifuge and allowed to dry for 10 minutes.
Ligation. The DNA is ligated by gently resuspending it in 8 μl of EcoRI adapters (from the Stratagene cDNA Synthesis Kit), 1.0 μl of 10× ligation buffer, 1.0 μl of 10 mM rATP, and 1.0 μl of T4 DNA ligase (4 Wu/μl), and incubating at 4°C for 2 days. The ligation reaction is stopped by heating at 70°C for 30 minutes.
Phosphorylation of adapters. The adapter ends are phosphorylated by mixing the ligation reaction with 1.0 μl of 10× ligation buffer, 2.0 μl of 10 mM rATP, 6.0 μl of H2O, and 1.0 μl of polynucleotide kinase (PNK) and incubating at 37°C for 30 minutes. After the 30 minute incubation, 31 μl of H2O and 5 ml of 10× STE are added to the reaction mixture, and the sample is size fractionated on a Sephacryl S-500 spin column. The pooled fractions (1-3) are subjected to a single phenol/chloroform extraction followed by an additional chloroform extraction. The DNA is precipitated by adding ice-cold ethanol and holding on ice for 10 minutes. The precipitate is pelleted by centrifugation in a microcentrifuge at high speed for 30 minutes. The resulting pellet is washed with 1 ml of 70% ethanol, re-pelleted by centrifugation and left to dry for 10 minutes. The sample is resuspended in 10.5 μl of TE buffer. The sample is not plated but is ligated directly to lambda arms as described above, except that 2.5 μl of DNA and no water are used.
Sucrose gradient (2.2 ml) size fractionation. Ligation is stopped by heating the sample to 65°C for 10 minutes. The sample is gently loaded onto a 2.2 ml sucrose gradient and centrifuged in a mini-ultracentrifuge at 45,000 rpm at 20°C for 4 hours (no brake). Fractions are collected by puncturing the bottom of the gradient tube with a 20-gauge needle and allowing the sucrose to flow through the needle. The first 20 drops are collected in a Falcon 2059 tube, followed by ten 1-drop fractions (labeled 1-10). Each drop has a volume of approximately 60 μl. Five μl of each fraction is run on a 0.8% agarose gel to check the size. Fractions 1-4 (about 10-1.5 kb) are pooled, and fractions 5-7 (about 5-0.5 kb) are pooled in a separate tube. One ml of ice-cold ethanol is added to precipitate the DNA, which is then placed on ice for 10 minutes. The precipitate is pelleted by centrifugation in a microcentrifuge at high speed for 30 minutes. The pellets are washed by resuspension in 1 ml of 70% ethanol, re-pelleted by centrifugation at high speed in a microcentrifuge for 10 minutes, and then dried. Each pellet is then resuspended in 10 μl of TE buffer.
Test ligation to lambda arms. A plate assay is set up by spotting 0.5 μl of the sample onto agarose containing ethidium bromide, together with standards (DNA samples of known concentration), to obtain an approximate concentration. The samples are then viewed under UV light and the concentration is estimated by comparison to the standards. The following ligation reactions (5 μl each) are prepared as shown in Table 1 below and incubated at 4°C overnight:
TABLE 1
Sample          H₂O       10× ligase buffer    10 mM rATP    Lambda arms (ZAP)    Insert DNA    T4 DNA ligase
Fraction 1-4    0.5 μl    0.5 μl               0.5 μl        1.0 μl               2.0 μl        0.5 μl
Fraction 5-7    0.5 μl    0.5 μl               0.5 μl        1.0 μl               2.0 μl        0.5 μl
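As a quick consistency check on Table 1, the component volumes can be summed and the working concentrations of the buffer and rATP derived. The sketch below is illustrative only: it assumes the six volumes correspond, in order, to water, 10× ligase buffer, 10 mM rATP, lambda arms, insert DNA and T4 DNA ligase, and the function names are hypothetical.

```python
# Component volumes in microliters, read from Table 1 (identical for both pools).
# The column-to-reagent assignment is an assumption made for this sketch.
LIGATION_MIX_UL = {
    "H2O": 0.5,
    "10x ligase buffer": 0.5,
    "10 mM rATP": 0.5,
    "lambda arms (ZAP)": 1.0,
    "insert DNA": 2.0,
    "T4 DNA ligase": 0.5,
}


def total_volume_ul(mix):
    """Sum of all component volumes in microliters."""
    return sum(mix.values())


def final_concentration(stock, volume_ul, total_ul):
    """Solve C1*V1 = C2*V2 for C2; the result has the units of the stock."""
    return stock * volume_ul / total_ul


total_ul = total_volume_ul(LIGATION_MIX_UL)
buffer_x = final_concentration(10.0, LIGATION_MIX_UL["10x ligase buffer"], total_ul)
ratp_mm = final_concentration(10.0, LIGATION_MIX_UL["10 mM rATP"], total_ul)
print(f"total {total_ul} ul, buffer {buffer_x}x, rATP {ratp_mm} mM")
```

Under these assumptions the volumes add up to the stated 5 μl reaction, with the buffer at 1× and rATP at 1 mM.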
Test packaging and plating. The ligation reactions are packaged according to the manufacturer's protocol. Packaging reactions are stopped with 500 μl of SM buffer and pooled with packaging reactions from the same ligation. One μl of each pooled reaction is titered on the appropriate host (OD₆₀₀ = 1.0) (XL1-Blue MRF). 200 μl of host (in MgSO4) is added to Falcon 2059 tubes, inoculated with 1 μl of packaged phage and incubated at 37°C for 15 minutes. Approximately 3 ml of 48°C top agar (50 ml of stock containing 150 μl of IPTG (0.5 M) and 300 μl of X-GAL (350 mg/ml)) is added, and the mixture is plated on 100 mm plates. The plates are incubated overnight at 37°C.
Amplification of libraries (5.0 × 10⁵ recombinants from each library). Approximately 3.0 ml of host cells (OD₆₀₀ = 1.0) is added to each of two 50 ml conical tubes, inoculated with 2.5 × 10⁵ pfu of phage per conical tube, and then incubated at 37°C for 20 minutes. The top agar stock described above is added to each tube to a final volume of 45 ml. Each tube is plated across five 150 mm plates. Plates are incubated at 37°C for 6-8 hours or until plaques are about pinhead size. Plates are overlaid with 8-10 ml of SM buffer and placed at 4°C overnight (with gentle rocking if possible).
Collect the phage. The phage suspension is recovered by pouring SM buffer from each plate into a 50 ml conical tube. Add about 3 ml of chloroform, shake vigorously and incubate at room temperature for 15 minutes. The tubes are centrifuged at 2000 rpm for 10 minutes to remove cell debris. The supernatant is poured into a sterile flask, 500 μl of chloroform is added and stored at 4°C.
Titering the amplified library. Serial dilutions of the harvested phage are made (for example, 10⁻³ = 1 μl of amplified phage in 1 ml of SM buffer; 10⁻⁶ = 1 μl of the 10⁻³ dilution in 1 ml of SM buffer, etc.), and 200 μl of host (in 10 mM MgSO4) is added to each of two tubes. One tube is inoculated with 10 μl of the 10⁻⁶ dilution (10⁻⁵). The other tube is inoculated with 1 μl of the 10⁻⁶ dilution (10⁻⁶). Both tubes are incubated at 37°C for 15 minutes.
Approximately 3 ml of 48°C top agar (50 ml of stock containing 150 μl of IPTG (0.5 M) and 37 μl of X-GAL (350 mg/ml)) is added to each tube, and the mixture is plated on 100 mm plates. The plates are incubated overnight at 37°C.
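The titering step above reduces to simple dilution arithmetic. The following is a minimal sketch, assuming nominal dilution factors (1 μl into 1 ml treated as exactly 10⁻³) and a purely hypothetical plaque count of 150; the function names are illustrative and not part of the protocol.

```python
def step_dilution(transfer_ul, diluent_ml):
    """Nominal dilution factor, ignoring the small transferred volume."""
    return transfer_ul / (diluent_ml * 1000.0)


def cumulative_dilution(steps):
    """Multiply the factors of successive (transfer_ul, diluent_ml) steps."""
    factor = 1.0
    for transfer_ul, diluent_ml in steps:
        factor *= step_dilution(transfer_ul, diluent_ml)
    return factor


def titer_pfu_per_ml(plaques, dilution, plated_ul):
    """Back-calculate pfu per ml of undiluted stock from one plate count."""
    undiluted_ml_plated = dilution * plated_ul / 1000.0
    return plaques / undiluted_ml_plated


# Two successive 1 ul -> 1 ml steps give the nominal 10^-6 dilution.
two_step = cumulative_dilution([(1.0, 1.0), (1.0, 1.0)])

# Hypothetical count: 150 plaques from 1 ul of the 10^-6 dilution.
example_titer = titer_pfu_per_ml(plaques=150, dilution=two_step, plated_ul=1.0)
print(f"dilution {two_step:.1e}, titer {example_titer:.2e} pfu/ml")
```

Plating both 10 μl and 1 μl of the same dilution, as the protocol does, gives two counts a factor of ten apart, which makes it more likely that one plate falls in a countable range.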
The ZAP II library was excised to create the pBLUESCRIPT library according to the manufacturer's protocols (Stratagene).
The DNA library can be transformed into host cells (eg, E. coli) to generate an expression clone library.
Example 2
Normalization
Purified DNA can be normalized prior to library generation. The DNA is first fractionated according to the following protocol. The genomic DNA sample is purified on a cesium chloride gradient. The cesium chloride solution (Rf=1.3980) is filtered through a 0.2 μm filter and 15 ml is loaded into a 35 ml OptiSeal tube (Beckman). The DNA is added and mixed thoroughly. Ten micrograms of bis-benzimide (Sigma; Hoechst 33258) is added and mixed well. The tube is then filled with the filtered cesium chloride solution and centrifuged in a VTi50 rotor in a Beckman L8-70 ultracentrifuge at 33,000 rpm for 72 hours. After centrifugation, a syringe pump and fractionator (Brandel Model 186) are used to drive the gradient through an ISCO UA-5 UV absorbance detector set at 280 nm. Peaks are obtained that represent the DNA of the organisms present in the environmental sample. Eubacterial sequences can be detected by PCR amplification of rRNA-encoding DNA from a 10-fold dilution of the E. coli peak using the following amplification primers:
Forward primer: 5'-AGAGTTTGATCCTGGCTCAG-3' (SEQ ID NO:82)
Reverse primer: 5'-GGTTACTTGTTACGACTT-3' (SEQ ID NO:83)
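Basic properties of the two primers can be tabulated before setting up the amplification. The sketch below is illustrative only: it uses the Wallace rule (2°C per A/T, 4°C per G/C), a rough approximation that is not taken from this document and is generally considered reliable only for short oligonucleotides. The function names are hypothetical.

```python
FORWARD_PRIMER = "AGAGTTTGATCCTGGCTCAG"  # SEQ ID NO:82
REVERSE_PRIMER = "GGTTACTTGTTACGACTT"    # SEQ ID NO:83

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm_c(seq):
    """Rough melting temperature in Celsius: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
    print(name, len(primer), round(gc_fraction(primer), 3), wallace_tm_c(primer))
```

The reverse complement of the reverse primer gives the template-strand site it anneals to, which is useful when locating the expected amplicon in a reference rRNA gene sequence.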
The recovered DNA is sheared or enzymatically digested into fragments of 3-6 kb. Lone-linker primers are ligated and the DNA is size-selected. The size-selected DNA is amplified by PCR if necessary.
Normalization is then performed by resuspending the double-stranded DNA sample in hybridization buffer (0.12 M NaH2PO4, pH 6.8/0.82 M NaCl/1 mM EDTA/0.1% SDS). The sample is overlaid with mineral oil and denatured by boiling for 10 minutes. The sample is incubated at 68°C for 12-36 hours. Double-stranded DNA is separated from single-stranded DNA according to standard protocols (Sambrook, 1989) on hydroxyapatite at 60°C. The single-stranded DNA fraction is desalted and amplified by PCR. The process is repeated for several more rounds (up to 5 or more).
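Why reannealing followed by recovery of the single-stranded fraction evens out abundances can be illustrated with idealized second-order reassociation kinetics. This model is an assumption introduced here for illustration, not something stated in the protocol: each sequence is taken to reanneal only with its own complement, so the single-stranded amount remaining after time t is C0 / (1 + k·C0·t). The rate constant, time and starting abundances below are hypothetical and in arbitrary units.

```python
def single_stranded_remaining(c0, k, t):
    """Amount still single-stranded after time t under second-order kinetics."""
    return c0 / (1.0 + k * c0 * t)


def normalize_round(abundances, k, t):
    """Apply one denature/reanneal round and keep the single-stranded fraction."""
    return {name: single_stranded_remaining(c0, k, t)
            for name, c0 in abundances.items()}


def abundance_ratio(abundances):
    """Ratio of the most to the least abundant species."""
    values = list(abundances.values())
    return max(values) / min(values)


# Hypothetical starting pool: one genome 100-fold more abundant than another.
start = {"abundant": 100.0, "rare": 1.0}
after = normalize_round(start, k=1.0, t=1.0)

print("ratio before:", abundance_ratio(start))
print("ratio after: ", round(abundance_ratio(after), 3))
```

In this idealized picture the abundant species reanneals faster and is depleted more strongly, so the ratio between species shrinks with each round; real samples will deviate from the model.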
Example 3
Enzyme activity assay
Below is a representative example of a procedure for screening an expression library prepared in accordance with Example 1 for hydrolase activity.
Plates of the library prepared as described in Example 1 are used to multiply inoculate a single plate containing 200 μl of LB Amp/Meth, glycerol in each well. This step is performed using the High Density Replicating Tool (HDRT) of the Beckman BIOMEK™ with a 1% bleach, water, isopropanol and air-dry sterilization cycle between each inoculation. The single plate is grown for 2 hours at 37°C and is then used to inoculate two white 96-well Dynatech microtiter daughter plates containing 250 μl of LB Amp/Meth, glycerol in each well. The original single plate is incubated at 37°C for 18 hours, then stored at -80°C. The two condensed daughter plates are also incubated at 37°C for 18 hours. The condensed daughter plates are then heated at 70°C for 45 minutes to kill the cells and inactivate the host E. coli enzymes. A stock solution of 5 mg/mL morphourea phenylalanyl-7-amino-4-trifluoromethyl coumarin (MuPheAFC, the "substrate") in DMSO is diluted to 600 μM with 50 mM Hepes buffer, pH 7.5, containing 0.6 mg/mL of the detergent dodecyl maltoside. Fifty μl of the 600 μM MuPheAFC solution is added to each well of the white condensed plates with one 100 μl mix cycle using the BIOMEK, to give a final substrate concentration of approximately 100 μM. Fluorescence values are recorded (excitation = 400 nm, emission = 505 nm) on a plate-reading fluorometer immediately after addition of the substrate (t = 0). The plate is incubated at 70°C for 100 minutes, then allowed to cool to room temperature for an additional 15 minutes. Fluorescence values are recorded again (t = 100). The values at t = 0 are subtracted from the values at t = 100 to determine whether an active clone is present.
MuPheAFC
The data will show whether one of the clones in a particular well is hydrolyzing the substrate. To determine the individual clone that carries the activity, the source library plates are thawed and the individual clones are used to singly inoculate a new plate containing LB Amp/Meth, glycerol. As above, the plate is incubated at 37°C to grow the cells, heated at 70°C to inactivate the host enzymes, and 50 μl of 600 μM MuPheAFC is added using the BIOMEK.
After addition of the substrate, fluorescence values are recorded at t = 0, the plate is incubated at 70°C, and the t = 100 minute values are recorded as above. These data indicate which plate the active clone is in.
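The two calculations implicit in the screen above, the final substrate concentration and the background-subtracted fluorescence, can be sketched as follows. The well readings and the hit threshold are hypothetical placeholders, since the procedure does not specify a cutoff; the function names are likewise illustrative.

```python
def final_concentration_um(stock_um, added_ul, well_ul):
    """Concentration after adding `added_ul` of stock to a well holding `well_ul`."""
    return stock_um * added_ul / (added_ul + well_ul)


def fluorescence_change(t0, t100):
    """Per-well signal increase: reading at t=100 minus reading at t=0."""
    return {well: t100[well] - t0[well] for well in t0}


def call_hits(t0, t100, threshold):
    """Wells whose signal increase exceeds an assumed, user-chosen threshold."""
    delta = fluorescence_change(t0, t100)
    return [well for well in sorted(delta) if delta[well] > threshold]


# 50 ul of 600 uM substrate into 250 ul of culture.
substrate_um = final_concentration_um(600.0, 50.0, 250.0)

# Hypothetical fluorometer readings in arbitrary units.
reads_t0 = {"A1": 120.0, "A2": 130.0, "A3": 125.0}
reads_t100 = {"A1": 150.0, "A2": 2400.0, "A3": 140.0}
hits = call_hits(reads_t0, reads_t100, threshold=500.0)

print(substrate_um, hits)
```

The same subtraction applies to the deconvolution plate, where each well holds a single clone, so a hit identifies the individual active clone.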
The enantioselectivity value, E, for the substrate is determined according to the following equation:
$$E = \frac{\ln\left[1 - c\,(1 + ee_p)\right]}{\ln\left[1 - c\,(1 - ee_p)\right]}$$
where eeₚ = enantiomeric excess (ee) of the hydrolysis product and c = percent conversion of the reaction, both entered into the equation as fractions. See Wong and Whitesides, Enzymes in Synthetic Organic Chemistry, 1994, Elsevier, Tarrytown, N.Y., pp. 9-12.
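A short numerical sketch of the E calculation follows. It assumes the widely used product-based relation E = ln[1 − c(1 + eeₚ)] / ln[1 − c(1 − eeₚ)], with both the conversion c and eeₚ expressed as fractions between 0 and 1. The peak areas and conversion in the example are hypothetical, and the function names are illustrative.

```python
import math


def enantiomeric_excess(major, minor):
    """ee as a fraction, from the amounts (e.g. peak areas) of two enantiomers."""
    return (major - minor) / (major + minor)


def e_value(conversion, ee_product):
    """Enantioselectivity E from fractional conversion and product ee."""
    fast = 1.0 - conversion * (1.0 + ee_product)
    slow = 1.0 - conversion * (1.0 - ee_product)
    if not (0.0 < fast < 1.0 and 0.0 < slow < 1.0):
        raise ValueError("conversion and ee_product give a log argument outside (0, 1)")
    return math.log(fast) / math.log(slow)


# Hypothetical chiral HPLC peak areas for the two diol enantiomers.
ee_p = enantiomeric_excess(major=95.0, minor=5.0)

# Hypothetical conversion of 40%.
example_e = e_value(conversion=0.40, ee_product=ee_p)
print(round(ee_p, 3), round(example_e, 1))
```

A non-selective enzyme gives eeₚ = 0 and therefore E = 1, while larger E values indicate stronger discrimination between the enantiomers. The estimate becomes numerically unstable as the numerator's logarithm argument approaches zero, which is why the sketch rejects such inputs.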
Enantiomeric excess is determined by chiral high performance liquid chromatography (HPLC) or chiral capillary electrophoresis (CE). The assays are performed as follows: two hundred μl of the appropriate buffer is added to each well of a white 96-well microtiter plate, followed by 50 μl of partially or fully purified enzyme solution; 50 μl of substrate is then added, and the increase in fluorescence is monitored over time until 50% of the substrate is consumed or the reaction stops, whichever occurs first.
Example 4
Site-directed mutagenesis of enzyme-positive clones
Site-directed mutagenesis was performed on two different enzymes (alkaline phosphatase and β-glycosidase) in order to create new enzymes that show a higher level of activity than the wild-type enzyme.
Alkaline phosphatase
Strain XL1-Red (Stratagene) was transformed, according to the manufacturer's protocol, with genomic clone 27a3a (in plasmid pBluescript) encoding the alkaline phosphatase gene from organism OC9a, an organism isolated from the surface of whalebone. A 5 ml culture of LB + 0.1 mg/ml ampicillin was inoculated with 200 μl of the transformation and allowed to grow at 37°C for 30 hours. A miniprep was then performed on the culture, and the isolated DNA was screened by transforming 2 μl of it into XL-1 Blue cells (Stratagene) according to the manufacturer's protocol and following the assay procedure described below. The mutant OC9a phosphatase required 10 minutes to develop color, whereas the wild-type enzyme required 30 minutes to develop color in the screening assay.
Standard alkaline phosphatase screening test
Transformed XL1 Blue cells are plated on LB/amp plates. The resulting colonies are lifted with Duralon UV (Stratagene) or HATF (Millipore) membranes and lysed in chloroform vapor for 30 seconds. Cells are heat killed by incubation at 85°C for 30 minutes. The filters are developed at room temperature in BCIP buffer, and the fastest developing colonies ("positives") are selected for restreaking onto a BCIP plate (BCIP buffer: 20 mM CAPS pH 9.0, 1 mM MgCl2, 0.01 mM ZnCl2, 0.1 mg/ml BCIP).
Beta-glycosidase
This protocol was used for the mutagenesis of Thermococcus 9N2 beta-glycosidase. PCR was performed by incubating 2 microliters of dNTPs (10 mM stock); 10 microliters of 10× PCR buffer; 0.5 microliters of vector DNA-31G1A (100 nanograms); 20 microliters of 3' primer (100 pmol); 20 microliters of 5' primer (100 pmol); 16 microliters of MnCl2·4H2O (1.25 mM stock); 24.5 microliters of water; and 1 microliter of Taq polymerase (5.0 units) in a total volume of 100 microliters. The PCR cycle was as follows: 95°C for 15 seconds; 58°C for 30 seconds; 72°C for 90 seconds; 25 cycles, followed by a 10 minute extension at 72°C and a 4°C hold.
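The working concentrations and the programmed run time implied by this recipe can be checked with a few lines of arithmetic. The sketch below takes the stated total reaction volume of 100 microliters at face value and counts only the programmed hold times, so instrument ramping and the final 4°C hold are excluded; the names are illustrative.

```python
TOTAL_VOLUME_UL = 100.0  # total volume as stated in the protocol

# (step label, duration in seconds) for one cycle.
CYCLE_STEPS_S = [("denature 95C", 15), ("anneal 58C", 30), ("extend 72C", 90)]
CYCLES = 25
FINAL_EXTENSION_S = 10 * 60


def final_mm(stock_mm, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Final millimolar concentration of a component after dilution."""
    return stock_mm * volume_ul / total_ul


def programmed_time_s(steps, cycles, final_extension_s):
    """Total programmed hold time, excluding temperature ramps."""
    return cycles * sum(duration for _, duration in steps) + final_extension_s


dntp_mm = final_mm(10.0, 2.0)         # 2 ul of 10 mM dNTP stock
manganese_mm = final_mm(1.25, 16.0)   # 16 ul of 1.25 mM manganese chloride stock
run_time_s = programmed_time_s(CYCLE_STEPS_S, CYCLES, FINAL_EXTENSION_S)

print(dntp_mm, manganese_mm, run_time_s / 60.0)
```

Manganese is the component commonly associated with lowered polymerase fidelity in this kind of mutagenic PCR, so its final concentration is the figure most worth verifying when the recipe is scaled.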
Five microliters of the PCR product was run on a 1% agarose gel to check the reaction. The product was purified on a QIAQUICK column (Qiagen) and resuspended in 50 microliters of H2O.
Twenty-five microliters of purified PCR product; 10 microliters of NEB buffer #2; 3 microliters of Kpn I (10 U/microliter); 3 microliters of EcoRI (20 U/microliter); and 59 microliters of H2O were incubated for 2 hours at 37°C to digest the PCR product, which was then purified on a QIAQUICK column (Qiagen) and eluted with 35 microliters of H2O.
Ten microliters of digested PCR product, 5 microliters of vector (cut with EcoRI/KpnI and dephosphorylated with shrimp alkaline phosphatase), 4 microliters of 5× ligation buffer, and 1 microliter of T4 DNA ligase (BRL) were incubated overnight to ligate the PCR products into the vector.
The resulting vector was transformed into M15pREP4 cells by electroporation. 100 or 200 microliters of cells were plated on LB amp meth kan plates and grown overnight at 37°C.
Beta-galactosidase was assayed by (1) performing colony lifts using Millipore HATF membrane filters; (2) lysing the colonies with chloroform vapor in 150 mm glass Petri dishes; (3) transferring the filters to 100 mm glass Petri dishes containing a piece of Whatman 3MM filter paper saturated with Z buffer containing 1 mg/ml XGLU (after the filter bearing the lysed colonies is transferred to the glass Petri dish, the dish is kept at room temperature); and (4) observing "positives" as blue spots on the filter membranes ("positives" are spots that appear early). A Pasteur pipette (or glass capillary tube) was used to core out the blue spots on the filter membrane. The small filter disc is placed in an Eppendorf tube containing 20 μl of water. The Eppendorf tube is incubated at 75°C for 5 minutes and then vortexed to elute the plasmid DNA from the filter. This DNA is transformed into electrocompetent E. coli cells, and the filter-lift assay is repeated on the transformation plates to identify "positives". The transformation plates are returned to the 37°C incubator after the filter lift to regenerate colonies. Three ml of LB amp liquid is inoculated with the repurified positives and incubated at 37°C overnight. Plasmid DNA is isolated from these cultures and the plasmid insert is sequenced. The filter assay uses Z buffer (recipe below) containing 1 mg/ml of the substrate 5-bromo-4-chloro-3-indolyl-β-D-glucopyranoside (XGLU) (Diagnostic Chemicals Limited or Sigma). Z buffer (referenced in Miller, J.H. (1992) A Short Course in Bacterial Genetics, p. 445), per liter:
Na2HPO4-7H2O 16.1 g
NaH2PO4-H2O 5.5 g
KCl 0.75 g
MgSO4-7H2O 0.246 g
β-mercaptoethanol 2.7 ml
Adjust pH to 7.0
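The per-liter weights in the recipe can be converted to molar concentrations as a sanity check. This sketch assumes the four salts are Na2HPO4·7H2O, NaH2PO4·H2O, KCl and MgSO4·7H2O, as in the standard Miller Z buffer, and uses formula weights supplied here rather than taken from the document.

```python
# Grams per liter from the recipe; salt identities are an assumption (see lead-in).
GRAMS_PER_LITER = {
    "Na2HPO4-7H2O": 16.1,
    "NaH2PO4-H2O": 5.5,
    "KCl": 0.75,
    "MgSO4-7H2O": 0.246,
}

# Approximate formula weights in g/mol (assumed values, not from the document).
FORMULA_WEIGHT = {
    "Na2HPO4-7H2O": 268.07,
    "NaH2PO4-H2O": 137.99,
    "KCl": 74.55,
    "MgSO4-7H2O": 246.47,
}


def millimolar(grams_per_liter, formula_weight):
    """Convert g/L to mM."""
    return grams_per_liter / formula_weight * 1000.0


z_buffer_mm = {
    salt: millimolar(grams, FORMULA_WEIGHT[salt])
    for salt, grams in GRAMS_PER_LITER.items()
}

for salt, conc in z_buffer_mm.items():
    print(f"{salt}: {conc:.2f} mM")
```

Under these assumptions the weights correspond to roughly 60 mM and 40 mM of the two phosphates, 10 mM KCl and 1 mM MgSO4.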
Example 5
Construction of a stable, large-insert picoplankton genomic DNA library
Cell collection and DNA preparation. Agarose plugs containing concentrated picoplankton cells were prepared from samples collected during an oceanographic cruise from Newport, Oregon to Honolulu, Hawaii. Seawater (30 liters) was collected in Niskin bottles, screened through 10 μm Nitex and concentrated by hollow fiber filtration (Amicon DC10) through 30,000 MW cutoff polysulfone filters. The concentrated bacterioplankton cells were collected on a 0.22 μm, 47 mm Durapore filter and resuspended in 1 ml of 2× STE buffer (1 M NaCl, 0.1 M EDTA, 10 mM Tris, pH 8.0) to a final density of approximately 1 × 10¹⁰ cells per ml. The cell suspension was mixed with one volume of 1% molten Seaplaque LMP agarose (FMC) cooled to 40°C and then immediately drawn into a 1 ml syringe. The syringe was sealed with parafilm and placed on ice for 10 minutes. The agarose plug containing the cells was extruded into 10 ml of lysis buffer (10 mM Tris pH 8.0, 50 mM NaCl, 0.1 M EDTA, 1% sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme) and incubated at 37°C for one hour. The agarose plug was then transferred to 40 ml of ESP buffer (1% sarkosyl, 1 mg/ml proteinase K, in 0.5 M EDTA) and incubated at 55°C for 16 hours. The solution was decanted and replaced with fresh ESP buffer, and incubation continued at 55°C for another hour. The agarose plugs were then placed in 50 mM EDTA and stored at 4°C on board for the duration of the oceanographic cruise.
A slice of an agarose plug (72 μl) prepared from a sample collected off the Oregon coast was dialyzed overnight at 4°C against 1 ml of buffer A (100 mM NaCl, 10 mM Bis Tris-propane-HCl, 100 μg/ml acetylated BSA: pH 7.0 at 25°C) in a 2 ml microcentrifuge tube. The solution was replaced with 250 μl of fresh buffer A containing 10 mM MgCl2 and 1 mM DTT and incubated on a rocking platform for 1 hour at room temperature. The solution was then changed to 250 μl of the same buffer containing 4 U of Sau3A1 (NEB), equilibrated to 37°C in a water bath, and then incubated on a rocking platform in a 37°C incubator for 45 minutes. The plug was transferred to a 1.5 ml microcentrifuge tube and incubated at 68°C for 30 minutes to inactivate the enzyme and melt the agarose. The agarose was digested and the DNA was dephosphorylated using gelase and HK-phosphatase (Epicentre), respectively, according to the manufacturer's instructions. Protein was removed by gentle phenol/chloroform extraction, and the DNA was ethanol precipitated, pelleted, and then washed with 70% ethanol. This partially digested DNA was resuspended in sterile water to a concentration of 2.5 ng/μl for ligation with the pFOS1 vector.
PCR amplification results from several of the agarose plugs indicated the presence of significant amounts of archaeal DNA. Quantitative hybridization experiments using rRNA extracted from one sample, collected at a depth of 200 m off the Oregon coast, indicated that planktonic archaea in this assemblage comprised approximately 4.7% of the total picoplankton biomass (this sample corresponds to "PACI"-200 m in Table 1 of DeLong et al., Nature, 371:695-698, 1994). Results of archaea-targeted rDNA PCR amplification performed on agarose plug lysates confirmed the presence of relatively large amounts of archaeal DNA in this sample. Agarose plugs prepared from this picoplankton sample were selected for subsequent library preparation. Each 1 ml agarose plug from this site contained approximately 7.5 × 10⁵ cells; therefore approximately 5.4 × 10⁵ cells were present in the 72 μl slice used to prepare the partially digested DNA.
Vector arms were prepared from pFOS1 as described (Kim et al., Stable propagation of cosmid size inserts of human DNA in an F factor based vector, Nucl. Acids Res., 20:10832-10835, 1992). Briefly, the plasmid was completely digested with AatII, dephosphorylated with HK phosphatase, and then digested with BamHI to generate two arms, each containing a cos site in the correct orientation for cloning and packaging of ligated DNA between 35-45 kbp. The partially digested picoplankton DNA was ligated overnight to the pFOS1 arms in a 15 μl ligation reaction containing 25 ng each of vector and insert and 1 U of T4 DNA ligase (Boehringer-Mannheim). The ligated DNA in four microliters of this reaction was packaged in vitro using the Gigapack XL packaging system (Stratagene), the cosmid particles were transfected into E. coli strain DH10B (BRL), and the cells were plated on LBcm15 plates. The resulting cosmid clones were picked into 96-well microtiter dishes containing LBcm15 supplemented with 7% glycerol. Recombinant cosmids, each containing approximately 40 kb of picoplankton DNA insert, yielded a library of 3552 cosmid clones containing approximately 1.4 × 10⁸ base pairs of cloned DNA. All clones examined contained inserts of 38 to 42 kbp. This library was stored frozen at -80°C for later analysis.
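The library size quoted above follows directly from the clone count and the average insert length. The sketch below reproduces that arithmetic and the number of 96-well dishes needed to hold the clones; the function names are illustrative.

```python
import math


def cloned_base_pairs(n_clones, mean_insert_kb):
    """Total cloned DNA in base pairs for a library of equally sized inserts."""
    return n_clones * mean_insert_kb * 1000


def plates_needed(n_clones, wells_per_plate=96):
    """Number of microtiter dishes required at one clone per well."""
    return math.ceil(n_clones / wells_per_plate)


library_bp = cloned_base_pairs(3552, 40)
library_plates = plates_needed(3552)

print(f"{library_bp:.2e} bp in {library_plates} plates")
```

With 3552 clones of about 40 kb each, the total comes to roughly 1.4 × 10⁸ base pairs, in agreement with the figure stated for this library.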
Numerous modifications and variations of the present invention are possible in light of the above teachings; therefore, within the scope of the claims, the invention may be practiced otherwise than as specifically described. While the invention has been described in detail with reference to certain preferred embodiments thereof, it is to be understood that modifications and variations are within the spirit and scope of what is described and claimed.
All publications, patents, patent applications, GenBank sequences and ATCC deposits cited herein are expressly incorporated by reference for all purposes.
Many aspects of the invention have been described. However, it should be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other aspects are within the scope of the following claims.
FAQs
What is the function of epoxide hydrolases? ›
Epoxide hydrolases (EHs) are key enzymes involved in the detoxification of xenobiotics and biotransformation of endogenous epoxides. They catalyze the hydrolysis of highly reactive epoxides to less reactive diols. EHs thereby orchestrate crucial signaling pathways for cell homeostasis.
What is the chemical reaction of epoxide hydrolase? ›Microsomal epoxide hydrolase (MEH) catalyzes the addition of water to epoxides in a two-step reaction involving initial attack of an active site carboxylate on the oxirane to give an ester intermediate followed by hydrolysis of the ester.
What are the substrates of epoxide hydrolase? ›The human soluble epoxide hydrolase (sEH, also known as cytosolic EH, cEH) has 554 amino acids (62.3 kDa) and is the product of the EPHX2 gene. Its specific substrate is trans-stilbene oxide, and it appears unable to hydrate epoxides of bulky steroids or polycyclic aromatic hydrocarbons.
What are the different types of epoxide hydrolase? ›Humans express four epoxide hydrolase isozymes: mEH, sEH, EH3, and EH4.
What is the main function of hydrolases? ›Hydrolase is a class of enzyme that commonly perform as biochemical catalysts that use water to break a chemical bond, which typically results in dividing a larger molecule into smaller molecules.
What is the role and function of hydrolases in biological system? ›Hydrolases are pivotal for the body since they digest large molecules into fragments for synthesis, excrete waste materials, and provide carbon sources for the production of energy, during which many biopolymers are converted to monomers.
What types of reaction are catalyzed by hydrolases? ›besides hydrolysis, hydrolases also catalyze several related reactions as condensations (reversal of hydrolysis) and alcoholysis (a cleavage using an alcohol in place of water); 4.
What kind of reaction does a hydrolase enzyme catalyze? ›The reaction hydrolases do is hydrolysis. Hydrolysis is a chemical reaction in which water is used to break down a compound by inserting a water molecule across a bond.
How do epoxide reactions work? ›Under aqueous basic conditions the epoxide is opened by the attack of hydroxide nucleophile during an SN2 reaction. The epoxide oxygen forms an alkoxide which is subsequently protonated by water forming the 1,2-diol product.
What are 4 types of enzyme-substrate interactions used by enzymes? ›Binding of the substrate to the enzyme involves noncovalent bonds, such as hydrogen bonds, ionic attractions, hydrophobic bonds, and van der Waals interactions.
What are the 3 parts of an enzyme-substrate complex? ›
Substrate – The molecule or atom that an enzyme acts on. Activation Energy – The energy required for a reaction to start taking place. Catalyst – Any molecule or substance that lowers the activation energy of a particular reaction.
What drug is metabolized by epoxide hydrolase? ›Biotransformation (Metabolism) of Pesticides
Epoxide hydrolase is another phase I enzyme known to metabolize pesticides, a well-known example being the metabolism of the herbicide tridiphane by the epoxide hydrolase of mouse liver (Magdalou and Hammock, 1987).
Three forms of epoxide hydrolases have been identified in the liver, two membrane-bound forms and one in the cytosolic fraction. Of the two known membrane-bound epoxide hydrolases, one catalyzes the conversion of cholesterol 5,6-epoxide to the corresponding diol and displays no activity toward xenobiotic epoxides.
What is an example of epoxide reaction? ›For example, the acid- or base-catalyzed hydrolysis of propylene oxide gives propylene glycol. Epoxides can be used to assemble polymers known as epoxies, which are excellent adhesives and useful surface coatings. The most common epoxy resin is formed from the reaction of epichlorohydrin with bisphenol A.
What type of reaction forms an epoxide? ›The carbons in an epoxide group are very reactive electrophiles, due in large part to the fact that substantial ring strain is relieved when the ring opens upon nucleophilic attack. Both in the laboratory and in the cell, epoxides are usually formed by the oxidation of an alkene.
What are 3 examples of hydrolases? ›Some common examples of hydrolase enzymes are esterases including lipases, phosphatases, glycosidases, peptidases, and nucleosidases. Hydrolase enzymes are important for the body because they have degradative properties.
What are 2 examples of hydrolases? ›Examples of some common hydrolases include esterases, proteases, glycosidases, and lipases.
What are the substrates of hydrolases? ›Hydrolases bind an incredibly diverse set of substrates, which can be as small as diphosphate and acetamide or as large as starch, angiotensin I, neurotensin, polynucleotides, and polysaccharides.
Where do acid hydrolases function? ›The hydrolases are thus released into the lumen of the endosome, while the receptors remain in the membrane and are eventually recycled to the Golgi. Late endosomes then mature into lysosomes as they acquire a full complement of acid hydrolases, which digest the molecules originally taken up by endocytosis.
What is the mechanism of enzyme action of hydrolases? ›Hydrolases are a type of enzyme that acts as a biochemical catalyst by breaking a chemical bond with water, resulting in the division of a larger molecule into smaller molecules. Esterase enzymes, such as lipases, phosphatases, glycosidases, peptidases, and nucleosidases, are examples of hydrolase enzymes.
What are the activities of hydrolases? ›
However, hydrolases are not simply lytic enzymes that destroy the cell wall. Their activities are harnessed to support cell growth, division, and differentiation, enabling bacteria to propagate and adapt to changing environmental conditions.
What do hydrolases catalyze? ›Hydrolases catalyze the cleavage of bonds, usually esters and amides, with water. These are called hydrolysis reactions.
What are the 6 types of enzyme catalyzed reactions? ›Based on the type of catalyzed biochemical reaction, enzymes are classified into one of six classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, or ligases. Enzymes are biological catalysts, and nearly all of them are proteins.
What is a hydrolase reaction? ›A hydrolase catalyzes a hydrolysis reaction, in which water cleaves a bond and one molecule breaks apart into smaller molecules. Acidic hydrolysis of an ester gives a carboxylic acid and an alcohol. Basic hydrolysis (saponification) of an ester gives a carboxylate salt and an alcohol.
What are the reactions of enzyme catalyzed reactions? ›The reactions are: Oxidation and reduction. Enzymes that carry out these reactions are called oxidoreductases. For example, alcohol dehydrogenase converts primary alcohols to aldehydes.
What reactions do enzymes catalyze in the body? ›Enzymes help with the chemical reactions that keep a person alive and well. For example, they perform a necessary function for metabolism, the process of breaking down food and drink into energy. Enzymes speed up (catalyze) chemical reactions in cells.
What is the catalyst for epoxide? ›Cobalt Catalyst Determines Regioselectivity in Ring Opening of Epoxides with Aryl Halides.
What is acid catalysed reaction of epoxide? ›The acid-catalyzed epoxide ring-opening process is best described as a hybrid, or cross, of the SN2 and SN1 mechanisms. The oxygen is first protonated, resulting in a suitable leaving group (step 1).
Where are epoxides used? ›Epoxides are used as stabilizers in materials such as PVC. They are also used in the manufacture of epoxy resins that have low viscosity without compromising strength or other physical properties.
- Amylase (made in the mouth and pancreas; breaks down complex carbohydrates)
- Lipase (made in the pancreas; breaks down fats)
- Protease (made in the pancreas; breaks down proteins)
What are the 3 digestive enzymes and their substrates? ›
Some of the most common digestive enzymes are: Carbohydrase breaks down carbohydrates into sugars. Lipase breaks down fats into fatty acids. Protease breaks down protein into amino acids.
What are the 7 types of enzymes? ›Enzymes can be classified into 7 categories according to the type of reaction they catalyse. These categories are oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, and translocases. Out of these, oxidoreductases, transferases and hydrolases are the most abundant forms of enzymes.
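The seven classes listed above correspond to the first digit of an Enzyme Commission (EC) number, so a class can be looked up directly from the number. This is a small illustrative sketch; the function name `enzyme_class` is hypothetical, and the example EC numbers (3.3.2.10 for soluble epoxide hydrolase, 1.1.1.1 for alcohol dehydrogenase) are supplied here for illustration rather than taken from the text.

```python
# Top-level EC classes keyed by the first digit of the EC number.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}

def enzyme_class(ec_number):
    """Return the top-level class for an EC number such as 'EC 3.3.2.10'."""
    # Drop an optional 'EC' prefix, then read the first dotted field.
    first_field = ec_number.replace("EC", "").strip().split(".")[0]
    return EC_CLASSES[int(first_field)]

print(enzyme_class("EC 3.3.2.10"))  # soluble epoxide hydrolase
print(enzyme_class("1.1.1.1"))      # alcohol dehydrogenase
```

Epoxide hydrolases fall in class 3 because they cleave the epoxide C-O bond with water.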
What are the three key enzymes? ›The three key regulatory enzymes of glycolysis are hexokinase, phosphofructokinase and pyruvate kinase.
What are the 3 enzymes involved in protein synthesis? ›mRNA, tRNA, and rRNA are the three major types of RNA involved in protein synthesis. The mRNA (or messenger RNA) carries the code for making a protein.
What are examples of epoxide drugs? ›Carfilzomib, oprozomib, ixabepilone, and maytansine are some examples of epoxide-based drugs that are in clinical use.
What enzymes are breaking down drugs? ›Drug-metabolizing enzymes are called mixed-function oxidases or monooxygenases. The system contains many enzymes, including cytochrome P450, cytochrome b5, and NADPH-cytochrome P450 reductase, along with other components.
What are examples of drugs metabolized by cyp450? ›- Terfenadine. Terfenadine was the first non-sedating H1-antihistamine drug. ...
- Cimetidine. Cimetidine blocks histamine H2 receptors and is used in the treatment of gastric ulcers. ...
- Grapefruit juice. ...
- Omeprazole. ...
- Erythromycin. ...
- Cyclosporin. ...
- Rifampicin.
Thus, the simplest epoxide, C2H4O, has the IUPAC name 1,2-epoxyethane. Other examples include 3,4-epoxyheptane, with an epoxy group between carbons 3 and 4, and 1,2-epoxy-3,5-cyclohexadiene. Another common epoxide naming convention involves simply naming the parent alkene followed by the term oxide.
How is epoxide formed in the body? ›In the body vinyl chloride can undergo metabolism in the liver to form reactive and potentially mutagenic vinyl chloride epoxide. Ideally the epoxide will then react with a water molecule to form a diol (a molecule with 2 alcohol functional groups) and be excreted in the urine.
What are the uses of epoxides in daily life? ›Epoxides, whose key chemical feature is a three-membered ring consisting of an oxygen atom bound to two carbon atoms, are used to manufacture products as varied as antifreeze, detergents, and polyester.
What is the use of epoxide in medicine? ›
In addition, epoxides are valuable building blocks in medicinal chemistry. They react with nucleophiles in a ring-opening process to form new C-C, C-O and C-N bonds. Herein we have designed and synthesized a library of small heteroaliphatic epoxides for drug design.
Why are epoxides important? ›Epoxides are an important class of compounds in organic synthesis, because nucleophilic ring opening takes place easily in an SN2 pathway with inversion of configuration at the reacting carbon center. The driving force of the high reactivity is the inherent strain of the three-membered heterocycle.
What are the two methods for formation of an epoxide? ›- Peroxyacid reactions with Alkenes.
- Intramolecular Williamson Ether Synthesis via Halohydrins.
Epoxides (also known as oxiranes) are three-membered ring structures in which one of the vertices is an oxygen and the other two are carbons.
What is the purpose of epoxide? ›Ethylene oxide, the simplest epoxide, is used as a fumigant and to make ethylene glycol (antifreeze) and various other useful compounds. More complex epoxides are generally made by the epoxidation of alkenes, commonly using a peroxy acid to transfer an oxygen atom to the double bond.
The hydrolases involved in these processes catalyze the cleavage of bonds throughout the sugar and peptide moieties of peptidoglycan. Phenotypes associated with these diverse hydrolases reveal new functions of the bacterial cell wall beyond growth and division.
What is the role of epoxy fatty acids and epoxide hydrolases in the pathology of neuroinflammation? ›Epoxy-fatty acids (EpFAs) reduce neuroinflammation and neurodegeneration. Inhibition of soluble epoxide hydrolase (sEH) increases EpFA levels. High EpFA levels could reduce the onset of Parkinson's disease, Alzheimer's disease and dementia.
What are the uses of epoxides in chemistry? ›Epoxides can be used to assemble polymers known as epoxies, which are excellent adhesives and useful surface coatings. The most common epoxy resin is formed from the reaction of epichlorohydrin with bisphenol A.
What is the biological activity of epoxides? ›
Generally, epoxides possess many biological activities, such as antianalgesic activity (Inceoglu et al., 2008), anti-inflammation (Morisseau et al., 2012), cytotoxicity (Ye et al., 2002), and tumorigenicity (Pal et al., 2013).
Where is epoxide hydrolase located? ›sEH is a member of the epoxide hydrolase family. This enzyme, found in both the cytosol and peroxisomes, binds to specific epoxides and converts them to the corresponding diols.