Molecular analysis of multiple cytochrome P450 genes from the malaria vector, Anopheles gambiae

Authors



    Present address: Parasite and Vector Biology Division, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK.

Dr H. Ranson, Parasite and Vector Biology Division, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK. Tel.: (44) 151 7053310; fax: (44) 151 7087319; e-mail: Hranson@liv.ac.uk

Abstract

Cytochrome P450s are a superfamily of haemoproteins, important in the metabolism of endogenous compounds and xenobiotics. As a first step to elucidating the role of this family in insecticide resistance in the malaria mosquito, Anopheles gambiae, we have cloned and mapped multiple P450 genes. Sixteen cDNAs encoding full-length P450s were cloned and physically mapped to the mosquito's polytene chromosomes. Fourteen of these encode putative CYP6 proteins and two encode P450s belonging to the CYP9 class. Eighteen new A. gambiae Cyp4 P450 genes were identified using degenerate PCR primers; cDNAs were detected for eleven members of this gene family and in situ locations were determined for thirteen.

Introduction

The cytochrome P450s are a superfamily of haemoproteins that are responsible for the oxidative metabolism of a wide variety of xenobiotic and endogenous compounds. Members of the cytochrome P450 superfamily are found in organisms ranging from bacteria to mammals. To classify such a diverse family, a nomenclature system based on amino acid sequence identity has been established (Nelson et al., 1993). Each P450 is assigned to a family (designated by a number) and a subfamily (designated by a letter) according to its sequence identity and phylogenetic relationships. Generally, P450s belonging to the same family share > 40% identity at the amino acid level and members of the same subfamily share > 55% identity.
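As an illustration of how the identity thresholds above might be applied, the following Python sketch computes percent identity between two pre-aligned protein sequences and maps the value onto the family and subfamily cut-offs. It is a minimal sketch under stated assumptions: identity is counted over ungapped alignment columns only (one of several conventions in use), the function names are hypothetical, and the cut-offs are only a guide, since formal names are assigned by the P450 nomenclature committee, which also weighs phylogenetic relationships.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


def nomenclature_guide(identity: float) -> str:
    """Rule-of-thumb grouping using the >55% (subfamily) and >40% (family) cut-offs."""
    if identity > 55.0:
        return "same subfamily"
    if identity > 40.0:
        return "same family"
    return "different families"


# Toy alignment (invented): 6 ungapped columns, 5 identical.
pid = percent_identity("MKV-LLA", "MKVALLG")
print(round(pid, 1), nomenclature_guide(pid))  # 83.3 same subfamily
```

On this guide, the 85.8% identity reported below between CYP4H19 and CYP4H24 falls in the same-subfamily range, whereas the 23% between CYP4H14 and CYP4C28 (both CYP4 members, compared over partial fragments) falls below the family cut-off; the thresholds describe general tendencies rather than hard rules.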

Sequencing of the Drosophila melanogaster genome has facilitated an analysis of the phylogenetic relationships among insect P450s. There are eighty-three genes in D. melanogaster that putatively encode functional P450s. These have been classified into twenty-five different families, but more than 50% belong to either the CYP4 or the CYP6 family (Tijet et al., 2001). The exact number of genes encoding P450s in other insect species is not known, but multiple genes from families 4 and 6 have been isolated from several species. For example, the mosquito Anopheles albimanus contains at least seventeen genes predicted to encode CYP4 proteins (Scott et al., 1994), and nine partial Cyp6 genes have been isolated from the Mediterranean fruit fly, Ceratitis capitata (Danielson et al., 1999).

Interest in insect P450s is focused on their role in the oxidative metabolism of insecticides. P450-mediated metabolism has been reported for virtually all insecticide classes, but it is thought that only a subset of this large enzyme family is capable of binding and metabolizing insecticides. There have been several reports that P450s from families CYP4, CYP6 and CYP12 are present at higher levels in insecticide-resistant strains of insects (e.g. Tomita & Scott, 1995; Maitra et al., 1996; Pittendrigh et al., 1997; Guzov et al., 1998), but in very few cases has a definitive link between increased expression of a particular P450 gene and increased insecticide metabolism been established (Scott, 1999). Even in systems in which the role of a particular P450 in conferring insecticide resistance has been clearly demonstrated, the molecular mechanism by which this resistance occurs is generally unknown. Mutations in both trans and cis regulatory factors have been implicated in resistance in the housefly and in Drosophila, but in other cases resistance is reportedly caused by qualitative changes in the P450 protein that result in an increased affinity for the insecticide (Scott, 1999).

Anopheles gambiae is the major malaria vector in Sub-Saharan Africa. Control of this species relies primarily on the use of pyrethroid-impregnated bednets, and this, together with the extensive contamination of the mosquito's breeding sites with pyrethroid insecticides used in agriculture, has led to the emergence of resistance to this insecticide class. Target site insensitivity is partly responsible for the increase in permethrin tolerance in A. gambiae from both West and East Africa (Martinez-Torres et al., 1998; Ranson et al., 2000), but increases in permethrin metabolism, associated with increased P450 levels, have also been reported (Vulule et al., 1999).

The purpose of this study is to investigate the diversity of the P450 gene superfamily in A. gambiae as a prelude to determining the role of these enzymes in conferring pyrethroid resistance in this important disease vector. We focused primarily on the two largest insect P450 families, CYP4 and CYP6, and report the identification and physical mapping of multiple genes belonging to these families.

Results

Identification of A. gambiae Cyp6 genes

Partial sequencing of five BAC clones containing gDNA from chromosome 3R division 30A revealed a cluster of nine P450 genes belonging to the Cyp6 family (Fig. 1). The sequences of these Cyp6 open reading frames and of the intergenic regions between them assembled into three contigs. The first contig contained three genes: Cyp6N1 and Cyp6M1, separated by an intergenic spacer (IGS) of 915 bp, and Cyp6Y1, 1855 bp downstream of Cyp6M1. The second contig contained four Cyp6 genes: Cyp6N2, Cyp6R1, Cyp6S1 and Cyp6S2. The IGS between these genes ranged from just 71 bp (between Cyp6R1 and Cyp6S1) to 456 bp (between Cyp6S1 and Cyp6S2). The final contig contained the 3′ end of Cyp6Z1 and a second, full-length Cyp6 gene, Cyp6Z2, 1081 bp downstream of the stop codon of Cyp6Z1. Putative polyadenylation signals (consensus sequence AATAAA) were identified for eight of these nine genes. A single signal was located 105 bp downstream of the stop codon of Cyp6Z2, 28 bp downstream of that of Cyp6N2, 61 bp downstream of Cyp6S1, 84 bp downstream of Cyp6S2 and 74 bp downstream of Cyp6M1. Two possible signals were identified downstream of each of Cyp6Z1 (91 bp and 364 bp from the stop codon), Cyp6N1 (100 bp and 128 bp) and Cyp6Y1 (39 bp and 48 bp).
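The polyadenylation signal positions above are distances from a stop codon to the AATAAA consensus. A scan of this kind can be sketched in Python as follows. The sketch assumes that the input string begins at the first base after the stop codon (so a hit at index i is reported as i bp downstream) and searches only the first 400 bp; both the counting convention and the window size are assumptions, the window being chosen merely to cover the most distal site reported here (364 bp downstream of Cyp6Z1). The function name and toy sequence are hypothetical.

```python
from typing import List


def polya_signal_offsets(downstream: str, signal: str = "AATAAA",
                         window: int = 400) -> List[int]:
    """Distances (bp) from the stop codon to each copy of `signal`.

    `downstream` must start at the first base after the stop codon, so a hit
    at 0-based index i lies i bp downstream of the stop codon (assumed
    convention). Overlapping hits are reported because the search advances
    one base at a time.
    """
    seq = downstream.upper()[:window]
    offsets = []
    start = seq.find(signal)
    while start != -1:
        offsets.append(start)
        start = seq.find(signal, start + 1)
    return offsets


# Invented 3' flank with one signal 28 bp after the stop codon.
print(polya_signal_offsets("C" * 28 + "AATAAA" + "G" * 10))  # [28]
```

Because the search steps one base at a time, overlapping copies (as in AATAAATAAA) are both reported; a fuller annotation would also consider variant signals such as ATTAAA, which this sketch does not.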

Figure 1.

Schematic diagram showing the cluster of Cyp6 genes on chromosome 3R division 30A. The upper panel depicts the polytene chromosomes and shows the physical map position of eleven BAC clones containing full or partial P450 sequences. The second panel shows the orientation of five of these BAC clones and the position of the Cyp6 genes within these clones. The bottom panel shows the orientation of the individual genes, the size of the intergenic regions and the position of the introns (genes are shown as hatched boxes, introns in white). Also included in this figure are the approximate locations of the primers (shown as A, B, C, D, E and F) used in an attempt to amplify the intergenic region between the contigs (see text for further details). Note that the orientations of the contigs are not confirmed.

The order of contig 1 with respect to contig 2 was determined by PCR using primer pairs specific to individual Cyp6 genes and BAC clones 10I20, 04I08 and 24H06 as template DNA. Cyp6N1 and Cyp6M1 are absent from BAC clones 04I08 and 24H06 but present in BAC clone 10I20, implying that these genes are located upstream of contig 2, as depicted in Fig. 1. The orientation of contig 1 is not known, and several attempts were made to amplify the intervening sequence between the two contigs to resolve this issue. Primers A, B, C and D, extending outward from contigs 1 and 2 (see Fig. 1), were used in all possible combinations in PCR reactions with BAC clone 10I20 as a template, but no products were amplified. This suggests that the sequence between the contigs is too long to amplify by PCR, and it is possible that additional gene(s) exist within this intergenic region. Nevertheless, sequencing a further 1500 bp in total of BAC DNA outward from both contigs (i.e. upstream of Cyp6N2, Cyp6N1 and Cyp6Y1) did not reveal any additional open reading frames. A similar approach was taken to determine the position and orientation of contig 3. Cyp6Z2 was successfully amplified from BAC clone 18L09 but is not contained within BACs 10I20, 04I08 or 24H06, suggesting that contig 3 lies upstream of contig 1. Attempts to amplify the intervening sequence between contig 1 and contig 3 using primers A, B, E and F (see Fig. 1) and genomic DNA as a template were unsuccessful; it was therefore not possible to confirm the order of the contigs or to determine the orientation of contig 3.

In addition to the nine Cyp6 genes clustered on chromosome 3R division 30A, a further four partial gene sequences with similarity to Cyp6 genes were identified in the BAC end sequence database (Table 1), three of which were physically mapped by in situ hybridization to divisions 13 or 14 of chromosome 2R. Putative full-length coding sequences for two of these Cyp6 genes, Cyp6P1 and Cyp6P4, were obtained by further sequencing of BAC clone 07G05. These two genes are separated by an IGS of 579 bp. An additional Cyp6 gene, Cyp6P2, was identified 702 bp upstream of Cyp6P1 by further sequencing of BAC clone 18M16 (Fig. 2). The third BAC clone that mapped to this region of the chromosome, 29O06, contained the 3′ end of Cyp6AA1. The putative 5′ end of this gene was obtained by searching the A. gambiae Trace database (www.ncbi.nlm.nih.gov/blast/mmtrace.html), which consists of the unassembled shotgun sequences generated during the sequencing of the A. gambiae genome. PCR reactions pairing primers extending outward from Cyp6AA1 with primers extending outward from Cyp6P2 and from Cyp6P4 were performed in an attempt to orientate Cyp6AA1 relative to the cluster of Cyp6P genes, but no products were obtained, suggesting that the intergenic distance is too long to amplify by PCR. The Trace database was also used to complete the sequences of Cyp6P3, partially contained within BAC clone 17D17 (in situ location unknown), and Cyp6Z1, contained within 18L09.

Table 1. Partial P450 genes identified from the BAC end sequence database

| BAC clone | Predicted class of P450 gene | Name of P450 protein encoded | In situ map position |
| --- | --- | --- | --- |
| 17F10 | CYP4 | | |
| 30M05 | CYP4 | | |
| 19F10 | CYP4 | CYP4D17 | 2R 12A |
| 23J14 | CYP4 | | |
| 04I08 | CYP6 | CYP6N2 | 3R 30A |
| 12D02 | CYP6 | CYP6N1 | 3R 30A |
| 07G05 | CYP6 | CYP6P1 | 2R 13C |
| 24H06 | CYP6 | CYP6S2 | 3R 30A |
| 29O06 | CYP6 | CYP6AA1 | 2R 14C |
| 17D17 | CYP6 | CYP6P3 | |
| 18M16 | CYP6 | CYP6P4 | 2R 13C |
| 18L09 | CYP6 | CYP6Z1 | 3R 30A |
| 18B07 | CYP9 | CYP9L1 | 3L 46C |
| 32K08 | CYP12 | | 3R 29D |
| 18N16 | CYP12 | | 3R 29D |

The A. gambiae BAC end sequence database (www.genoscope.cns.fr/externe/English/Projets/Projet_AK/AK.html) was searched for sequences with similarity to P450 genes. The name of the clone is given in column one. Column two shows the predicted family of the putative P450 protein encoded by this gene sequence and column three gives the name of the protein encoded, where known. Eleven of the fifteen BAC clones have been physically mapped to the polytene chromosomes and the map position of these clones is shown in the final column.

Figure 2.

Schematic diagram showing the cluster of Cyp6 genes on chromosome 2R divisions 13 and 14. The upper panel depicts the polytene chromosomes and shows the physical map position of three BAC clones containing full or partial P450 sequences. The second panel shows the orientation of these BAC clones and the position of the Cyp6 genes within these clones. The bottom panel shows the orientation of the individual genes, the size of the intergenic regions and the position of the introns (genes are shown as hatched boxes, introns in white). Also included in this figure are the approximate locations of the primers (shown as G, H, I and J) used in an attempt to amplify the intergenic region between the two contigs (see text for further details).

Primers designed to encompass the predicted start and stop codons of the genes were used to amplify the full coding sequence of each of the fourteen Cyp6 genes from adult A. gambiae cDNA. These cDNAs were sequenced to confirm the intron/exon boundaries. A single phase one intron is present in each of the Cyp6 genes. In twelve of the genes, this intron is found at an identical position, immediately before the conserved ETLR domain, and ranges in size from 58 bp to 103 bp. Cyp6Z1 and Cyp6Z2 also contain a single phase one intron, of 58 bp and 72 bp respectively, but it lies approximately 86 residues downstream of the consensus intron position in the other A. gambiae Cyp6 genes. An alignment of the deduced amino acid sequences of these cDNAs, showing the positions of the two introns, is shown in Fig. 3.
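Intron phase describes where an intron interrupts the reading frame: phase 0 introns fall between codons, phase 1 introns fall after the first base of a codon, and phase 2 introns after the second. The Python sketch below illustrates this definition together with the splicing step used to derive a coding sequence from genomic sequence. It is a sketch only: the helper names are hypothetical, the toy gene is invented, and the GT...AG check assumes canonical splice sites.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """0 = between codons, 1 = after codon base 1, 2 = after codon base 2."""
    if coding_bases_before_intron < 0:
        raise ValueError("position must be non-negative")
    return coding_bases_before_intron % 3


def remove_intron(gene: str, intron_start: int, intron_end: int) -> str:
    """Splice out one intron given 0-based, end-exclusive coordinates in `gene`."""
    intron = gene[intron_start:intron_end]
    # Assumes canonical GT...AG splice sites; other introns would need manual review.
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GT-AG rule")
    return gene[:intron_start] + gene[intron_end:]


# Invented toy gene: exon 1 = ATGG, intron = GTAAGTTTCAG, exon 2 = CCTAA.
gene = "ATGG" + "GTAAGTTTCAG" + "CCTAA"
cds = remove_intron(gene, 4, 15)
print(cds, intron_phase(4))  # ATGGCCTAA 1
```

Here four coding bases precede the intron, so it is a phase one intron, the same phase as the Cyp6 introns described above.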

Figure 3.

Alignment of the deduced amino acid sequences of fourteen Anopheles gambiae CYP6 proteins. The sequences were aligned using ClustalW. The ETLR, PERF and haem-binding domains are underlined. The position of the intron found in twelve of the fourteen genes is marked with a downward arrow and the upward arrow designates the position of the intron found in Cyp6Z1 and Cyp6Z2. The accession numbers for these sequences are AY028782-6, AY062207-8, AF487534-7, AY081778 and AF487780.

The deduced amino acid sequences of the fourteen A. gambiae CYP6 proteins range in length from 492 amino acids (CYP6Z2) to 509 amino acids (CYP6P3). The haem-binding domain (consensus PFXXGXXXCXG) and two other highly conserved domains, ETLR and PERF (Tijet et al., 2001), are present in all fourteen A. gambiae CYP6 proteins (underlined in Fig. 3). All fourteen deduced proteins also have a hydrophobic N-terminus, characteristic of membrane-bound P450 proteins, that functions as a membrane-anchor signal (Sakaguchi et al., 1987); between 11 and 16 of the first 20 residues are hydrophobic.
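The two sequence features checked above can be illustrated with a short Python sketch: a regular-expression search for the PFXXGXXXCXG haem-binding consensus, and a count of hydrophobic residues among the first 20 residues. The set of residues treated as hydrophobic (A, V, I, L, M, F, W, C) is an assumption, since the text does not say which scale was used, and the function names and toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

# PFXXGXXXCXG, with X standing for any residue.
HAEM_MOTIF = re.compile(r"PF..G...C.G")

# Assumption: the residue set counted as hydrophobic is not given in the text.
HYDROPHOBIC = frozenset("AVILMFWC")


def haem_motif_starts(protein: str) -> List[int]:
    """1-based start positions of non-overlapping PFXXGXXXCXG matches."""
    return [m.start() + 1 for m in HAEM_MOTIF.finditer(protein.upper())]


def n_terminal_hydrophobic(protein: str, length: int = 20) -> Tuple[int, int]:
    """(hydrophobic count, residues examined) for the first `length` residues."""
    window = protein[:length].upper()
    return sum(aa in HYDROPHOBIC for aa in window), len(window)


# Invented toy sequence, not a real CYP6 protein.
toy = "MLLLAVVLLFSWKRPQEGST" + "PFGVGPRNCIG" + "MRFA"
print(haem_motif_starts(toy), n_terminal_hydrophobic(toy))  # [21] (11, 20)
```

A count from such a function depends on the hydrophobicity set chosen, so values like those quoted above are comparable only when computed with the same residue set.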

The D. melanogaster genome is predicted to encode twenty-two functional CYP6 proteins (Tijet et al., 2001). These have been classified into seven subfamilies, the largest of which, CYP6A, contains twelve members. Eight different subfamilies are represented by the fourteen A. gambiae CYP6 proteins described in this report. None of these is shared with Drosophila, and seven of them are so far unique to A. gambiae. The exception is the CYP6N subfamily, members of which have been identified in Aedes albopictus (Nelson, 2002).

Identification of A. gambiae Cyp4 genes

To maximize the likelihood of obtaining PCR products from each individual Cyp4 gene, we used larval and adult cDNA and the ND-1 BAC library containing A. gambiae genomic DNA as templates in PCR reactions using degenerate Cyp4 primers. A total of eighteen different P450 fragments were identified (Table 2). These sequences were submitted to the official P450 nomenclature committee (http://drnelson.utmem.edu/CytochromeP450.html) to confirm their identity as partial Cyp4 genes and for assignment to the relevant subfamily. Both cDNA and genomic clones were detected for nine of these Cyp4 genes. No corresponding genomic clone was detected for two of the Cyp4 cDNAs (Cyp4G16 and Cyp4H24) and seven sequences were amplified only from genomic DNA. Several of the genes were amplified multiple times and several allelic variants of the reported sequences were observed (data not shown). Of the eighteen different partial genes sequenced, the closest identity was between CYP4H19 and CYP4H24 (85.8% at the protein level) and the lowest between CYP4H14 and CYP4C28 (23% identity).

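Table 2. PCR amplification of partial Cyp4 genes from Anopheles gambiae

| P450 | Accession no. | Intron (length) | cDNA detected? | In situ map position |
| --- | --- | --- | --- | --- |
| CYP4C25 | AY062203 | No | No | n.k. |
| CYP4C26 | AY062205 | No | No | n.k. |
| CYP4C27 | AY062197 | Yes (97 bp) | No | X 4A |
| CYP4C28 | AY062204 | No | No | n.k. |
| CYP4D15 | AY062193 | Yes (69 bp) | Yes | 2R 12C |
| CYP4D16 | AY062194 | Yes (63 bp) | Yes | 2R 12C |
| CYP4D17 | AY062196 | Yes (64 bp) | Yes | 2R 12A |
| CYP4G16 | AY061289 | n.k. | Yes | n.k. |
| CYP4G17 | AY062200 | No | No | X 5A |
| CYP4H14 | AY062202 | Yes (76 bp) | No | 2R 12C |
| CYP4H15 | AY062190 | No | Yes | 2R 9B |
| CYP4H16 | AY062191 | Yes (65 bp) | Yes | 3R 30C |
| CYP4H17 | AY062192 | Yes (65 bp) | Yes | 3R 30C |
| CYP4H18 | AY062195 | Yes (177 bp) | No | 3R 30C |
| CYP4H19 | AY062199 | No | Yes | X 4C |
| CYP4H20 | AY062201 | Yes (78 bp) | Yes | 2R 12C |
| CYP4H24 | AY062208 | n.k. | Yes | n.k. |
| CYP4J5 | AY062198 | No | Yes | 2L 23D |

n.k. = not known.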

The degenerate primers used to amplify the partial fragments of the Cyp4 genes (Scott et al., 1994) bind to two highly conserved regions of the CYP4 family, the I helix (approximately amino acid position 300) and the haem binding domain (amino acid ∼450). Thus the region amplified equates to approximately 150 amino acids, which is less than one-third of the total P450 protein. Without the full-length sequence we cannot rule out the possibility that some of these putative Cyp4 genes may be pseudogenes. Cyp4 pseudogenes have not been found in the Drosophila genome, although a total of seven pseudogenes from other Cyp families are reported in this species (Tijet et al., 2001). In addition, in the German cockroach, Blattella germanica, several pseudogenes have been detected including two from the Cyp4 family (Wen et al., 2001).
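Two pieces of arithmetic underlie this paragraph and the primer sequences given under Experimental procedures: the number of distinct oligonucleotides in each degenerate primer pool, and the expected size of a product spanning residues ~300 to ~450. The Python sketch below computes both. Counting inosine (I) as non-degenerate is an assumption of the sketch (inosine is a single nucleoside that pairs with several bases, rather than a mixture), and the function names are hypothetical.

```python
# Number of distinct bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
    # Inosine pairs with several bases but is one nucleoside, so it adds no
    # synthesis degeneracy; counting it as 1 is an assumption of this sketch.
    "I": 1,
}

# Degenerate Cyp4 primers as given under Experimental procedures (5' to 3').
CYP4_F = "GAGGTIGAYACITTCATGTTCGARGGICACGAYAC"
CYP4_R = "CTGICCGATRCAGTTICGBGGICCIGCSIWGAABGG"


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer pool."""
    fold = 1
    for base in primer.upper():
        fold *= IUPAC_FOLD[base]
    return fold


def cdna_amplicon_bp(first_residue: int, last_residue: int) -> int:
    """Approximate intron-free product length spanning two protein positions."""
    return 3 * (last_residue - first_residue)


print(degeneracy(CYP4_F), degeneracy(CYP4_R))  # 8 72
print(cdna_amplicon_bp(300, 450))  # 450
```

The 150-residue span corresponds to about 450 bp, in line with the expected product size of approximately 450 bp; products amplified from genomic DNA will be longer by the length of any intron (63 to 177 bp in Table 2).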

By systematically screening each plate of the BAC library with the Cyp4 primers, we aimed to identify the full complement of this gene family in A. gambiae. However, two cDNAs, one of which (Cyp4G16) was isolated multiple times from larval and adult cDNA, were not amplified from the BAC library. Therefore the possibility that other members of this class were missed by our screening protocol cannot be ruled out. Of the sixteen partial genomic sequences obtained from the BAC library, nine are predicted to contain a single phase one intron at the same location, varying in length from 63 to 177 bp. An intron at this position has been observed in Cyp4 genes from many other species (Scott et al., 1994). Cyp4C27 is unique amongst the A. gambiae Cyp4 genes in containing a phase two intron 27 amino acid residues downstream of the intron found in the other Cyp4 sequences.

Thirteen of the eighteen Cyp4 genes have been physically mapped (Table 2 and Fig. 4). They are present on all three chromosomes, but two clusters containing at least three Cyp4 genes are found on chromosome 2R division 12C and chromosome 3R division 30C.
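The clusters described above can be recovered mechanically from Table 2 by grouping genes that share a polytene subdivision. The Python sketch below does this for the thirteen mapped Cyp4 genes, using positions transcribed from Table 2. Treating a shared lettered subdivision as a cluster is a simplification assumed here: in situ hybridization resolves position only at cytological scale, so it says nothing about the physical distance between genes. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# In situ positions of the thirteen mapped Cyp4 genes, transcribed from Table 2.
CYP4_MAP = {
    "CYP4C27": "X 4A", "CYP4D15": "2R 12C", "CYP4D16": "2R 12C",
    "CYP4D17": "2R 12A", "CYP4G17": "X 5A", "CYP4H14": "2R 12C",
    "CYP4H15": "2R 9B", "CYP4H16": "3R 30C", "CYP4H17": "3R 30C",
    "CYP4H18": "3R 30C", "CYP4H19": "X 4C", "CYP4H20": "2R 12C",
    "CYP4J5": "2L 23D",
}


def clusters_by_division(positions: Dict[str, str],
                         min_size: int = 3) -> Dict[str, List[str]]:
    """Group genes sharing a polytene subdivision; keep groups of >= min_size."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene, division in positions.items():
        groups[division].append(gene)
    return {div: sorted(genes) for div, genes in groups.items()
            if len(genes) >= min_size}


def chromosomes(positions: Dict[str, str]) -> List[str]:
    """Chromosomes represented, dropping the arm letter (2R -> 2, X -> X)."""
    return sorted({division.split()[0].rstrip("LR") for division in positions.values()})


print(clusters_by_division(CYP4_MAP))
print(chromosomes(CYP4_MAP))  # ['2', '3', 'X']
```

With a minimum cluster size of three, this returns the two groups noted in the text (four genes at 2R 12C and three at 3R 30C), and the chromosome list confirms representation on the X, second and third chromosomes.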

Figure 4.

Schematic map of the Anopheles gambiae polytene chromosomes showing the location of known A. gambiae P450 genes. Those genes for which a cDNA has not been detected are shown in brackets. For details of polytene divisions see Tables 1 and 2.

The CYP4 family is extensive in other insect species. In Drosophila, twenty-one Cyp4 genes have been identified (Tijet et al., 2001) and seventeen partial Cyp4 fragments have been isolated from the mosquito Anopheles albimanus (Scott et al., 1994). The Anopheles Cyp4 genes belong to five different subfamilies, D, G, H, J and K. Two of these subfamilies, G and D, are also found in Drosophila.

Identification and physical mapping of other partial P450 sequences

The full-length sequences of two Cyp9 genes, Cyp9K1 and Cyp9L1, were obtained by partial sequencing of BAC clones 11C21 and 01N01. Amplification of the complete coding region of each gene from cDNA synthesized from mRNA extracted from adult RSP-ST mosquitoes confirmed that both genes are actively transcribed and enabled the exon/intron boundaries to be determined. Cyp9K1 contains a single intron of 67 bp and encodes a putative protein of 531 amino acids. Cyp9L1 contains a single intron of 357 bp at a distinct position and encodes a protein of 534 amino acids. An amino acid alignment of the two A. gambiae CYP9 proteins showing the conserved domains and the locations of the introns is shown in Fig. 5. The haem-binding domain, the ETLR and PERF domains and a hydrophobic N-terminus are all found in both A. gambiae CYP9 proteins. The SRFALME motif, found immediately after the haem-binding domain in both CYP9K1 and CYP9L1 (shown in bold in Fig. 5), is characteristic of CYP9 P450s (Fogleman et al., 1998; Wen et al., 2001).

Figure 5.

Alignment of the deduced amino acid sequences of two Anopheles gambiae CYP9 proteins. The sequences were aligned using ClustalW. The ETLR, PERF and haem-binding domains are underlined and the SRFALME motif, characteristic of CYP9 proteins, is shown in bold. The position of the intron in Cyp9L1 is designated by a downward arrow and the intron in Cyp9K1 by an upward arrow. The accession numbers for these sequences are AF487781 and AF487533.

The BAC clones containing Cyp9K1 and Cyp9L1 have been physically mapped to chromosome X, division 5A, and chromosome 3L, division 46C, respectively. Two BAC end sequences with similarity to a Cyp12 gene were also identified. Both map to chromosome 3R, division 29D, and sequence alignment suggests that they encode the 3′ end of the same gene. An overlapping BAC clone containing the 5′ end of this gene has been identified, but cDNA clones of this putative Cyp12 gene have not yet been detected. The positions of these Cyp9 and Cyp12 genes are shown schematically in Fig. 4.

Discussion

A total of sixteen full-length P450 genes and eighteen putative partial P450 genes have been isolated from the mosquito A. gambiae. Fourteen of these genes belong to the Cyp6 family, eighteen are members of the Cyp4 family and two are classified as Cyp9 P450s. At the time of writing, the sequence of the entire A. gambiae genome was being assembled. This valuable resource is expected to be made publicly available in mid-2002 and will enable the full extent of the P450 family in this species to be determined. However, our prediction, based on estimates in other insects, is that we have identified approximately one-third of the total complement of P450s in this mosquito species. It is possible that some of the partial sequences were amplified from pseudogenes that do not encode functional P450 enzymes. Such pseudogenes may be transcriptionally silent or may contain deletions or substitutions that result in nonfunctional proteins. Clearly, full-length cDNAs for each of the partial P450 genes are required before their designation as functional genes can be verified. The absence of frame shifts, deletions or premature termination codons in the sixteen fully sequenced A. gambiae P450 genes described in this report suggests that these genes all encode functional P450 enzymes.

Of the thirty-four P450 sequences amplified from A. gambiae, twenty-eight have been physically mapped to the mosquito polytene chromosomes. They are distributed on all three chromosomes, but three clusters containing three or more genes can be identified. The largest gene cluster identified thus far is on chromosome 3R, where at least nine Cyp6 genes are contained within division 30A. In addition, three Cyp4 genes and a Cyp12 gene are located on polytene divisions flanking this Cyp6 cluster, and it is conceivable that this region comprises a large sequential array of multiple P450 genes. Clusters of P450 genes from different classes are not unusual and have been found in Arabidopsis thaliana and D. melanogaster (Paquette et al., 2000; Tijet et al., 2001). A second cluster, of three Cyp6 genes, is present on chromosome 2R, division 13C, and four Cyp4 genes are found on division 12C of this chromosome (Table 2). There is some correlation between sequence similarity and physical proximity among the A. gambiae Cyp genes. For example, the three Cyp4 genes clustered on chromosome 3R are all members of the Cyp4H subfamily. Similarly, the three Cyp6 genes on 2R, division 13C, all belong to subfamily Cyp6P.

By identifying and physically mapping a significant subset of the A. gambiae P450 gene family, we are now in a position to investigate the potential role of each of these enzymes in conferring insecticide resistance. P450-mediated resistance is most frequently associated with up-regulation of one or more P450 genes, through modifications either in promoter sequences or in trans-acting regulatory proteins, although instances of resistance linked to amino acid substitutions in existing P450 structural genes have been reported (Berge et al., 1998). By sequencing P450 genes from insecticide-resistant and susceptible strains of mosquitoes and comparing the expression levels of these genes between the two strains, the potential role of each enzyme in conferring resistance can be assessed. Intriguingly, a genetic mapping approach to isolate the major genes conferring resistance to permethrin in an East African strain of A. gambiae identified two resistance loci, one of which maps to chromosome 3R, division 30 (H. Ranson, M. Paton, L. McCarroll, B. Jensen, J. Hemingway and F. H. Collins, manuscript in preparation). The colocalization of this resistance locus with a large cluster of P450 genes suggests a role for one or more of these P450 proteins in conferring pyrethroid resistance in A. gambiae.

Experimental procedures

Mosquito strains

The RSP-ST strain originated from Kisumu, western Kenya, in 1992 and is derived from A. gambiae s.s. females collected from villages where permethrin-impregnated bednets and curtains had been in use. This colony was selected for several generations in the laboratory by exposing 1-day-old adults to permethrin-impregnated filter papers for a length of time that resulted in 80% mortality. The colony was later selected for the standard chromosome arrangement. The PEST strain is also fixed for the standard chromosome arrangement (Mukabayire & Besansky, 1996). This strain was used to construct the Notre Dame 1 (ND-1) A. gambiae bacterial artificial chromosome (BAC) library (X. Wang, Z. Ke, A. J. Cornel, D. Smoller and F. H. Collins, manuscript in preparation).

cDNA synthesis, sequencing and in situ hybridization

Total RNA was extracted from individual A. gambiae RSP-ST mosquitoes using TRI reagent (SIGMA), according to the manufacturer's instructions. The RNA was treated with DNase to remove any contaminating genomic DNA, and the mRNA was reverse transcribed into cDNA using SuperScript II (Gibco BRL) and an oligo(dT) adapter primer (5′-GACTCGAGTCGACATCGA(dT)17-3′).

BAC DNA was isolated using Qiagen Plasmid maxi kits. BAC sequencing reactions were performed using 1 µg of BAC DNA as a template and ABI BigDye Terminator chemistry. After electrophoresis on an ABI 377 automatic sequencer, contigs were assembled and the sequences annotated using the LASERGENE software package (DNAstar, Madison, WI).

BAC clones were physically mapped to polytene chromosomes prepared from half-gravid ovaries of the PEST strain of A. gambiae as described previously (Kumar & Collins, 1994).

Cloning of A. gambiae Cyp6 genes

As part of the A. gambiae genome initiative, the insert ends of each clone from the ND-1 BAC library have been determined by single-pass sequencing at Genoscope and the Institut Pasteur (Paris, France) (www.genoscope.cns.fr/externe/English/Projets/Projet_AK/AK.html). These sequences were queried against the GenBank database, and fifteen BAC end sequences with similarity to P450 gene sequences were identified (Table 1). Eight of these sequences encoded partial Cyp6 genes, and these BAC clones were mapped to the polytene chromosomes by in situ hybridization. Four of them (BAC clones 04I08, 12D02, 18L09 and 24H06) mapped to the same polytene division (division 30A on chromosome 3R). The end sequence of clone 04I08 was predicted to contain the 3′ end of a Cyp6 gene. The 840 bp of sequence from the end of BAC clone 24H06 was predicted to encode an internal fragment of a distinct Cyp6 P450, clone 12D02 encodes the putative 5′ end of a third Cyp6 gene, and clone 18L09 contains the 3′ end of a fourth Cyp6 gene. We used primer pairs designed from these partial Cyp6 genes to screen the BAC library and identified a clone (10I20) that contains three of these four Cyp6 genes. This clone and clone 18L09 were used as templates for further sequencing of this gene cluster.

Three of the remaining four BAC clones whose end sequences showed identity to Cyp6 genes (07G05, 18M16 and 29O06) mapped to divisions 13 and 14 of chromosome 2R. Putative full-length sequences for each of these P450 genes were obtained by further sequencing of these BAC clones and by searching the A. gambiae Trace database (www.ncbi.nlm.nih.gov/blast/mmtrace.html), which consists of the unassembled shotgun sequences generated during the sequencing of the A. gambiae genome. The putative 5′ ends of the partial Cyp6 genes identified in the end sequence data of 17D17 and 18L09 were also identified by screening the Trace database.

As the putative full-length sequence of each Cyp6 gene was determined, primers encompassing the predicted start and stop codons were designed and used to amplify the full coding region from adult A. gambiae cDNA. These cDNAs were sequenced to verify the intron/exon boundaries and translated to determine the predicted amino acid sequence. The A. gambiae P450s were assigned specific names by the P450 nomenclature committee (http://drnelson.utmem.edu/CytochromeP450.html). Amino acid alignments were performed using the ClustalW program (Thompson et al., 1994).

Cloning of A. gambiae Cyp9 genes

The end sequence of BAC clone 18B07, from the ND-1 library, showed identity to the 3′ end of a Cyp9 gene. A primer pair was designed based on this sequence and used to isolate an overlapping clone from this BAC library (11C21), which was used as a template for obtaining the full-length sequence of this putative Cyp9 gene. A further partial Cyp9 sequence was identified in the end sequence of clone 105L23 from a second A. gambiae BAC library (J. R. Hogan and F. H. Collins, personal communication). Again, primers were designed to amplify this partial gene and used to screen the original BAC library, ND-1. The single positive clone identified (01N01) was used as a template to obtain the remainder of this Cyp9 sequence. As the putative full-length sequence of each of these Cyp9 genes was determined, primers encompassing the predicted start and stop codons were designed and used to amplify the full coding region from adult A. gambiae cDNA.

Cloning of A. gambiae Cyp4 genes

Degenerate oligonucleotide primers designed from the conserved regions of cytochrome P450 proteins of the CYP4 family (Scott et al., 1994) were used to amplify partial Cyp4 genes from A. gambiae cDNA. Twenty pmol of each of primers Cyp4-F [5′-GAGGTIGAYACITTCATGTTCGARGGICACGAYAC-3′] and Cyp4-R [5′-CTGICCGATRCAGTTICGBGGICCIGCSIWGAABGG-3′] were used in PCR reactions containing 2 mM MgCl2, 0.2 mM dNTP and one unit of Taq DNA polymerase in 20 mM Tris-HCl and 50 mM KCl. RSP-ST cDNA, synthesized from larval or adult mRNA, was used as a template. The PCR conditions were 94 °C for 5 min, then thirty cycles of 94 °C for 30 s, 45 °C for 30 s and 72 °C for 30 s, with a final 10 min extension at 72 °C. Products of the expected size (approximately 450 bp) were ligated into the pGEM-T Easy vector and sequenced. A total of thirty-one clones were sequenced and assembled into thirteen contigs. Eleven of these contigs showed identity to P450 genes. This cDNA cloning approach will preferentially detect highly expressed cDNAs (for example, of the twenty-seven clones that encoded P450 fragments, CYP4G16 was represented ten times). In a more systematic attempt to identify all members of the CYP4 family in A. gambiae, we used the ND-1 BAC library as a template for PCR reactions using the same degenerate CYP4 primers. This library consists of 12 200 clones arrayed individually in 384-well plates. Pools of DNA extracted from each of the thirty-two plates were screened by PCR with the Cyp4-F and Cyp4-R primers as above. Twenty-two of the thirty-two plates gave PCR products of the expected size. Pools of DNA extracted from subsets of four columns or four rows of each positive 384-well plate were then screened to identify a subset of sixteen BAC clones, one or more of which contained a Cyp4 gene.
In most cases, the individual BAC clone was identified by screening all sixteen of these clones to identify the positive BAC(s), but in some cases the PCR product from the screening of the rows and columns was subcloned directly and used for sequencing. BAC clones containing genes encoding nine of the eleven partial Cyp4 cDNAs previously cloned were identified by this approach. In addition a further seven partial Cyp4 genes were identified for which no corresponding cDNA was isolated in our preliminary screen.
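The row-and-column pooling scheme described above can be sketched in Python as follows. A 384-well plate has 16 rows and 24 columns, so pools of four rows and pools of four columns give four row pools and six column pools per plate, and a positive row pool and a positive column pool intersect in 4 × 4 = 16 wells, the subset of sixteen clones mentioned above. Which rows and columns were pooled together is not stated, so the contiguous layout used here (rows A-D, E-H, and so on) is an assumption, as are the function names.

```python
from itertools import product
from typing import List, Sequence

ROWS = "ABCDEFGHIJKLMNOP"        # 16 rows of a 384-well plate
COLUMNS = list(range(1, 25))     # 24 columns


def row_pool(index: int) -> str:
    """Hypothetical layout: pool 0 = rows A-D, pool 1 = rows E-H, and so on."""
    return ROWS[4 * index:4 * index + 4]


def column_pool(index: int) -> List[int]:
    """Hypothetical layout: pool 0 = columns 1-4, pool 1 = columns 5-8, and so on."""
    return COLUMNS[4 * index:4 * index + 4]


def candidate_wells(positive_row_pools: Sequence[int],
                    positive_column_pools: Sequence[int]) -> List[str]:
    """Wells lying in the intersection of any positive row pool and column pool."""
    wells = []
    for rp, cp in product(positive_row_pools, positive_column_pools):
        wells.extend(f"{r}{c}" for r, c in product(row_pool(rp), column_pool(cp)))
    return wells


n_row_pools = len(ROWS) // 4       # 4 row pools
n_col_pools = len(COLUMNS) // 4    # 6 column pools
print(n_row_pools + n_col_pools, "PCRs per positive plate")  # 10 PCRs per positive plate
print(candidate_wells([3], [5]))   # 16 wells, from M21 to P24
```

On this layout a positive plate needs ten PCRs (four row pools plus six column pools) rather than 384, and the library's thirty-two plates provide 12 288 wells for its 12 200 clones. When more than one row or column pool is positive, the candidate set grows (two row pools and one column pool give 32 wells), and the individual clones must then be tested to resolve the positive BAC(s).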

Acknowledgements

We are grateful to the Anopheles gambiae genome-sequencing community for generating, and making publicly available, the sequence data referred to within this report. This work was partially funded by Wellcome Trust, Royal Society and Leverhulme fellowships (to H.R.) and a grant from the John D. and Catherine T. MacArthur Foundation (to F.H.C.). C.W.R. acknowledges support from the Institut Pasteur, the Centre National de la Recherche Scientifique and Genoscope-Centre national de séquençage, France.
